From aradwen at gmail.com Sat May 1 06:45:18 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Sat, 1 May 2010 12:45:18 +0200
Subject: [Bioperl-l] Pfam_Scan
Message-ID:

Hello everyone,

I would like to know if there is a way to cluster the output of Pfam_Scan results. I mean, can we parse it and then output clusters containing sequences that share the same domains or Pfams? This is a bit special since there could be multidomain proteins among them; which rule should we follow in that case?

Rad

--
R. ANIBA

From David.Messina at sbc.su.se Sat May 1 18:28:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 2 May 2010 00:28:48 +0200
Subject: [Bioperl-l] Pfam_Scan
In-Reply-To:
References:
Message-ID: <6CA3B4F2-CF3E-45DD-BE51-9F7218C5CEE9@sbc.su.se>

Hi Rad,

As far as I can tell the Pfam_Scan output is simply tab-delimited text (see details below), so you should be able to group sequences which share domains by sorting on the sixth column.

I suspect that sequences with multiple domain hits will have multiple lines in the output, one per hit, so if you want to identify sequences which share the same _set_ of domains you will have to do the bookkeeping yourself.

That being said, Pfam_Scan is not part of BioPerl (it's distributed by the Pfam team), so you may want to contact them directly for help (pfam-help at sanger.ac.uk).

Dave

[from the Pfam_Scan documentation]

The output format is:

Example output (with -pfamB, -as options):

Q5NEL3.1    2 224   2 227 PB013481 Pfam-B_13481  Pfam-B 1 184 226 358.5 1.4e-107 NA NA
O65039.1   38  93  38  93 PF08246  Inhibitor_I29 Domain 1  58  58  45.9 2.8e-12  1  No_clan
O65039.1  126 342 126 342 PF00112  Peptidase_C1  Domain 1 216 216 296.0 1.1e-88  1  CL0125 predicted_active_site[150,285,307]

From David.Messina at sbc.su.se Sun May 2 04:54:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 2 May 2010 10:54:54 +0200
Subject: [Bioperl-l] RFC: SNP::Inherit
In-Reply-To:
References:
Message-ID:

Hi Christopher,

Looks good!
The only recommendation I would make is to change the namespace to Bio::SNP::Inherit. The convention on CPAN is to minimize the number of new toplevel namespaces (which SNP would be), and although many of the Bio::* modules are part of BioPerl, that namespace is not restricted to BioPerl and there are plenty of non-BioPerl packages there.

Dave

On Apr 29, 2010, at 10:26 PM, Christopher Bottoms wrote:

> Dear Bioperl community,
>
> I was thinking of uploading a module to CPAN that converts SNP genotype data
> to parental allele designations. Below is the perldoc. This is not a
> "BioPerl" module per se, so I'm not sure what namespace to put it under.
>
> I would be glad to send anyone the source if they are interested in checking
> it out more. I just did not want to send everyone an unsolicited attachment.
>
> Thank you for your time,
> Christopher Bottoms (molecules)
>

From David.Messina at sbc.su.se Sun May 2 05:59:07 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 2 May 2010 11:59:07 +0200
Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise
In-Reply-To: <4BDA986D.3020302@bii.a-star.edu.sg>
References: <4BDA986D.3020302@bii.a-star.edu.sg>
Message-ID: <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se>

Hi Dimitar,

The syntax you want is:

    # Build a Genewise alignment factory
    my $factory = Bio::Tools::Run::Genewise->new();

    # turn on the quiet switch
    $factory->QUIET(1);

    # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects
    my @genes = $factory->run($protein_seq, $genomic_seq);

This turns out to be incorrectly documented on the man page, at least in part:

> Available Params:
>
> NB: These should be passed without the '-' or they will be ignored,
> except switches such as 'hmmer' (which have no corresponding value)
> which should be set on the factory object using the AUTOLOADed methods
> of the same name.
>
> Model    [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null]
> Alg      [-kbyte,-alg]
> HMM      [-hmmer]
> Output   [-gff,-gener,-alb,-pal,-block,-divide]
> Standard [-help,-version,-silent,-quiet,-errorlog]

That is, these don't work as expected:

    $factory->quiet;
    $factory->quiet(1);

due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set.

So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker.

Dave

From maj at fortinbras.us Sun May 2 15:28:22 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 2 May 2010 15:28:22 -0400
Subject: [Bioperl-l] new core developers Rob Buels and Dave Messina
Message-ID:

Hi Folks,

On behalf of the core team, I am delighted to announce two new members: Rob Buels and Dave Messina. They are so, er, honored on the basis of their selfless work on the list, on IRC, in development of new modules and their active and sustained participation in BioPerl maintenance, design and promotion.

Welcome Rob and Dave!

MAJ and the BioPerl core developers

From skastu01 at students.poly.edu Sun May 2 22:41:04 2010
From: skastu01 at students.poly.edu (Lakshmi Kastury)
Date: Mon, 3 May 2010 02:41:04 +0000
Subject: [Bioperl-l] Using BIO::SEARCHIO
Message-ID:

I am attempting to use the BIO::SEARCHIO system to parse a Blast output file.

A new instance is created and the file is read through the following:

    my $input = new BIO::SearchIO (-file =>'blast_report_0.txt', -format =>'blast');

When I run my program, I receive the following message:

"Can't locate object method "new" via package "BIO::SearchIO" (perhaps you forgot to load "BIO::SearchIO"?)"

Is this an optional module which needs to be installed separately?

Thanks,
Lakshmi Kastury

From maj at fortinbras.us Sun May 2 22:57:28 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 2 May 2010 22:57:28 -0400
Subject: [Bioperl-l] Using BIO::SEARCHIO
In-Reply-To:
References:
Message-ID:

You need to say "Bio::SearchIO", and not "BIO::SearchIO".
MAJ

----- Original Message -----
From: "Lakshmi Kastury"
To:
Sent: Sunday, May 02, 2010 10:41 PM
Subject: [Bioperl-l] Using BIO::SEARCHIO

>
> I am attempting to use the BIO::SEARCHIO system to parse a Blast output file.
>
> A new instance is created and the file is read through the following:
> my $input = new BIO::SearchIO (-file =>'blast_report_0.txt', -format
> =>'blast');
>
> When I run my program, I receive the following message:
> "Can't locate object method "new" via package "BIO::SearchIO" (perhaps you
> forgot to load "BIO::SearchIO"?)"
>
> Is this an optional module which needs to be installed separately?
>
> Thanks,
> Lakshmi Kastury
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Mon May 3 00:22:46 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 2 May 2010 23:22:46 -0500
Subject: [Bioperl-l] Full bioperl-live github demo
Message-ID: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu>

All,

I have pushed a demo of the bioperl-live (all branches and tags) to github here:

http://github.com/bioperl/bioperl-test

This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration.

In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd.

chris

From heikki.lehvaslaiho at gmail.com Mon May 3 07:45:10 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 3 May 2010 14:45:10 +0300
Subject: [Bioperl-l] BLAST parsing broken
Message-ID:

Chris,

latest additions to Bio::SearchIO::blast.pm broke the parsing of normal blast output. $result->query_name now returns undef.

(Using the anonymous git now). This change still works:

commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
Author: cjfields
Date:   Sun Dec 20 04:39:58 2009 +0000

    Robson's patch for buggy blastpgp output

But this does not:

commit 9a89c3434597104dd50553e3562983d78d14a544
Author: cjfields
Date:   Thu Apr 15 04:21:17 2010 +0000

    [bug 3031]

    patches for catching algorithm ref, courtesy Razi Khaja.

That makes it easy to find the diffs:

$ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
index 378023a..6f7eeeb 100644
--- a/Bio/SearchIO/blast.pm
+++ b/Bio/SearchIO/blast.pm
@@ -209,6 +209,7 @@ BEGIN {
 
         'BlastOutput_program' => 'RESULT-algorithm_name',
         'BlastOutput_version' => 'RESULT-algorithm_version',
+        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
         'BlastOutput_query-def' => 'RESULT-query_name',
         'BlastOutput_query-len' => 'RESULT-query_length',
         'BlastOutput_query-acc' => 'RESULT-query_accession',
@@ -504,6 +505,26 @@ sub next_result {
                 }
             );
         }
+        # parse the BLAST algorithm reference
+        elsif(/^Reference:\s+(.*)$/) {
+            # want to preserve newlines for the BLAST algorithm reference
+            my $algorithm_reference = "$1\n";
+            $_ = $self->_readline;
+            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
+            # algorithm_reference, append it to what we parsed so far
+            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
+                $algorithm_reference .= "$_";
+                $_ = $self->_readline;
+            }
+            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
+            $self->_pushback($_);
+            $self->element(
+                {
+                    'Name' => 'BlastOutput_algorithm-reference',
+                    'Data' => $algorithm_reference
+                }
+            );
+        }
         # added Windows workaround for bug 1985
         elsif (/^(Searching|Results from round)/) {
             next unless $1 =~ /Results from round/;

I am not sure why reference parsing messes things up. Maybe it eats too many lines from the result file.
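
One way to probe the "eats too many lines" idea without a BioPerl checkout is to replay the new loop over a few in-memory lines. The sketch below is only that: the array-backed reader, the name split_reference and both sample reports are made up for illustration, and the defined() guard is an addition that the patch does not have. Only the three stop conditions are copied from the diff.

```perl
use strict;
use warnings;

# Stand-alone replica of the loop added in the patch. It takes the remaining
# report lines (newline-terminated) and returns the collected reference text
# followed by the lines the caller would still get to read.
sub split_reference {
    my @lines = @_;
    my $first = shift @lines;
    return unless defined $first && $first =~ /^Reference:\s+(.*)$/;
    my $reference = "$1\n";

    my $line = shift @lines;
    # Same stop conditions as the patch; the defined() check is added here
    # so that running off the end of the input does not warn.
    while ( defined $line
        && $line !~ /^$/
        && $line !~ /^RID:/
        && $line !~ /^Database:/ )
    {
        $reference .= $line;
        $line = shift @lines;
    }

    # "Push back" the line that ended the loop so the caller sees it next.
    unshift @lines, $line if defined $line;
    return ( $reference, @lines );
}

# Hypothetical report fragments, not taken from a real BLAST run.
my @with_blank = (
    "Reference: Example et al. (hypothetical citation),\n",
    "second line of the citation.\n",
    "\n",
    "Query= seq1 example\n",
);
my @without_blank = grep { $_ ne "\n" } @with_blank;

for my $case (
    [ 'blank line present', @with_blank ],
    [ 'blank line absent',  @without_blank ],
  )
{
    my ( $label, @report ) = @$case;
    my ( $reference, @rest ) = split_reference(@report);
    my $query_seen = grep { /^Query=/ } @rest;
    printf "%s: Query= line %s\n", $label,
      $query_seen ? 'still available' : 'absorbed into reference';
}
```

Under these assumptions the first case leaves the Query= line for the caller and the second folds it into the reference text, which would be consistent with query_name coming back undef. Whether any real BLAST report omits the blank line after the reference is not established in this thread.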

Yours,

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

From cjfields at illinois.edu Mon May 3 08:08:01 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 3 May 2010 07:08:01 -0500
Subject: [Bioperl-l] BLAST parsing broken
In-Reply-To:
References:
Message-ID: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu>

Odd, I ran tests on that prior to commit. I'll work on fixing that (in svn of course, until the migration is complete).

chris

From maj at fortinbras.us Mon May 3 08:25:10 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 3 May 2010 08:25:10 -0400
Subject: [Bioperl-l] Full bioperl-live github demo
In-Reply-To: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu>
References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu>
Message-ID:

Hi Chris,
I attempted a clone and got the following. Is this my problem?
thanks MAJ

$ git clone http://github.com/bioperl/bioperl-test.git

Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/
Getting alternates list for http://github.com/bioperl/bioperl-test.git
Getting pack list for http://github.com/bioperl/bioperl-test.git
Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c
Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6
Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c
 which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f
error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile
fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed

From cjfields at illinois.edu Mon May 3 09:07:46 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 3 May 2010 08:07:46 -0500
Subject: [Bioperl-l] Full bioperl-live github demo
In-Reply-To:
References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu>
Message-ID: <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu>

This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1):

cjfields$ git clone git://github.com/bioperl/bioperl-test.git
Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/
remote: Counting objects: 86737, done.
remote: Compressing objects: 100% (22309/22309), done.
remote: Total 86737 (delta 64759), reused 85957 (delta 63979)
Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done.
Resolving deltas: 100% (64759/64759), done.

For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few.
Do you have a github acct set up? chris On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > Hi Chris, > I attempted a clone and got the following. Is this my problem? > thanks MAJ > > $ git clone http://github.com/bioperl/bioperl-test.git > > Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ > Getting alternates list for http://github.com/bioperl/bioperl-test.git > Getting pack list for http://github.com/bioperl/bioperl-test.git > Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c > Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 > Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c > which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f > error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile > fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed > > > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, May 03, 2010 12:22 AM > Subject: [Bioperl-l] Full bioperl-live github demo > > >> All, >> >> I have pushed a demo of the bioperl-live (all branches and tags) to github here: >> >> http://github.com/bioperl/bioperl-test >> >> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >> >> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). 
Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 3 09:19:17 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 08:19:17 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <8796492301724F2CA132F97AE57C2700@NewLife> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> Message-ID: Added you in. SSH access should work with any ssh keys you have set in github. We can play around with this for the time being (try post commit hooks, etc), but obviously can't make any serious commits to it until we are ready for complete migration; everything will still need to go to dev svn until then. Also noticed that we are topping the account out at the moment, but removing the old read-only repos should help. May need to think about that in the long-term. chris On May 3, 2010, at 8:13 AM, Mark A. Jensen wrote: > That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with > majensen > cheers Chris- MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. 
Jensen" > Cc: "BioPerl List" > Sent: Monday, May 03, 2010 9:07 AM > Subject: Re: [Bioperl-l] Full bioperl-live github demo > > > This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): > > cjfields$ git clone git://github.com/bioperl/bioperl-test.git > Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ > remote: Counting objects: 86737, done. > remote: Compressing objects: 100% (22309/22309), done. > remote: Total 86737 (delta 64759), reused 85957 (delta 63979) > Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. > Resolving deltas: 100% (64759/64759), done. > > For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? > > chris > > On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > >> Hi Chris, >> I attempted a clone and got the following. Is this my problem? >> thanks MAJ >> >> $ git clone http://github.com/bioperl/bioperl-test.git >> >> Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ >> Getting alternates list for http://github.com/bioperl/bioperl-test.git >> Getting pack list for http://github.com/bioperl/bioperl-test.git >> Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 >> Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f >> error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile >> fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed >> >> >> ----- Original Message ----- From: "Chris Fields" >> To: "BioPerl List" >> Sent: Monday, May 03, 2010 12:22 AM >> Subject: [Bioperl-l] Full bioperl-live github demo >> >> >>> All, >>> >>> I have pushed a demo of the bioperl-live (all 
branches and tags) to github here: >>> >>> http://github.com/bioperl/bioperl-test >>> >>> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >>> >>> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. >>> >>> chris >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From maj at fortinbras.us Mon May 3 09:13:27 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 3 May 2010 09:13:27 -0400 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> Message-ID: <8796492301724F2CA132F97AE57C2700@NewLife> That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with majensen cheers Chris- MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. 
Jensen" Cc: "BioPerl List" Sent: Monday, May 03, 2010 9:07 AM Subject: Re: [Bioperl-l] Full bioperl-live github demo This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): cjfields$ git clone git://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ remote: Counting objects: 86737, done. remote: Compressing objects: 100% (22309/22309), done. remote: Total 86737 (delta 64759), reused 85957 (delta 63979) Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. Resolving deltas: 100% (64759/64759), done. For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? chris On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > Hi Chris, > I attempted a clone and got the following. Is this my problem? > thanks MAJ > > $ git clone http://github.com/bioperl/bioperl-test.git > > Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ > Getting alternates list for http://github.com/bioperl/bioperl-test.git > Getting pack list for http://github.com/bioperl/bioperl-test.git > Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c > Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 > Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c > which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f > error: file > /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack > is not a GIT packfile > fatal: packfile > /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack > cannot be accessed > > > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, May 03, 2010 12:22 AM > Subject: [Bioperl-l] Full bioperl-live github demo > > >> All, >> >> I have pushed a demo of the bioperl-live (all branches and tags) to github >> here: >> >> 
http://github.com/bioperl/bioperl-test >> >> This is separate from the 'bioperl-live' repo at the same github account for >> the time being. The conversion was performed using svn2git (the gitorious >> C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), >> using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and >> rerun can be performed very quickly. The actual conversion of the entire >> bioperl repo took very little time, actually (less than 3 minutes). I think, >> with some additional small work using the svn2git rules pretty much >> everything is ready for migration. >> >> In this run, all subversion tags are converted to git tags (branches remain >> git branches as expected). Just in case I'm missing something, I would like >> everyone to take a look at this, though. In particular, I would like to make >> sure tags and branches are as they are expected. So far I haven't seen >> anything that stands out as odd. >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 3 10:04:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 09:04:16 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <8796492301724F2CA132F97AE57C2700@NewLife> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> Message-ID: <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> I like this: http://github.com/bioperl/bioperl-test/graphs/impact Kinda cool yet scary. chris On May 3, 2010, at 8:13 AM, Mark A. 
Jensen wrote: > That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with > majensen > cheers Chris- MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. Jensen" > Cc: "BioPerl List" > Sent: Monday, May 03, 2010 9:07 AM > Subject: Re: [Bioperl-l] Full bioperl-live github demo > > > This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): > > cjfields$ git clone git://github.com/bioperl/bioperl-test.git > Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ > remote: Counting objects: 86737, done. > remote: Compressing objects: 100% (22309/22309), done. > remote: Total 86737 (delta 64759), reused 85957 (delta 63979) > Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. > Resolving deltas: 100% (64759/64759), done. > > For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? > > chris > > On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > >> Hi Chris, >> I attempted a clone and got the following. Is this my problem? 
>> thanks MAJ >> >> $ git clone http://github.com/bioperl/bioperl-test.git >> >> Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ >> Getting alternates list for http://github.com/bioperl/bioperl-test.git >> Getting pack list for http://github.com/bioperl/bioperl-test.git >> Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 >> Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f >> error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile >> fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed >> >> >> ----- Original Message ----- From: "Chris Fields" >> To: "BioPerl List" >> Sent: Monday, May 03, 2010 12:22 AM >> Subject: [Bioperl-l] Full bioperl-live github demo >> >> >>> All, >>> >>> I have pushed a demo of the bioperl-live (all branches and tags) to github here: >>> >>> http://github.com/bioperl/bioperl-test >>> >>> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >>> >>> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. 
So far I haven't seen anything that stands out as odd.
>>>
>>> chris
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From mnrusimh at gmail.com Mon May 3 18:42:41 2010
From: mnrusimh at gmail.com (Ram Podicheti)
Date: Mon, 03 May 2010 18:42:41 -0400
Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID
Message-ID: <4BDF5161.4030209@gmail.com>

Is there a way to obtain the Ensembl Gene ID from an Entrez Gene ID? In other words, I am hoping to get 'ENSMUSG00000029372' as the output when I supply 57349.

Many thanks,
Ram Podicheti

From sdavis2 at mail.nih.gov Mon May 3 19:14:58 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 3 May 2010 19:14:58 -0400
Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID
In-Reply-To: <4BDF5161.4030209@gmail.com>
References: <4BDF5161.4030209@gmail.com>
Message-ID:

On Mon, May 3, 2010 at 6:42 PM, Ram Podicheti wrote:

> Is there a way to obtain the Ensembl Gene ID from an Entrez Gene ID? In
> other words, I am hoping to get 'ENSMUSG00000029372' as the output when
> I supply 57349.

Check out the Biomart interface to Ensembl. You can supply any type of ID as a filter and get back gene information, including the gene IDs that map to that ID. I believe there is a Perl interface to Biomart, but I haven't used it, so I can't comment on it directly. There is also an R/Bioconductor interface.
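One way to try this kind of lookup from Perl, without any BioMart client library, is to send an XML query to a BioMart "martservice" endpoint. What follows is only a minimal sketch under stated assumptions: the endpoint URL, the dataset name 'mmusculus_gene_ensembl', the filter name 'entrezgene' and the attribute name 'ensembl_gene_id' depend on the Ensembl release (other releases may name the filter differently), and biomart_query_xml is a hypothetical helper, not part of BioPerl or BioMart.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Hypothetical helper: build a BioMart XML query with one filter and
# one or more attributes. Every name passed in is an assumption that
# has to match the mart actually being queried.
sub biomart_query_xml {
    my ( $dataset, $filter, $value, @attributes ) = @_;
    my $attrs = join '', map { qq{<Attribute name="$_"/>} } @attributes;
    return qq{<?xml version="1.0" encoding="UTF-8"?>}
      . qq{<!DOCTYPE Query>}
      . qq{<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">}
      . qq{<Dataset name="$dataset" interface="default">}
      . qq{<Filter name="$filter" value="$value"/>}
      . $attrs
      . qq{</Dataset></Query>};
}

# Assumed dataset, filter and attribute names for mouse Entrez Gene ID 57349
my $xml = biomart_query_xml( 'mmusculus_gene_ensembl', 'entrezgene', 57349,
    'ensembl_gene_id' );
print "$xml\n";

# Only contact the server when asked to: perl entrez2ensembl.pl --fetch
if ( @ARGV && $ARGV[0] eq '--fetch' ) {
    my $http = HTTP::Tiny->new;
    my $url  = 'http://www.ensembl.org/biomart/martservice?'
      . $http->www_form_urlencode( { query => $xml } );
    my $res = $http->get($url);    # a redirect to https needs IO::Socket::SSL
    die "Request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    print $res->{content};         # tab-separated rows, one per matching gene
}
```

Run without arguments, the script only prints the query, which can also be pasted into a BioMart web form to check the filter and attribute names before any request is sent.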
Sean From mnrusimh at gmail.com Mon May 3 20:42:49 2010 From: mnrusimh at gmail.com (Ram Podicheti) Date: Mon, 03 May 2010 20:42:49 -0400 Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID In-Reply-To: References: <4BDF5161.4030209@gmail.com> Message-ID: <4BDF6D89.2000408@gmail.com> Thanks Sean, that definitely helped. Ram Sean Davis wrote: > > > On Mon, May 3, 2010 at 6:42 PM, Ram Podicheti > wrote: > > Is there a way to obtain the Ensembl Gene ID from an Entrez Gene > ID? In > other words, I am hoping to get 'ENSMUSG00000029372' as the output > when > I supply 57349. > > > Check out the Biomart interface to Ensembl. You can supply any type > of ID as a filter and get back gene information, including the ID, > that map to that ID. I believe there is a perl interface to biomart, > but I haven't used it to comment directly. There is also an > R/Bioconductor interface. > > Sean > From razi.khaja at gmail.com Tue May 4 13:55:00 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Tue, 4 May 2010 13:55:00 -0400 Subject: [Bioperl-l] BLAST parsing broken In-Reply-To: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: That is odd. Heikki, do you have a blast output file that produces this error? Could you attach the file and either send to the list or myself (if the list does not accept attachments). Thanks, Razi On Mon, May 3, 2010 at 8:08 AM, Chris Fields wrote: > Odd, I ran tests on that prior to commit. I'll work on fixing that (in svn > of course, until the migration is complete). > > chris > > On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > > > Chris, > > > > latest additions to Bio::SearchIO::blast.pm broke the parsing of normal > > blast output. $result->query_name returns now undef. > > > > (Using the anonymous git now). 
This change still works: > > > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > Author: cjfields > > Date: Sun Dec 20 04:39:58 2009 +0000 > > > > Robson's patch for buggy blastpgp output > > > > But this does not: > > > > commit 9a89c3434597104dd50553e3562983d78d14a544 > > Author: cjfields > > Date: Thu Apr 15 04:21:17 2010 +0000 > > > > [bug 3031] > > > > patches for catching algorithm ref, courtesy Razi Khaja. > > > > That makes it easy to find the diffs: > > > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > > index 378023a..6f7eeeb 100644 > > --- a/Bio/SearchIO/blast.pm > > +++ b/Bio/SearchIO/blast.pm > > @@ -209,6 +209,7 @@ BEGIN { > > > > 'BlastOutput_program' => 'RESULT-algorithm_name', > > 'BlastOutput_version' => 'RESULT-algorithm_version', > > + 'BlastOutput_algorithm-reference' => > 'RESULT-algorithm_reference', > > 'BlastOutput_query-def' => 'RESULT-query_name', > > 'BlastOutput_query-len' => 'RESULT-query_length', > > 'BlastOutput_query-acc' => 'RESULT-query_accession', > > @@ -504,6 +505,26 @@ sub next_result { > > } > > ); > > } > > + # parse the BLAST algorithm reference > > + elsif(/^Reference:\s+(.*)$/) { > > + # want to preserve newlines for the BLAST algorithm > reference > > + my $algorithm_reference = "$1\n"; > > + $_ = $self->_readline; > > + # while the current line, does not match an empty line, a > RID:, > > or a Database:, we are still looking at the > > + # algorithm_reference, append it to what we parsed so far > > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > > + $algorithm_reference .= "$_"; > > + $_ = $self->_readline; > > + } > > + # if we exited the while loop, we saw an empty line, a RID:, > or > > a Database:, so push it back > > + $self->_pushback($_); > > + $self->element( > > + { > > + 'Name' => 'BlastOutput_algorithm-reference', > > + 'Data' => $algorithm_reference > > + 
}
> > );
> > }
> > # added Windows workaround for bug 1985
> > elsif (/^(Searching|Results from round)/) {
> > next unless $1 =~ /Results from round/;
> >
> >
> > I am not sure why reference parsing messes things up. Maybe it eats too many
> > lines from the result file.
> >
> > Yours,
> >
> > -Heikki
> >
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +966 545 595 849 office: +966 2 808 2429
> >
> > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> > 4700 King Abdullah University of Science and Technology (KAUST)
> > Thuwal 23955-6900, Kingdom of Saudi Arabia

From shalabh.sharma7 at gmail.com Tue May 4 14:18:02 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 4 May 2010 14:18:02 -0400
Subject: [Bioperl-l] parsing GenBank file
Message-ID:

Hi All,
i have a huge GenBank file (downloaded from RDP, containing all bacterial 16s). I just want to parse the RDP id (in LOCUS) and the organism's lineage (in ORGANISM).
I wrote a simple script for this:

#!/usr/bin/perl -w
use Bio::SeqIO;

my $seqio_object = Bio::SeqIO->new(-file => "$ARGV[0]");
while(my $seq_object = $seqio_object->next_seq){
    my $id = $seq_object->id;
    print "$id\t";
    my $species_object = $seq_object->species;
    my @classification = $seq_object->species->classification;
    foreach my $val (@classification){print "$val\t";}
    print "\n";
}

I am getting the output like:

S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root
S000148973 uncultured Geothrix sp.
Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root S000431649 uncultured Acidobacteria bacterium Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root .. .. This is the exact output i want, but i am missing lot of records (they are there in the genbank file but not in my output). I also got a warning during parsing: --------------------- WARNING --------------------- MSG: Unbalanced quote in: /db_xref="taxon:35783" /germline" /mol_type="genomic DNA" /organism="Enterococcus sp." /strain="LMG12316"No further qualifiers will be added for this feature --------------------------------------------------- So i was just wondering that is this warning message causing that problem or i am doing something wrong? Thanks Shalabh From jay at jays.net Tue May 4 23:30:25 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 4 May 2010 22:30:25 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? Message-ID: $work[0] wants me to fire up Buildbot + Smolder to know when and who broke our tests, and how quickly (or not) our test count is growing over time. Then #moose asked me if I could also host the same for Moose and Class::MOP. And $work[1] uses the heck out of BioPerl. So I'm wondering if I can leverage all my synergies somehow and also host for BioPerl. http://buildbot.net/trac http://sourceforge.net/projects/smolder/ Has anything happened since this 2008 thread?: Subject: Test coverage for BioPerl now available http://article.gmane.org/gmane.comp.lang.perl.bio.general/17731/match=smolder If this would be a Good Thing for BioPerl I could try to try... :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Wed May 5 00:24:51 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 4 May 2010 23:24:51 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? 
In-Reply-To: References: Message-ID: On May 4, 2010, at 10:30 PM, Jay Hannah wrote: > http://sourceforge.net/projects/smolder/ Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) http://search.cpan.org/perldoc?Smolder http://github.com/mpeters/smolder Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From dimitark at bii.a-star.edu.sg Wed May 5 02:58:21 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Wed, 05 May 2010 14:58:21 +0800 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> References: <4BDA986D.3020302@bii.a-star.edu.sg> <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> Message-ID: <4BE1170D.8040108@bii.a-star.edu.sg> Hi Dave, thank you for the tip. Now it works like a charm :) Greetings Dimitar On 05/02/2010 05:59 PM, Dave Messina wrote: > Hi Dimitar, > > The syntax you want is: > > # Build a Genewise alignment factory > my $factory = Bio::Tools::Run::Genewise->new(); > > # turn on the quiet switch > $factory->QUIET(1); > > # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects > my @genes = $factory->run($protein_seq, $genomic_seq); > > > This turns out be incorrectly documented on the man page, at least in part: > >> Available Params: >> >> NB: These should be passed without the '-' or they will be ignored, >> except switches such as 'hmmer' (which have no corresponding value) >> which should be set on the factory object using the AUTOLOADed methods >> of the same name. >> >> Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null] >> Alg [-kbyte,-alg] >> HMM [-hmmer] >> Output [-gff,-gener,-alb,-pal,-block,-divide] >> Standard [-help,-version,-silent,-quiet,-errorlog] >> > > That is, these don't work as expected: > > $factory->quiet; > $factory->quiet(1); > > due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. 
> > And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set. > > > So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker. > > > Dave > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore email: dimitark at bii.a-star.edu.sg tel: +65 6478 8514 From dimitark at bii.a-star.edu.sg Wed May 5 03:06:04 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Wed, 05 May 2010 15:06:04 +0800 Subject: [Bioperl-l] about gene "boundaries" In-Reply-To: References: <4BD8357B.5030804@bii.a-star.edu.sg> <24714E9B-B3E5-4703-92F8-64483FA59AFC@illinois.edu> <4BD90F94.4040608@bii.a-star.edu.sg> Message-ID: <4BE118DC.7000806@bii.a-star.edu.sg> Hi Malcolm, thank you very much for that information. Didnt even know such program existed :) I now use 'blastdbcmd' for extraction of DNA sequence from my DB. I only had to reformat my DB with 'parse seqids' parameter in order to be able to give the 'entry' parameter to 'blastdbcmd'. Now my script is working. Thanx again. Cheers Dimitar On 04/30/2010 10:16 PM, Cook, Malcolm wrote: > Dimitar, > > Since you have indexed your database with makeblastdb, you might simply use `blastdbcmd` to extract, in fasta format, sub-sequences from the indexed database using identifiers and integer ranges > > blastdbcmd is included in the blast+ suite of programs, which also included makeblastdb which you report you have running. 
> > see: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/user_maual.pdf > > I've not (yet) used the blast+ suite (still using the old blast) so I've not tested this myself yet, but I think something like the following will work for you: > > blastdbcmd -db yourBlastDatabase -entry chr2 -range 100-300 -outformat fasta > > will extract chr2:100-300 from yourBlastDatabase > > Good Luck > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dimitar Kenanov > Sent: Wednesday, April 28, 2010 11:48 PM > To: Chris Fields; bioperl-l at bioperl.org; scott at scottcain.net; hrh at fmi.ch > Subject: Re: [Bioperl-l] about gene "boundaries" > > Hi guys, > today with rested head and after some reading i found the solution to my problem in BioPerl. Its Bio::DB::Fasta. It does what i want sufficiently well. > Thank you again for the help and im sorry for the trouble caused. > > Cheers > Dimitar > > On 04/28/2010 11:10 PM, Chris Fields wrote: > >> By local DB, do you mean a BioPerl-based local DB? Or is it something else? This is a bit vague. >> >> On the BioPerl side I suggest looking into Bio::DB::SeqFeature::Store for storing and querying genome information (it does exactly what you want if the proper information is loaded), or maybe the Ensembl Perl API, which can be used with a local or remote Ensembl setup. Beyond that you'll need to be more specific. >> >> chris >> >> On Apr 28, 2010, at 8:17 AM, Dimitar Kenanov wrote: >> >> >> >>> Hello guys, >>> i have a question about gene "boundaries". Is there some module in BioPerl which can help me extract the DNA sequence from a genomic DB (from specific chromosome). I have my human genome in a local DB and some "from-to" data sets corresponding to different chromosomes. So i want to get the DNA seqs for these from-to's. 
I know i can do that the normal way but if there is a way to do it with BioPerl it will be more consistent with the rest of the code. >>> >>> Thanks for any tips :) >>> >>> Cheers >>> Dimitar >>> >>> -- >>> Dimitar Kenanov >>> Postdoctoral research fellow >>> Protein Sequence Analysis Group >>> Bioinformatics Institute >>> A*STAR, Singapore >>> email: dimitark at bii.a-star.edu.sg >>> tel: +65 6478 8514 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> > > -- > Dimitar Kenanov > Postdoctoral research fellow > Protein Sequence Analysis Group > Bioinformatics Institute > A*STAR, Singapore > email: dimitark at bii.a-star.edu.sg > tel: +65 6478 8514 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore email: dimitark at bii.a-star.edu.sg tel: +65 6478 8514 From David.Messina at sbc.su.se Wed May 5 03:46:17 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 5 May 2010 09:46:17 +0200 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <4BE1170D.8040108@bii.a-star.edu.sg> References: <4BDA986D.3020302@bii.a-star.edu.sg> <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> <4BE1170D.8040108@bii.a-star.edu.sg> Message-ID: <9F2DC6C9-7707-4C4A-8DE1-0B37387F7F8A@sbc.su.se> Great, glad to hear that. Thanks for letting us know about the problem! Dave On May 5, 2010, at 8:58, Dimitar Kenanov wrote: > Hi Dave, > thank you for the tip. 
Now it works like a charm :) > > Greetings > Dimitar > > > On 05/02/2010 05:59 PM, Dave Messina wrote: >> Hi Dimitar, >> >> The syntax you want is: >> >> # Build a Genewise alignment factory >> my $factory = Bio::Tools::Run::Genewise->new(); >> >> # turn on the quiet switch >> $factory->QUIET(1); >> >> # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects >> my @genes = $factory->run($protein_seq, $genomic_seq); >> >> >> This turns out be incorrectly documented on the man page, at least in part: >> >>> Available Params: >>> >>> NB: These should be passed without the '-' or they will be ignored, >>> except switches such as 'hmmer' (which have no corresponding value) >>> which should be set on the factory object using the AUTOLOADed methods >>> of the same name. >>> >>> Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null] >>> Alg [-kbyte,-alg] >>> HMM [-hmmer] >>> Output [-gff,-gener,-alb,-pal,-block,-divide] >>> Standard [-help,-version,-silent,-quiet,-errorlog] >>> >> >> That is, these don't work as expected: >> >> $factory->quiet; >> $factory->quiet(1); >> >> due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. >> >> And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set. >> >> >> So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker. >> >> >> Dave >> >> > > > -- > Dimitar Kenanov > Postdoctoral research fellow > Protein Sequence Analysis Group > Bioinformatics Institute > A*STAR, Singapore > email: dimitark at bii.a-star.edu.sg > tel: +65 6478 8514 > From torsten.seemann at infotech.monash.edu.au Wed May 5 03:48:55 2010 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 5 May 2010 17:48:55 +1000 Subject: [Bioperl-l] parsing GenBank file In-Reply-To: References: Message-ID: > ? ? 
i have a huge GenBank file ( downloaded from RDP containing all
> bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM).
> I am getting the output like:
> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae
> Holophagales Holophagae "Acidobacteria" Bacteria Root
> This is the exact output i want, but i am missing lot of records (they are
> there in the genbank file but not in my output).
> I also got a warning during parsing:
> --------------------- WARNING ---------------------
> MSG: Unbalanced quote in:
> /db_xref="taxon:35783" /germline"
> /mol_type="genomic DNA"
> /organism="Enterococcus sp."
> /strain="LMG12316"No further qualifiers will be added for this feature
> ---------------------------------------------------
> So i was just wondering that is this warning message causing that problem or
> i am doing something wrong?

"Unbalanced quote" means that the number of double-quote (") symbols around the tag's value is not even (a multiple of 2). I can see that the first line (below) looks problematic:

YOU HAVE:

/db_xref="taxon:35783" /germline"

SHOULD BE:

/db_xref="taxon:35783"
/germline

I suspect there is a problem either with RDP's GenBank producer, or Bioperl is having a problem with the "germline" qualifier, which is a 'null valued' qualifier like /pseudo: it takes no ="value" string. (I think in Bioperl this is handled by setting the value to "_no_value"?)
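The even/odd rule can be checked on a suspect record with a few lines of plain Perl. This is a minimal sketch, not BioPerl's actual parser: it assumes each qualifier sits on its own line (real GenBank values can wrap across lines, which this ignores), and unbalanced_qualifiers is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the qualifier lines
# that contain an odd number of double-quote characters.
sub unbalanced_qualifiers {
    my @lines = @_;
    my @bad;
    for my $line (@lines) {
        # tr/"// with an empty replacement list counts the quotes
        # and leaves the line unchanged
        my $quotes = ( $line =~ tr/"// );
        push @bad, $line if $quotes % 2;
    }
    return @bad;
}

# Qualifiers from the warning, split one per line
my @qualifiers = (
    '/db_xref="taxon:35783"',
    '/germline"',                      # stray quote, as in the RDP record
    '/mol_type="genomic DNA"',
    '/organism="Enterococcus sp."',
    '/strain="LMG12316"',
);

print "Unbalanced: $_\n" for unbalanced_qualifiers(@qualifiers);
```

On the five qualifiers above, only the /germline" line is reported, which matches the reading that the stray quote after /germline is what triggers the warning.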
http://www.ncbi.nlm.nih.gov/collab/FT/

Qualifier       /germline
Definition      the sequence presented in the entry has not undergone somatic
                rearrangement as part of an adaptive immune response; it is the
                unrearranged sequence that was inherited from the parental
                germline
Value format    none
Example         /germline
Comment         /germline should not be used to indicate that the source of
                the sequence is a gamete or germ cell;
                /germline and /rearranged cannot be used in the same source
                feature;
                /germline and /rearranged should only be used for molecules that
                can undergo somatic rearrangements as part of an adaptive immune
                response; these are the T-cell receptor (TCR) and immunoglobulin
                loci in the jawed vertebrates, and the unrelated variable
                lymphocyte receptor (VLR) locus in the jawless fish (lampreys
                and hagfish);
                /germline and /rearranged should not be used outside of the
                Craniata (taxid=89593)

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA

From cjfields at illinois.edu Wed May 5 08:12:30 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 May 2010 07:12:30 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To:
References:
Message-ID: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>

On May 4, 2010, at 11:24 PM, Jay Hannah wrote:

> On May 4, 2010, at 10:30 PM, Jay Hannah wrote:
>> http://sourceforge.net/projects/smolder/
>
> Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :)
>
> http://search.cpan.org/perldoc?Smolder
> http://github.com/mpeters/smolder
>
> Jay Hannah
> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)?
chris From cjfields at illinois.edu Wed May 5 08:30:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 07:30:30 -0500 Subject: [Bioperl-l] using default string values for undef/empty, was Re: parsing GenBank file In-Reply-To: References: Message-ID: On May 5, 2010, at 2:48 AM, Torsten Seemann wrote: >> i have a huge GenBank file ( downloaded from RDP containing all >> bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM). >> I am getting the output like: >> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae >> Holophagales Holophagae "Acidobacteria" Bacteria Root >> This is the exact output i want, but i am missing lot of records (they are >> there in the genbank file but not in my output). >> I also got a warning during parsing: >> --------------------- WARNING --------------------- >> MSG: Unbalanced quote in: >> /db_xref="taxon:35783" /germline" >> /mol_type="genomic DNA" >> /organism="Enterococcus sp." >> /strain="LMG12316"No further qualifiers will be added for this feature >> --------------------------------------------------- >> So i was just wondering that is this warning message causing that problem or >> i am doing something wrong? > > "Unbalanced quote" means there is not an even number (multiple of 2) > double-quote (") symbols around the tag's value. I can see that the > first line (below) looks problematic: > > YOU HAVE: > > /db_xref="taxon:35783" /germline" > > SHOULD BE: > > /db_xref="taxon:35783" > /germline > > I suspect there is a problem either with RDP's genbank producer, or > Bioperl is having problem with the "germline" qualifier which is a > 'null valued' qualifier like /pseudo - it takes no ="value" string. (I > think in Bioperl this is handled by setting the value to "_no_value" > ?) > ... > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA Ugh, didn't notice the '_no_value' bit. 
Probably my opinion, but I don't like stubs like that as they tend to be brittle and run into issues (like this one, for instance). I would prefer we just leave that as undef and only quote defined values (with the exceptions in %FTQUAL_NO_QUOTE).

Any reason for this behavior (is it related to ORM-related stuff like bioperl-db)? Can we change that to something a bit more realistic?

chris

From David.Messina at sbc.su.se Wed May 5 09:00:39 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 15:00:39 +0200
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
Message-ID: <252790EC-6A2D-4DFA-B2A0-8D0F8E169E30@sbc.su.se>

Yeah, absolutely, Jay! it would be wonderful to have this for BioPerl.

Dave

On May 5, 2010, at 14:12, Chris Fields wrote:

> On May 4, 2010, at 11:24 PM, Jay Hannah wrote:
>
>> On May 4, 2010, at 10:30 PM, Jay Hannah wrote:
>>> http://sourceforge.net/projects/smolder/
>>
>> Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :)
>>
>> http://search.cpan.org/perldoc?Smolder
>> http://github.com/mpeters/smolder
>>
>> Jay Hannah
>> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah
>
> I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)?
>
> chris

From cjfields at illinois.edu Wed May 5 10:46:23 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 May 2010 09:46:23 -0500
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
Message-ID: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>

All,

I would like to finalize moving over to git/github very soon. We're sort of in limbo on this, so it needs to progress forward. We'll need to do some initial cleanup after the move (Heikki is already doing a few things on the test repo, which we'll need to diff over to the new one).
So with that in mind, here are my thoughts. This is copied over to this wiki page, in case you don't want to reply here:

http://www.bioperl.org/wiki/From_SVN_to_Git

(thanks Mark!)

1) Timeline

When? Sooner the better (weeks as opposed to months). Our anon. svn is down, likely permanently (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live).

2) Migration strategy

Now mainly worked out using svn2git, which is very fast. We would need to make the svn repo on dev read-only during this transition. My guess is it would take very little time.

Do we want to retain the git-SVN metadata on commits? This is viewable with our current read-only mirror on github:

http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca

3) Developers

Not everyone has a github account. Recent ones who I couldn't find on github:

dmessina, fangly

The current authors file used for mapping commit authors to emails used their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I think, once one has signed up with github, you can add that same address to your current ones, and it should map to your github account.

If we use dev.open-bio.org as our central git repo, we won't need to go through with that, but we will need a viewable version of dev available somehow (mirrored on github or otherwise). Speaking of...

4) Development strategy

Are we sticking with a single centralized repo (SVN-like)? Will that be github, or will github be a downstream repo to our work on dev? We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain).

Git makes it very easy to make branches and merge in code to trunk. With that in mind, I would highly suggest we start working on branches for almost everything and merge over to trunk.
There is very little to no overhead in doing so with git. I like this strategy (Mark Jensen pointed this out):

http://nvie.com/git-model

Also, several points were raised in a related project (Parrot) considering a move to git/github from svn. One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds.

5) Encouraging outside contributors

Do we want to adopt a policy similar to Moose?

http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod

This is easy with github and forks.

6) SVN Read/Write to GitHub

It was recently announced that one can access a github repo using subversion as read-only, and just yesterday experimental write to github is allowed:

http://github.com/blog/644-subversion-write-support

I can see allowing read-only svn, but write support is still experimental. Do we want to allow that?

7) Others?

chris

From shalabh.sharma7 at gmail.com Wed May 5 10:46:19 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 10:46:19 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To:
References:
Message-ID:

Hi Torsten,
Thanks for pointing that out. But this is just a warning, it will not break the script. i found the point where the script is breaking.
It's breaking and giving this message:

Can't call method "classification" on an undefined value at parseGB.pl line 9, line 10067733.

So the script is breaking when it comes to this record:

LOCUS       S001198291   1521 bp    rRNA    linear   BCT 02-Feb-2009
DEFINITION  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2.
ACCESSION   AP010656 REGION: 61786..63306
PROJECT     GenomeProject:29025
SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar.
CFP2
            Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
            "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".
REFERENCE   1 (bases 1 to 1521)
  AUTHORS   Toyoda A., Hongoh Y., Toh H., Hattori M., Ohkuma M., Sakaki Y.;
  TITLE     ;
  JOURNAL   Submitted (21-MAR-2008) to the EMBL/GenBank/DDBJ databases.
            Contact:Atsushi Toyoda National Institute of Genetics, Comparative
            Genomics Laboratory; Yata 1111, Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Hongoh Y., Sharma V.K., Prakash T., Noda S., Toh H., Taylor T.D.,
            Kudo T., Sakaki Y., Toyoda A., Hattori M., Ohkuma M.;

It is unable to parse this record, but i don't understand why it is doing so? The only reason i can think of is the organism's name which is very long as compared to others.

Thanks
Shalabh

On Wed, May 5, 2010 at 3:48 AM, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:

> > i have a huge GenBank file ( downloaded from RDP containing all
> > bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM).
> > I am getting the output like:
> > S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae
> > Holophagales Holophagae "Acidobacteria" Bacteria Root
> > This is the exact output i want, but i am missing lot of records (they are
> > there in the genbank file but not in my output).
> > I also got a warning during parsing:
> > --------------------- WARNING ---------------------
> > MSG: Unbalanced quote in:
> > /db_xref="taxon:35783" /germline"
> > /mol_type="genomic DNA"
> > /organism="Enterococcus sp."
> > /strain="LMG12316"No further qualifiers will be added for this feature
> > ---------------------------------------------------
> > So i was just wondering that is this warning message causing that problem or
> > i am doing something wrong?
>
> "Unbalanced quote" means there is not an even number (multiple of 2)
> double-quote (") symbols around the tag's value.
I can see that the > first line (below) looks problematic: > > YOU HAVE: > > /db_xref="taxon:35783" /germline" > > SHOULD BE: > > /db_xref="taxon:35783" > /germline > > I suspect there is a problem either with RDP's genbank producer, or > Bioperl is having problem with the "germline" qualifier which is a > 'null valued' qualifier like /pseudo - it takes no ="value" string. (I > think in Bioperl this is handled by setting the value to "_no_value" > ?) > > http://www.ncbi.nlm.nih.gov/collab/FT/ > > Qualifier /germline > Definition the sequence presented in the entry has not undergone > somatic > rearrangement as part of an adaptive immune response; it is > the > unrearranged sequence that was inherited from the parental > germline > Value format none > Example /germline > Comment /germline should not be used to indicate that the source of > the sequence is a gamete or germ cell; > /germline and /rearranged cannot be used in the same source > feature; > /germline and /rearranged should only be used for molecules > that > can undergo somatic rearrangements as part of an > adaptive immune > response; these are the T-cell receptor (TCR) and > immunoglobulin > loci in the jawed vertebrates, and the unrelated variable > lymphocyte receptor (VLR) locus in the jawless fish > (lampreys > and hagfish); > /germline and /rearranged should not be used outside of the > Craniata (taxid=89593) > > > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA > From cjfields at illinois.edu Wed May 5 11:32:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 10:32:41 -0500 Subject: [Bioperl-l] parsing GenBank file In-Reply-To: References: Message-ID: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu> Shalabh, What is the source of this file? 
It's not from GenBank; if I look up the parent sequence using Bio::DB::GenBank it works fine:

use Modern::Perl;
use Bio::DB::GenBank;

my $id = 'AP010656';

my $gb = Bio::DB::GenBank->new();

my $seq = $gb->get_Seq_by_acc($id);

say join(',',$seq->species->classification);

chris

On May 5, 2010, at 9:46 AM, shalabh sharma wrote:

> Hi Torsten,
> Thanks for pointing that out. But this is just a warning,
> it will not break the script. i found the the point where script is
> breaking.
> Its breaking and giving this message:
> Can't call method "classification" on an undefined value at parseGB.pl line
> 9, line 10067733.
>
> So the script is breaking when its coming to this record:
>
> LOCUS S001198291 1521 bp rRNA linear BCT 02-Feb-2009
> DEFINITION Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2.
> ACCESSION AP010656 REGION: 61786..63306
> PROJECT GenomeProject:29025
> SOURCE Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> ORGANISM Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
> "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".
> REFERENCE 1 (bases 1 to 1521)
> AUTHORS Toyoda A., Hongoh Y., Toh H., Hattori M., Ohkuma M., Sakaki Y.;
> TITLE ;
> JOURNAL Submitted (21-MAR-2008) to the EMBL/GenBank/DDBJ databases.
> Contact:Atsushi Toyoda National Institute of Genetics, Comparative
> Genomics Laboratory; Yata 1111, Mishima, Shizuoka 411-8540, Japan
> REFERENCE 2
> AUTHORS Hongoh Y., Sharma V.K., Prakash T., Noda S., Toh H., Taylor T.D.,
> Kudo T., Sakaki Y., Toyoda A., Hattori M., Ohkuma M.;
>
> It is unable to parse this record, but i don't understand why it is doing
> so? The only reason i can think of is the organism's name which is very long
> as compared to others.
>
> Thanks
> Shalabh
>
> [...]
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From shalabh.sharma7 at gmail.com Wed May 5 11:38:11 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 11:38:11 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
Message-ID:

Hi Chris,
I downloaded this file from RDP, it contain all bacterial 16s.

Thanks
Shalabh

On Wed, May 5, 2010 at 11:32 AM, Chris Fields wrote:

> Shalabh,
>
> What is the source of this file? It's not from GenBank; if I look up the
> parent sequence using Bio::DB::GenBank it works fine:
>
> use Modern::Perl;
> use Bio::DB::GenBank;
>
> my $id = 'AP010656';
>
> my $gb = Bio::DB::GenBank->new();
>
> my $seq = $gb->get_Seq_by_acc($id);
>
> say join(',',$seq->species->classification);
>
> chris
>
> [...]

From cjfields at illinois.edu Wed May 5 12:01:55 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 May 2010 11:01:55 -0500
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To:
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
Message-ID: <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>

Shalabh,

There are several problems with this file that make it somewhat problematic and somewhat non-GenBank like. It does parse (it has seq data) but doesn't catch the SOURCE/ORGANISM b/c of the somewhat non-canonical way of displaying the classification:

SOURCE Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
ORGANISM Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
"Porphyromonadaceae"; unclassified_"Porphyromonadaceae".

It's different enough from the NCBI version (from here: http://www.ncbi.nlm.nih.gov/nuccore/212548595) that it's probably breaking the parser:

SOURCE Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
ORGANISM Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
Bacteria; Bacteroidetes; Bacteroidia; Bacteroidales; Candidatus
Azobacteroides.

Please file this as a bug, we can take a look at it. It's a bit non-standard so I can't promise it'll be fixed unless it's fairly easy to do.

chris

On May 5, 2010, at 10:38 AM, shalabh sharma wrote:

> Hi Chris,
> I downloaded this file from RDP, it contain all bacterial 16s.
>
> Thanks
> Shalabh
>
> [...]

From shalabh.sharma7 at gmail.com Wed May 5 12:10:33 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 12:10:33 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To: <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu> <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>
Message-ID:

Hi Chris,
I will do that, so how i can solve my problem, do you have any suggestion?
I am thinking of taking all the accessions from the file i have and use Bio::DB::Genbank to get classification.

Thanks
shalabh

On Wed, May 5, 2010 at 12:01 PM, Chris Fields wrote:

> Shalabh,
>
> There are several problems with this file that make it somewhat problematic
> and somewhat non-GenBank like. It does parse (it has seq data) but doesn't
> catch the SOURCE/ORGANISM b/c of the somewhat non-canonical way of
> displaying the classification:
>
> SOURCE Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> ORGANISM Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
> "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".
>
> It's different enough from the NCBI version (from here:
> http://www.ncbi.nlm.nih.gov/nuccore/212548595) that it's probably breaking
> the parser:
>
> SOURCE Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> ORGANISM Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> Bacteria; Bacteroidetes; Bacteroidia; Bacteroidales; Candidatus
> Azobacteroides.
>
> Please file this as a bug, we can take a look at it. It's a bit
> non-standard so I can't promise it'll be fixed unless it's fairly easy to
> do.
>
> chris
>
> [...]

From jay at jays.net Wed May 5 12:28:10 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 5 May 2010 11:28:10 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
Message-ID: <512A88E4-85A0-4841-B6A7-9915FE0800BA@jays.net>

On May 5, 2010, at 10:59 AM, Jay Hannah wrote:
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

Oops. Should have checked Smolder before sending that email...

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

$ prove -v t/email_signatures.t
t/email_signatures.t ..
1..7
ok 1 - $work->[0]->{Outlook} email signatures up to date
ok 2 - $work->[0]->{Netmail} email signatures up to date
ok 3 - $work->[1]->{Lotus_Notes} email signatures up to date
not ok 4 - $home->[0]->{MacMini_Mail.app} email signatures up to date
ok 5 - $home->[0]->{MacMini_Entourage.app} email signatures up to date
ok 6 - $home->[0]->{laptop_Mail.app} email signatures up to date
ok 7 - $home->[0]->{laptop_Entourage.app} email signatures up to date
# Failed test '$home->[0]->{MacMini_Mail.app} email signatures up to date'
# at t/email_signatures.t line 5.
# Looks like you failed 1 test of 7.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/7 subtests

Test Summary Report
-------------------
t/email_signatures.t (Wstat: 256 Tests: 7 Failed: 1)
  Failed test: 4
  Non-zero exit status: 1
Files=1, Tests=7, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.03 cusr 0.00 csys = 0.07 CPU)
Result: FAIL

From jay at jays.net Wed May 5 11:59:37 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 5 May 2010 10:59:37 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
Message-ID: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>

On May 5, 2010, at 7:12 AM, Chris Fields wrote:
> I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)?

I would definitely start with trunk and see how it goes.

Last night I tried to smoke all our old $work[0] tags and failed impressively. Our tests were (and probably still are) too reliant on 3rd party black boxes being online and responsive, and servers tend to move and get reconfigured over the years. Presumably BioPerl and Moose are more self-contained (unless external deps are explicitly enabled), so perhaps historical smoking would work fairly well.

In Moose land the request is that I smoke not only Moose, but everything on CPAN that *depends on Moose*:

export MOOSE_TEST_MD=1; prove xt/test-my-dependents.t

Which should be ... educational. :) While exciting, I don't think that concept translates to the BioPerl monolith.

If I'm the only one smoking, you'll get a very limited number of architecture + perl version combinations reported. Which begs the question of how to harness a broader tester pool.
It's great that 342 systems smoked our latest CPAN upload:

http://static.cpantesters.org/distro/B/bioperl.html

But the crazy I'm embarking on would mean several smokes each day (every svn/git commit?), compared to the cpantesters who haven't had a new CPAN release to smoke since Sep 2009 (1.6.1). Maybe I'd just do one or two a day or something?

Whoever wanted to could report into our central Smolder server using their architectures + perl versions. A volunteer would just install Smolder from CPAN and run this in their bioperl-live directory:

prove -I . --recurse --archive test_run.tar.gz
smolder_smoke_signal --server smolder.jays.net \
   --username MyUserName --password MyPass \
   --file test_run.tar.gz --project bioperl-live --tags trunk

Deep ponderings,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From David.Messina at sbc.su.se Wed May 5 17:27:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 23:27:24 +0200
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
Message-ID: <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se>

> Do we want to retain the git-SVN metadata on commits? What are the tradeoffs with this?

From the little reading I've done, it seems that space and clutter are the chief drawbacks, but that it's easy to strip this metadata out later. Does that jibe with your impression?

> Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly

My github account name is: DaveMessina

Do I have an @bioperl.org address? I tried sending mail to a few likely permutations without success. In any case, I added dave_messina -at- bioperl.org as an email address on my github account.

> Are we sticking with a single centralized repo (SVN-like)?

I am a total git novice, but it's my understanding that it's still a good idea, particularly with a big many-author project like BioPerl, to have a primary, official repo.

But I'd be interested in hearing more discussion on this. We're at a good place to make large-ish changes to how we do things, I think.

> Will that be github, or will github be a downstream repo to our work on dev?

My only concern with github being primary is in case something happens to github. Not likely, I know, but it seems prudent to maintain a certain amount of control over our destiny. So I'm inclined to make dev be primary and github downstream, with the assumption that it'd be trivial to abandon dev and make github primary in the future if we want.

Or would it be enough to auto-mirror to dev.open-bio.org, which could serve as a fallback in case github goes offline, temporarily or permanently?

> We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain).

Are there any git-familiar folks out there who could comment on the pros and cons of this? Perhaps some of the other Bio* projects who have switched to git could advise.

Right now, without further technical details, I think it'd be better to have one true primary just because it's less confusing and easier to manage, particularly if we're to follow a model like the one mentioned just below:

> I would highly suggest we start working on branches for almost everything and merge over to trunk.
> [...]
> I like this strategy (Mark Jensen pointed this out): http://nvie.com/git-model

Yep, that looks good to me, too.

> One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds.
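
For readers following the rewind question, here is a minimal sketch of how a server-side hook could refuse rewinds on selected branches. It is not a hook BioPerl is known to have used: the branch patterns (`master`, `release-*`) and the function names `is_protected` and `check_update` are assumptions made for illustration. Git runs an `update` hook once per pushed ref, passing the ref name, the old commit and the new commit as arguments.

```shell
#!/bin/sh
# Sketch of a git "update" hook: reject non-fast-forward pushes (rewinds)
# and deletions on protected branches only.
# Git invokes it as: update <refname> <old-sha> <new-sha>

# Hypothetical branch naming; adjust the patterns to the real repo layout.
is_protected() {
    case "$1" in
        refs/heads/master|refs/heads/release-*) return 0 ;;
        *) return 1 ;;
    esac
}

check_update() {
    ref=$1
    old=$2
    new=$3
    zero=0000000000000000000000000000000000000000

    # Unprotected refs may be rewritten freely.
    is_protected "$ref" || return 0

    # An all-zero old sha means the branch is being created.
    if [ "$old" = "$zero" ]; then
        return 0
    fi

    # An all-zero new sha means the branch is being deleted.
    if [ "$new" = "$zero" ]; then
        echo "refusing to delete protected branch $ref" >&2
        return 1
    fi

    # A push is a fast-forward when the old tip is the merge base
    # of the old and new tips, i.e. old is an ancestor of new.
    if [ "$(git merge-base "$old" "$new")" = "$old" ]; then
        return 0
    fi

    echo "refusing non-fast-forward push to $ref" >&2
    return 1
}

# Only act when git calls the hook with its three arguments.
if [ $# -eq 3 ]; then
    check_update "$@" || exit 1
fi
```

Saved as an executable `hooks/update` file in the central bare repository, this would leave topic branches free to be rewritten while master and release branches only move forward. If the policy should cover every branch, `git config receive.denyNonFastForwards true` on the server gives the same effect without a hook.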

We should try to make sure we have this sorted before going "live".

> 5) Encouraging outside contributors
>
> Do we want to adopt a policy similar to Moose?

Yes! We want more people to jump in - one of the benefits of git and github is that they encourage this.

> 6) SVN Read/Write to GitHub
>
> I can see allowing read-only svn, but write support is still experimental. Do we want to allow that?

Read-only for sure - that seems harmless, and we want to give people lots of ways to get BioPerl.

Write - let's play with it a bit, making a few test commits to bioperl-test, and see what happens. It would be nice if we don't force everyone who contributes to BioPerl to have to switch over to git immediately. Me included. :)

> 7) Others?

What happens when we start splitting up bioperl into separate distros? Do we put them each into a separate repo?

Dave

From David.Messina at sbc.su.se Wed May 5 17:40:46 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 23:40:46 +0200
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
Message-ID: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se>

> Presumably BioPerl and Moose are more self-contained (unless external deps are explicitly enabled), so perhaps historical smoking would work fairly well.

Very few of BioPerl's tests rely on outside servers, and those that do have to be turned on explicitly with a network-tests flag. So hopefully that won't be an issue.

> In Moose land the request is that I smoke not only Moose, but everything on CPAN that *depends on Moose*:
> [...]
> While exciting, I don't think that concept translates to the BioPerl monolith.

Agreed, not really. Except for some of the GMOD stuff. And anyway this could always be done later if desired. Probably much later. :)

> Whoever wanted to could report into our central Smolder server using their architectures + perl versions. A volunteer would just install Smolder from CPAN and run this in their bioperl-live directory:
>
> prove -I . --recurse --archive test_run.tar.gz
> smolder_smoke_signal --server smolder.jays.net \
>    --username MyUserName --password MyPass \
>    --file test_run.tar.gz --project bioperl-live --tags trunk

Would the reporter need to have any special setup to do this?

Could this kind of reporting be written into the BioPerl Build.PL as a user-settable option (just like the options for installing scripts or running network tests)? If so, then we could get lots of feedback on trunk (master) commits and not just releases.

Dave

From jason at bioperl.org Wed May 5 18:45:41 2010
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 05 May 2010 15:45:41 -0700
Subject: [Bioperl-l] Modules in Bio:Tree
In-Reply-To: <4BE1D0E2.9010500@mail.mcgill.ca>
References: <4BE1D0E2.9010500@mail.mcgill.ca>
Message-ID: <4BE1F515.7090604@bioperl.org>

Please use the mailing list for questions.

The nodes are objects, not strings you print. As it shows in http://bioperl.org/wiki/HOWTO:Trees#Example_Code you access information from them with object methods like 'id', so

print $leaf->id, "\n"

would probably accomplish what you are looking for right now.

-jason

Sudeep Mehrotra wrote, On 5/5/10 1:11 PM:
> Hello Jason,
> I am using the Bio:Tree modules to get a list of all the leaves in
> their respective clusters. I looked at the examples and followed the
> functions of various modules but I am not able to get the desired result.
>
> My input looks as follows:
> ((((Candidatus_Korarchaeum)Korarchaeota,((((Cenarchaeum_symbiosum)Cenarchaeum)Cenarchaeaceae)Cenarchaeales,((((Nitrosopumilus_maritimus)Nitrosopumilus)Nitrosopumilaceae)Nitrosopumilales)marine_archaeal_group_1)Thaumarchaeota,(((((Archaeoglobus_fulgidus)Archaeoglobus)Archaeoglobaceae)Archaeoglobales)Archaeoglobi,
>
> and so on....
> > Code is like this: > $input = new Bio::TreeIO(-file =>"$file1",-format => "newick"); > $tree = $input->next_tree; > @leaves = $tree->get_leaf_nodes(); > foreach $leaf (@leaves) > { > print "$leaf\n"; > } > The ouput I get is: > Bio::Tree::Node=HASH(0xa783e0) > Bio::Tree::Node=HASH(0xa78710) > Bio::Tree::Node=HASH(0xa78ab0) > > Not sure what I am doing wrong. > > Objective is to get a cluster of all the leaves. > > Thanks From florent.angly at gmail.com Wed May 5 20:16:05 2010 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 06 May 2010 10:16:05 +1000 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <4BE20A45.5090206@gmail.com> Hi Chris, On 06/05/10 00:46, Chris Fields wrote: > 3) Developers > > Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I think, once one has signed up with github, you can add that same address to your current ones, and it should map to your github account. If we use dev.open-bio.org as our central git repo, we won't need to go through with that, but we will need a viewable version of dev available somehow (mirrored on github or otherwise). Speaking of... > I have a GitHub account, fangly, on which I just added the email address fangly at bioperl.org . Thanks for your efforts working on the Git migration. Florent From jay at jays.net Wed May 5 23:18:47 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:18:47 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? 
In-Reply-To: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: I smoked trunk a few times. Check out all the pretty buttons and graphs and such: http://biobase2.ist.unomaha.edu:8080/app/projects/smoke_reports/1 How you too can submit smoke results: http://jays.net/wiki/Smolder Neat? Not? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Wed May 5 23:31:05 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:31:05 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: On May 5, 2010, at 4:40 PM, Dave Messina wrote: > Very few of BioPerl's tests rely on outside servers, and those that do have to be turned on explicitly with a network-tests flag. So hopefully that won't be an issue. I said "no" to the network tests for my smoke runs. Haven't really examined the results enough to know if the failures are my fault or what. Since I always use bioperl-live out of SVN (soon git) I may not be following the ./Build.PL procedure correctly. > Agreed, not really. Except for some of the GMOD stuff. And anyway this could always be done later if desired. Probably much later. :) Ya. Some day http://smolder.open-bio.org hosting jillions of projects would be dreamy! :) Any open-bio.org projects using TAP other than BioPerl? Smolder can host anything TAP, and TAP producers are available in at least 17 languages: http://testanything.org/wiki/index.php/TAP_Producers > Would the reporter need to have any special setup to do this? 
LWP::UserAgent or Smolder's smolder_smoke_signal are the two methods I've successfully executed so far: http://jays.net/wiki/Smolder > Could this kind of reporting be written into the BioPerl Build.PL as a user-settable option (just like the options for installing scripts or running network tests)? > > If so, then we could get lots of feedback on trunk (master) commits and not just releases. Ya, wow. I've never built BioPerl "the right way" (I'm an SVN/git junkie) so I'm not sure how this would get put into Build.PL. Would you prompt the user, something like "Since you just installed BioPerl, we'd like to connect to the Internet and report in your test results. Is this ok? [yes] " ? It would be very cool to collect and trend thousands of reports, assuming it can be 100% automated for the user. Thanks for the feedback! :) Time to putter my motorcycle home before it gets too cold. G'night, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Wed May 5 23:43:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 22:43:14 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. chris On May 5, 2010, at 10:18 PM, Jay Hannah wrote: > I smoked trunk a few times. Check out all the pretty buttons and graphs and such: > > http://biobase2.ist.unomaha.edu:8080/app/projects/smoke_reports/1 > > How you too can submit smoke results: > > http://jays.net/wiki/Smolder > > Neat? Not? 
> > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Wed May 5 23:55:40 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:55:40 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: On May 5, 2010, at 10:43 PM, Chris Fields wrote: > Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. Ya, seems like the way to go. LWP is all over inside BioPerl already, whereas Smolder itself has 147 dependencies, most of which probably aren't relevant to most BioPerl users. :) http://deps.cpantesters.org/?module=Smolder;perl=latest So a stand-alone script that could be run whenever, plus (eventually) a prompt in Build.PL asking about running it? Not sure if Build.PL can somehow use the "prove --archive" hook to store the results during the normal installation run through all the tests... Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From lincoln.stein at gmail.com Thu May 6 08:01:09 2010 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 6 May 2010 08:01:09 -0400 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: My github username is lstein and I've just added lstein at bioperl.org to my linked email addresses. I hope I have a bioperl.org address; I never use it! Lincoln On Wed, May 5, 2010 at 10:46 AM, Chris Fields wrote: > All, > > I would like to finalize moving over to git/github very soon. 
We're sort > of in limbo on this, so it needs to progress forward. We'll need to do some > initial cleanup after the move (Heikki is already doing a few things on the > test repo, which we'll need to diff over to the new one). > > So with that in mind, here are my thoughts. This is copied over to this > wiki page, in case you don't want to reply here: > > http://www.bioperl.org/wiki/From_SVN_to_Git > > (thanks Mark!) > > 1) Timeline > > When? Sooner the better (weeks as opposed to months). Our anon. svn is > down, likely permanently ( > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > > 2) Migration strategy > > Now mainly worked out using svn2git, which is very fast. We would need to > make the svn repo on dev read-only during this transition. My guess is it > would take very little time. Do we want to retain the git-SVN metadata on > commits? This is viewable with our current read-only mirror on github: > > > http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca > > 3) Developers > > Not everyone has a github account. Recent ones who I couldn't find on > github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used > their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I > think, once one has signed up with github, you can add that same address to > your current ones, and it should map to your github account. If we use > dev.open-bio.org as our central git repo, we won't need to go through with > that, but we will need a viewable version of dev available somehow (mirrored > on github or otherwise). Speaking of... > > 4) Development strategy > > Are we sticking with a single centralized repo (SVN-like)? Will that be > github, or will github be a downstream repo to our work on dev? 
We could > feasibly have github be an active, forkable repo that could be > bidirectionally synced with dev, but I'm not sure of the logistics on this > (this popped up before with svn migration and was rejected b/c it was > considered too difficult to maintain). > > Git makes it very easy to make branches and merge in code to trunk. With > that in mind, I would highly suggest we start working on branches for almost > everything and merge over to trunk. There is very little to no overhead in > doing so with git. > > I like this strategy (Mark Jensen pointed this out): > http://nvie.com/git-model > > Also, several points were raised in a related project (Parrot) considering > a move to git/github from svn. One in particular was that git allows > destructive commits. Jonathan Leto indicated we can set up specific > branches that don't allow this, using commit hooks, so my guess is the > master branch and release branches wouldn't allow rewinds. > > 5) Encouraging outside contributors > > Do we want to adopt a policy similar to Moose? > > http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod > > This is easy with github and forks. > > 6) SVN Read/Write to GitHub > > It was recently announced that one can access a github repo using > subversion as read-only, and just yesterday experimental write to github is > allowed: > > http://github.com/blog/644-subversion-write-support > > I can see allowing read-only svn, but write support is still experimental. > Do we want to allow that? > > 7) Others? > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Thu May 6 09:01:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 08:01:56 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> Message-ID: <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> (comments interspersed below) On May 5, 2010, at 4:27 PM, Dave Messina wrote: >> Do we want to retain the git-SVN metadata on commits? > > What are the tradeoffs with this? > > From the little reading I've done, it seems that space and clutter are the chief drawbacks, but that it's easy to strip this metadata out later. Does that jibe with your impression? I don't really see much use for it personally, beyond retaining the SVN commit #. >> Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly > > My github account name is: DaveMessina > > Do I have an @bioperl.org address? I tried sending mail to a few likely permutations without success. In any case, I added dave_messina -at- bioperl.org as an email address on my github account. I think if you have a bioperl dev account you should have a bioperl.org set up. That's one thing I'm not absolutely sure of. >> Are we sticking with a single centralized repo (SVN-like)? > > I am a total git novice, but it's my understanding that it's still a good idea, particularly with a big many-author project like BioPerl, to have a primary, official repo. But I'd be interested in hearing more discussion on this. We're at a good place to make large-ish changes to how we do things, I think. > > >> Will that be github, or will github be a downstream repo to our work on dev? 
> > My only concern with github being primary is in case something happens to github. Not likely, I know, but it seems prudent to maintain a certain amount of control over our destiny. > > So I'm inclined to make dev be primary and github downstream, with the assumption that it'd trivial to abandon dev and make github primary in the future if we want. > > Or would it be enough to auto-mirror to dev.open-bio.org, which could serve as a fallback in case github goes offline, temporarily or permanently? Well, the nice thing about git is essentially everyone who pulls has a copy of the repo. It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. We could also use alternate mirrors for github besides dev. http://repo.or.cz/w is one example. >> We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain). > > Are there any git-familiar folks out there who could comment on the pros and cons of this? Perhaps some of the other Bio* projects who have switched to git could advise. > > Right now, without further technical details, I think it'd be better to have one true primary just because it's less confusing and easier to manage, particularly if we're to follow a model like the one mentioned just below: We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. >> I would highly suggest we start working on branches for almost everything and merge over to trunk. >> [...] >> I like this strategy (Mark Jensen pointed this out): http://nvie.com/git-model > > Yep, that looks good to me, too. 
> > > >> One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds. > > We should try to make sure we have this sorted before going "live". Would be adding a pre-commit hook to disallow this. I'll look into it. >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? > > Yes! > > We want more people to jump in - one of the benefits of git and github is that they encourage this. > > > >> 6) SVN Read/Write to GitHub >> >> I can see allowing read-only svn, but write support is still experimental. Do we want to allow that? > > Read-only for sure - that seems harmless, and we want to give people lots of ways to get BioPerl. > > Write - let's play with it a bit, making a few test commits to bioperl-test, and see what happens. It would be nice if we don't force everyone who contributes to BioPerl to have to switch over to git immediately. Me included. :) Sounds good to me. >> 7) Others? > > What happens when we start splitting up bioperl into separate distros? Do we put them each into a separate repo? Yes. > Dave Thanks! chris From cjfields at illinois.edu Thu May 6 10:19:06 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 09:19:06 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: <3E35F38F-29A0-4419-AE24-AD25A0D6A6A1@illinois.edu> prove generally is just a perl script frontend for Test::Harness and App::Prove, correct? It is included in core from perl 5 on. Here is the code for 'prove' on my local setup: use strict; use App::Prove; my $app = App::Prove->new; $app->process_args(@ARGV); exit( $app->run ?
0 : 1 ); We could add a 'Build smoke' or somesuch that does this internally. I'm tending to shift away from Bio::Root::Build for such things at the moment, but maybe add something there? chris On May 5, 2010, at 10:55 PM, Jay Hannah wrote: > On May 5, 2010, at 10:43 PM, Chris Fields wrote: >> Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. > > Ya, seems like the way to go. LWP is all over inside BioPerl already, whereas Smolder itself has 147 dependencies, most of which probably aren't relevant to most BioPerl users. :) > > http://deps.cpantesters.org/?module=Smolder;perl=latest > > So a stand-alone script that could be run whenever, plus (eventually) a prompt in Build.PL asking about running it? Not sure if Build.PL can somehow use the "prove --archive" hook to store the results during the normal installation run through all the tests... > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Thu May 6 10:50:42 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 6 May 2010 09:50:42 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> Message-ID: <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> Chris, I added 'jhannah at bioperl.org' to my github list of email addresses. Can you add jhannah to the list of github committers in case github becomes the master repo? I need to clean up branches 'jhannah' and 'yapc10hackathon' whenever the transition is official and the master repo is declared (github or open-bio.org). 
Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Thu May 6 10:56:25 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 6 May 2010 09:56:25 -0500 Subject: [Bioperl-l] new core developers Rob Buels and Dave Messina In-Reply-To: References: Message-ID: On May 2, 2010, at 2:28 PM, Mark A. Jensen wrote: > On behalf of the core team, I am delighted to announce two new members: Rob Buels and Dave Messina. Woot! Congrats! Suddenly we WILL have a core dev at YAPC::NA for the hackathon! I'm now expecting great things from us. :) http://bioperl.org/wiki/YAPC Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Thu May 6 11:02:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 10:02:36 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> Message-ID: Done. I think, unless there are a terrible number of objections, we'll push this in the next week or two. Need to look into the pre-commit hook setup for non-destructive commits, post-commit hook for posting commits to bioperl-guts, etc. chris On May 6, 2010, at 9:50 AM, Jay Hannah wrote: > Chris, > > I added 'jhannah at bioperl.org' to my github list of email addresses. Can you add jhannah to the list of github committers in case github becomes the master repo? > > I need to clean up branches 'jhannah' and 'yapc10hackathon' whenever the transition is official and the master repo is declared (github or open-bio.org). 
> > Thanks, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki.lehvaslaiho at gmail.com Thu May 6 13:26:48 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Thu, 6 May 2010 20:26:48 +0300 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: On 5 May 2010 17:46, Chris Fields wrote: > All, > > I would like to finalize moving over to git/github very soon. We're sort > of in limbo on this, so it needs to progress forward. We'll need to do some > initial cleanup after the move (Heikki is already doing a few things on the > test repo, which we'll need to diff over to the new one). > Do not worry about those, I'll move them into the final repo once it is there. I am just making sure everything works. > So with that in mind, here are my thoughts. This is copied over to this > wiki page, in case you don't want to reply here: > > http://www.bioperl.org/wiki/From_SVN_to_Git > > (thanks Mark!) > > 1) Timeline > > When? Sooner the better (weeks as opposed to months). Our anon. svn is > down, likely permanently ( > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > ASAP. > 2) Migration strategy > > Now mainly worked out using svn2git, which is very fast. We would need to > make the svn repo on dev read-only during this transition. My guess is it > would take very little time. Do we want to retain the git-SVN metadata on > commits? This is viewable with our current read-only mirror on github: > > > http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca > > Keep it. It does no harm. > 3) Developers > > Not everyone has a github account. 
Recent ones who I couldn't find on > github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used > their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I > think, once one has signed up with github, you can add that same address to > your current ones, and it should map to your github account. If we use > dev.open-bio.org as our central git repo, we won't need to go through with > that, but we will need a viewable version of dev available somehow (mirrored > on github or otherwise). Speaking of... > Let's go for github as the main repo. It adds visibility and has the coolness factor that helps. > 4) Development strategy > > Are we sticking with a single centralized repo (SVN-like)? Will that be > github, or will github be a downstream repo to our work on dev? We could > feasibly have github be an active, forkable repo that could be > bidirectionally synced with dev, but I'm not sure of the logistics on this > (this popped up before with svn migration and was rejected b/c it was > considered too difficult to maintain). > > Git makes it very easy to make branches and merge in code to trunk. With > that in mind, I would highly suggest we start working on branches for almost > everything and merge over to trunk. There is very little to no overhead in > doing so with git. > > I like this strategy (Mark Jensen pointed this out): > http://nvie.com/git-model > Lets try to follow this strategy. I do not think moving away from svn and going decentralized at one go would work at all. > Also, several points were raised in a related project (Parrot) considering > a move to git/github from svn. One in particular was that git allows > destructive commits. Jonathan Leto indicated we can set up specific > branches that don't allow this, using commit hooks, so my guess is the > master branch and release branches wouldn't allow rewinds. > I would not worry too much about that. 
With git we'll have dozens if not hundreds of full copies of the repo as a backup. > 5) Encouraging outside contributors > > Do we want to adopt a policy similar to Moose? > > http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod > Interesting and educational document. Let's learn as much as we can from it. This is easy with github and forks. > The more the merrier. BTW, I can see Moose using ShipIt, http://search.cpan.org/~bradfitz/ShipIt-0.55/ that might be worth using in BioPerl. > 6) SVN Read/Write to GitHub > > It was recently announced that one can access a github repo using > subversion as read-only, and just yesterday experimental write to github is > allowed: > > http://github.com/blog/644-subversion-write-support > > I can see allowing read-only svn, but write support is still experimental. > Do we want to allow that? > Why not, if someone insists on using it. Once people get over the initial problems of moving to a different mindset in git, very few will want to use svn. There might be situations when git does not work, however, so let's allow for svn usage. > > 7) Others? > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Thu May 6 14:35:55 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 6 May 2010 20:35:55 +0200 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> Message-ID: [ git-SVN metadata ] > I don't really see much use for it personally, beyond retaining the SVN commit #. Oh well heck, in that case we may as well ditch it.
If there's some way we could easily keep an inactive, archived version with the SVN to github commit # mapping, that would be a nice safety measure, but if it's too much trouble we needn't bother. [ github or dev as primary ] > It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. Great, okay, sounds like there won't be any problem there. [ single repo? ] > We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. Sounds like a plan. I'm pretty swamped until late next week, but if there's anything I can do to help at that time, just holler... Dave From cseligman at earthlink.net Thu May 6 15:23:40 2010 From: cseligman at earthlink.net (Chet Seligman) Date: Thu, 6 May 2010 12:23:40 -0700 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 Message-ID: <001b01caed51$a2e745c0$e8b5d140$@net> I need some help in installing this as it is not in the Active-perl repository. Here's what I have done: 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz 2. Extracted it into an empty directory IN 3. Planned to install by specifying the ppd file directly: ppm install c:\IN\whatever module-name.ppd However, there is no .ppd file extracted. I'd appreciate it if someone would explain how to get Bio::Graphics installed? Chet From scott at scottcain.net Thu May 6 15:44:04 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 6 May 2010 15:44:04 -0400 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 In-Reply-To: <001b01caed51$a2e745c0$e8b5d140$@net> References: <001b01caed51$a2e745c0$e8b5d140$@net> Message-ID: Hi Chet, Install it via the cpan shell: $ cpan cpan> install Bio::Graphics Scott On Thu, May 6, 2010 at 3:23 PM, Chet Seligman wrote: > I need some help in installing this as it is not in the Active-perl > repository. 
Here's what I have done: > 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz > 2. Extracted it into an empty directory IN > 3. Planned to install by specifying the ppd file directly: > ppm install c:\IN\whatever module-name.ppd > > However, there is no .ppd file extracted. > > I'd appreciate it if someone would explain how to get Bio::Graphics > installed? > > Chet > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Thu May 6 15:57:03 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 6 May 2010 15:57:03 -0400 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 In-Reply-To: <002301caed55$53bfc400$fb3f4c00$@net> References: <001b01caed51$a2e745c0$e8b5d140$@net> <002301caed55$53bfc400$fb3f4c00$@net> Message-ID: Hi Chet, Please keep your responses on the bioperl mailing list. As long as you install BioPerl and GD before you try to install Bio::Graphics from cpan, yes, it is perfectly doable. You need to do that in the cmd shell. GD needs to be installed from ppm because it requires compiled code. Scott On Thu, May 6, 2010 at 3:50 PM, Chet Seligman wrote: > Hey Scott: > Is your suggestion doable in Windows? > > How? 
> > Chet > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Scott Cain > Sent: Thursday, May 06, 2010 12:44 PM > To: Chet Seligman > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Installing Bio-Graphics-2.06 > > Hi Chet, > > Install it via the cpan shell: > > $ cpan > cpan> install Bio::Graphics > > Scott > > > On Thu, May 6, 2010 at 3:23 PM, Chet Seligman > wrote: >> I need some help in installing this as it is not in the Active-perl >> repository. Here's what I have done: >> 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz >> 2. Extracted it into an empty directory IN >> 3. Planned to install by specifying the ppd file directly: >> ppm install c:\IN\whatever module-name.ppd >> >> However, there is no .ppd file extracted. >> >> I'd appreciate it if someone would explain how to get Bio::Graphics >> installed? >> >> Chet >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Thu May 6 16:04:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 15:04:39 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> Message-ID: <48C987D6-A7F2-4FBC-AB75-38F0B234961C@illinois.edu> On May 6, 2010, at 1:35 PM, Dave Messina wrote: > [ git-SVN metadata ] > >> I don't really see much use for it personally, beyond retaining the SVN commit #. > > Oh well heck, in that case we may as well ditch it. > > If there's some way we could easily keep an inactive, archived version with the SVN to github commit # mapping, that would be a nice safety measure, but if it's too much trouble we needn't bother. I think we'll keep it in for the SVN commits. Better to have it just in case. > [ github or dev as primary ] > >> It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. > > Great, okay, sounds like there won't be any problem there. > > > [ single repo? ] > >> We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. > > Sounds like a plan. > > > I'm pretty swamped until late next week, but if there's anything I can do to help at that time, just holler... > > > Dave Okay, will prep another email for the final push over to git. 
chris From cjfields at illinois.edu Thu May 6 16:13:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 15:13:44 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> On May 6, 2010, at 12:26 PM, Heikki Lehvaslaiho wrote: > ... >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? >> >> http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod >> > > Interesting and educational document. Let's learn as much as we can from it. > > This is easy with github and forks. >> > > The more the merrier. > > BTW, I can see Moose using Shipit, > http://search.cpan.org/~bradfitz/ShipIt-0.55/ > that might be worth using in BioPerl. I agree. Have thought about that, primarily for easier releases down the road. >> 6) SVN Read/Write to GitHub >> >> It was recently announced that one can access a github repo using >> subversion as read-only, and just yesterday experimental write to github is >> allowed: >> >> http://github.com/blog/644-subversion-write-support >> >> I can see allowing read-only svn, but write support is still experimental. >> Do we want to allow that? >> > > Why not, if someone insists on using it. Once people get over the initial > problems of moving to a different mindset in git, very few will want to use > svn. There might be situations when git does not work, however, so let's > allow for svn usage. Nothing really stopping it, unless we add something to a pre-commit hook that prevents it somehow. I'm thinking a move in the next 5 days, maybe starting Monday? I'll try getting a post out on it. 
chris From rmb32 at cornell.edu Thu May 6 17:09:03 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 06 May 2010 14:09:03 -0700 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> Message-ID: <4BE32FEF.6080707@cornell.edu> The branching model at http://nvie.com/git-model is a good one, but the diagram might be a little intimidating for devs that are new to git. Note that the only branches that most devs will need to be concerned with are the feature branches (sometimes called topic branches), and the main development branch. The other branches are mostly concerned with making releases. To weigh in on other issues on this thread: * Might as well keep the svn metadata, it doesn't hurt and could help in any situations that call for historical digging around. * I don't think we should allow any svn write support. Anybody that truly cannot get over the hump can send patches to the list. Thanks so much for heading this up Chris. Rob From cjfields at illinois.edu Thu May 6 17:28:25 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 16:28:25 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <4BE32FEF.6080707@cornell.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> Message-ID: <9676F5A9-A778-4440-95EF-14282DF72454@illinois.edu> On May 6, 2010, at 4:09 PM, Robert Buels wrote: > The branching model at http://nvie.com/git-model is a good one, but the diagram might be a little intimidating for devs that are new to git. > > Note that the only branches that most devs will need to be concerned with are the feature branches (sometimes called topic branches), and the main development branch. The other branches are mostly concerned with making releases. 
> > To weigh in on other issues on this thread: > > * Might as well keep the svn metadata, it doesn't hurt and could help in > any situations that call for historical digging around. > * I don't think we should allow any svn write support. Anybody that > truly cannot get over the hump can send patches to the list. > > Thanks so much for heading this up Chris. > > Rob One stumbling block that I'm seeing is there is a current lack of pre-commit hook support in github (to prevent destructive or history-changing commits). I don't think this will be a problem, but it's worth noting. post-commit is fine. chris From David.Messina at sbc.su.se Thu May 6 17:59:56 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 6 May 2010 23:59:56 +0200 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <4BE32FEF.6080707@cornell.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> Message-ID: <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> > * I don't think we should allow any svn write support. Anybody that > truly cannot get over the hump can send patches to the list. Unless svn commits are somehow problematic, is there another reason to disallow it? We're switching to git soon and with little advance notice. We'd be asking all the devs to make the move on our schedule. Dave From dimitark at bii.a-star.edu.sg Thu May 6 22:25:23 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Fri, 07 May 2010 10:25:23 +0800 Subject: [Bioperl-l] about Genewise Message-ID: <4BE37A13.6010309@bii.a-star.edu.sg> Hi guys, i have a question about Genewise. Is it possible to get the percent identity between query and target? I am now trying to figure that out. I found no such method so i suppose i should calculate it myself. Thank you for your time and help. 
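If the identity does have to be calculated by hand, a minimal sketch in plain Perl could look like the following. It assumes the query and the translated target are already available as two aligned strings of equal length with '-' for gaps; the sub name and the toy strings are hypothetical and nothing here comes from the Genewise modules. It also assumes one particular convention (matches divided by the columns in which neither sequence has a gap), which is only one of several in use.

```perl
use strict;
use warnings;

# Percent identity over the aligned columns in which neither sequence
# has a gap. Both arguments are pre-aligned strings of equal length
# (hypothetical inputs, not something returned by the Genewise wrapper).
sub percent_identity {
    my ($query, $target) = @_;
    die "aligned strings differ in length\n"
        unless length($query) == length($target);

    my ($matches, $compared) = (0, 0);
    for my $i (0 .. length($query) - 1) {
        my $q = uc substr($query,  $i, 1);
        my $t = uc substr($target, $i, 1);
        next if $q eq '-' or $t eq '-';    # skip columns containing a gap
        $compared++;
        $matches++ if $q eq $t;
    }
    return 0 unless $compared;             # nothing comparable
    return 100 * $matches / $compared;
}

# Toy example: 5 matches over 6 comparable columns.
printf "%.1f%% identity\n", percent_identity('MKT-LLA', 'MKTALLV');
```

Dividing by the full alignment length or by the query length instead would give different numbers, so the denominator is worth stating whenever the value is reported.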
Greetings Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From dimitark at bii.a-star.edu.sg Fri May 7 01:03:58 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Fri, 07 May 2010 13:03:58 +0800 Subject: [Bioperl-l] more genewise Message-ID: <4BE39F3E.4090204@bii.a-star.edu.sg> Hi guys, another question about genewise. Is it possible to get the query seq and the protein translation of the target seq somehow? So, up to now i could not find a way to get the percent identity between query and target(the protein translation) :( I spent some time on CPAN and perldoc and even checked the code of several modules but still no solution. Then i decided to extract the sequences out of the output file and compare them somehow but i could not find a way and for that. I found that the module 'Bio::Tools::Run::Genewise' is creating internal temp output file which i cant access so i can parse it myself and extract whatever. Because with current implementation i cant access that temp output i hacked a bit 'Bio::Tools::Run::Genewise' so i can pass my output file to the constructor, like that: my $factory = Bio::Tools::Run::Genewise->new( output => $tmpout); #not "-output" cos the module currently doesnt like it I modified the BEGIN section and the '_run' subroutine. 
My lines and the originals are marked : -------------- BEGIN { @GENEWISE_PARAMS = qw( DYMEM CODON GENE CFREQ SPLICE GENESTATS INIT SUBS INDEL INTRON NULL INSERT SPLICE_MAX_COLLAR SPLICE_MIN_COLLAR GW_EDGEQUERY GW_EDGETARGET GW_SPLICESPREAD KBYTE HNAME ALG BLOCK DIVIDE GENER U V S T G E M); @GENEWISE_SWITCHES = qw(HELP SILENT QUIET ERROROFFSTD TREV PSEUDO NOSPLICE_GTAG SPLICE_GTAG NOGWHSP GWHSP TFOR TABS BOTH HMMER ); $OK_FIELD{OUTPUT}++; *#dimitar * # Authorize attribute fields foreach my $attr ( @GENEWISE_PARAMS, @GENEWISE_SWITCHES, @OTHER_SWITCHES) { $OK_FIELD{$attr}++; } } ----------------------- ----------------------- my ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir); $self->debug("genewise command = $commandstring"); my $outfile2=$self->output; *#dimitar* # my $status = system("$commandstring > $outfile1"); *#original* my $status = system("$commandstring > $outfile2 "); *#dimitar* $self->throw("Genewies call $commandstring crashed: $? \n") unless $status==0; # my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile1); *#original* my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile2); *#dimitar* ----------------------- More the method 'cds' from 'Bio::SeqFeature::Gene::Exon/I' gives nothing back it doesnt matter what i tried. And i tried a lot :) Fortunately for me i dont need that for now. But tried and didnt work so had to say. Cheers Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From O.Niehuis.zfmk at uni-bonn.de Fri May 7 02:34:54 2010 From: O.Niehuis.zfmk at uni-bonn.de (Dr. Oliver Niehuis) Date: Fri, 7 May 2010 08:34:54 +0200 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifying alignment parameters Message-ID: Hi, I have a question about how to specify parameters for the alignment program MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. 
I would like to run MAFFT with the following alignment parameters: --maxiterate 1000 --localpair Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module before, I specified the MAFFT run parameters as follows: @params = ('localpair', 'maxiterate' => 1000); $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); Unfortunately, this code causes an exception error: ------------- EXCEPTION ------------- MSG: Unallowed parameter: LOCALPAIR ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/Generate_FASTA_files_of_orthologs.pl:55 ------------------------------------- I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT module, but only when leaving the @params array empty; MAFFT then runs with the default parameters. Has anyone an idea how I can specify run parameters for MAFFT via the Bio::Tools::Run::Alignment::MAFFT module? Any help is much appreciated! Best wishes, Oliver From biopython at maubp.freeserve.co.uk Fri May 7 04:51:38 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 May 2010 09:51:38 +0100 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> Message-ID: On Thu, May 6, 2010 at 10:59 PM, Dave Messina wrote: >> * I don't think we should allow any svn write support. Anybody that >> truly cannot get over the hump can send patches to the list. > > Unless svn commits are somehow problematic, is there another reason to disallow it? From my reading of the github blog post, svn merges are potentially problematic. 
http://github.com/blog/644-subversion-write-support Peter From maj at fortinbras.us Fri May 7 07:53:55 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 07:53:55 -0400 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters In-Reply-To: References: Message-ID: Hi Oliver, This module looks like it needs some updating. Here's a hack that should make it work (or at least prevent that exception); put the following lines before the new() call: push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_PARAMS, 'MAXITERATE'; push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, 'LOCALPAIR'; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; HTH, Mark ----- Original Message ----- From: "Dr. Oliver Niehuis" To: Sent: Friday, May 07, 2010 2:34 AM Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters > Hi, > > I have a question about how to specify parameters for the alignment program > MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. I would like to run > MAFFT with the following alignment parameters: > > --maxiterate 1000 --localpair > > Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module > before, I specified the MAFFT run parameters as follows: > > @params = ('localpair', 'maxiterate' => 1000); > $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); > > Unfortunately, this code causes an exception error: > > ------------- EXCEPTION ------------- > MSG: Unallowed parameter: LOCALPAIR ! 
> STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/ > Bio/Tools/Run/Alignment/MAFFT.pm:211 > STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/ > Tools/Run/Alignment/MAFFT.pm:196 > STACK toplevel /Users/Oliver/Desktop/Orthologs/ > Generate_FASTA_files_of_orthologs.pl:55 > ------------------------------------- > > I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT > module, but only when leaving the @params array empty; MAFFT then runs with > the default parameters. > > Has anyone an idea how I can specify run parameters for MAFFT via the > Bio::Tools::Run::Alignment::MAFFT module? > > Any help is much appreciated! > > Best wishes, > Oliver > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Fri May 7 08:12:05 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 7 May 2010 07:12:05 -0500 Subject: [Bioperl-l] more genewise In-Reply-To: <4BE39F3E.4090204@bii.a-star.edu.sg> References: <4BE39F3E.4090204@bii.a-star.edu.sg> Message-ID: <4899F495-FA46-4030-B984-EEFF81579C27@illinois.edu> Dimitar, It would be better if you could create a bug report describing the problem (with minimal example data and code) and provide a diff file or patch. This gives us a chance to do some code review and commit the patch if it passes tests. Here's a HOWTO on this: http://www.bioperl.org/wiki/HOWTO:SubmitPatch Let us know when it's submitted and we can take a look. chris On May 7, 2010, at 12:03 AM, Dimitar Kenanov wrote: > Hi guys, > another question about genewise. Is it possible to get the query seq and the protein translation of the target seq somehow? 
> > So, up to now i could not find a way to get the percent identity between query and target(the protein translation) :( I spent some time on CPAN and perldoc and even checked the code of several modules but still no solution. Then i decided to extract the sequences out of the output file and compare them somehow but i could not find a way and for that. I found that the module 'Bio::Tools::Run::Genewise' is creating internal temp output file which i cant access so i can parse it myself and extract whatever. > > Because with current implementation i cant access that temp output i hacked a bit 'Bio::Tools::Run::Genewise' so i can pass my output file to the constructor, like that: > > my $factory = Bio::Tools::Run::Genewise->new( output => $tmpout); #not "-output" cos the module currently doesnt like it > > I modified the BEGIN section and the '_run' subroutine. My lines and the originals are marked : > -------------- > BEGIN { > @GENEWISE_PARAMS = qw( DYMEM CODON GENE CFREQ SPLICE GENESTATS INIT > SUBS INDEL INTRON NULL INSERT SPLICE_MAX_COLLAR SPLICE_MIN_COLLAR > GW_EDGEQUERY GW_EDGETARGET GW_SPLICESPREAD > KBYTE HNAME ALG BLOCK DIVIDE GENER U V S T G E M); > > @GENEWISE_SWITCHES = qw(HELP SILENT QUIET ERROROFFSTD TREV PSEUDO NOSPLICE_GTAG > SPLICE_GTAG NOGWHSP GWHSP > TFOR TABS BOTH HMMER ); > > $OK_FIELD{OUTPUT}++; *#dimitar > * # Authorize attribute fields > foreach my $attr ( @GENEWISE_PARAMS, @GENEWISE_SWITCHES, > @OTHER_SWITCHES) { $OK_FIELD{$attr}++; } > } > ----------------------- > ----------------------- > my ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir); > $self->debug("genewise command = $commandstring"); > my $outfile2=$self->output; *#dimitar* > # my $status = system("$commandstring > $outfile1"); *#original* > my $status = system("$commandstring > $outfile2 "); *#dimitar* > $self->throw("Genewies call $commandstring crashed: $? 
\n") unless $status==0; > > # my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile1); *#original* > my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile2); *#dimitar* > ----------------------- > > More the method 'cds' from 'Bio::SeqFeature::Gene::Exon/I' gives nothing back it doesnt matter what i tried. And i tried a lot :) Fortunately for me i dont need that for now. But tried and didnt work so had to say. > > Cheers > Dimitar > > -- > Dimitar Kenanov > Postdoctoral research fellow > Protein Sequence Analysis Group > Bioinformatics Institute > A*STAR, Singapore > tel: +65 6478 8514 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri May 7 11:34:09 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 11:34:09 -0400 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters In-Reply-To: <332A01DD-64DA-41EC-B5CE-2BC74BE78038@uni-bonn.de> References: <332A01DD-64DA-41EC-B5CE-2BC74BE78038@uni-bonn.de> Message-ID: <9764564B5CC44A89883498C6309DA045@NewLife> Hi Oliver, I think so, looking at the module again. Instead of the lines in the previous post, put push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, ('LOCALPAIR', 'MAXITERATE'); $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; and create your @params array with @params = ('localpair' => 1, 'maxiterate' => 1000); The switches need to be set with something that returns true, I believe. I *think* this should work for you. But if you would, please submit your original problem as a bug at http://bugzilla.bioperl.org. The module definitely needs some tender loving care. Thanks Mark ----- Original Message ----- From: Dr. Oliver Niehuis To: Mark A. 
Jensen Sent: Friday, May 07, 2010 11:07 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters Dear Mark, Thanks for your quick reply and the MAFFT module hack. I added your code to my script and it seems to works, except that I can't specify the number of iterations (at least, I don't know how). I can specify my @params = ('localpair', 'maxiterate'); but when I assign 1000 to 'maxiterate' (i.e. 'maxiterate' => 1000), I get again an exception error, complaining about 1000 being an unallowed parameter. ------------- EXCEPTION ------------- MSG: Unallowed parameter: 1000 ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/Generate_FASTA_files_of_orthologs.pl:61 ------------------------------------- Do you know how to fix this? Best wishes, Oliver Am 07.05.2010 um 13:53 schrieb Mark A. Jensen: Hi Oliver, This module looks like it needs some updating. Here's a hack that should make it work (or at least prevent that exception); put the following lines before the new() call: push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_PARAMS, 'MAXITERATE'; push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, 'LOCALPAIR'; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; HTH, Mark ----- Original Message ----- From: "Dr. Oliver Niehuis" To: Sent: Friday, May 07, 2010 2:34 AM Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters Hi, I have a question about how to specify parameters for the alignment program MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. 
I would like to run MAFFT with the following alignment parameters: --maxiterate 1000 --localpair Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module before, I specified the MAFFT run parameters as follows: @params = ('localpair', 'maxiterate' => 1000); $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); Unfortunately, this code causes an exception error: ------------- EXCEPTION ------------- MSG: Unallowed parameter: LOCALPAIR ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/ Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/ Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/ Generate_FASTA_files_of_orthologs.pl:55 ------------------------------------- I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT module, but only when leaving the @params array empty; MAFFT then runs with the default parameters. Has anyone an idea how I can specify run parameters for MAFFT via the Bio::Tools::Run::Alignment::MAFFT module? Any help is much appreciated! Best wishes, Oliver _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Fri May 7 12:42:38 2010 From: hartzell at alerce.com (George Hartzell) Date: Fri, 7 May 2010 09:42:38 -0700 Subject: [Bioperl-l] [job] Contract programmer in Bioinformatics at Genentech. Message-ID: <19428.17150.181595.755965@gargle.gargle.HOWL> Genentech's Bioinformatics department seeks an experienced software engineer for a six month contract. Modern Perl (or enlightened, or ..., just not circa 1998) style is required. We build tools to support our Research labs, collecting, storing, massaging, and presenting information to computer-philes and -phobes. We have more to do than we can handle, you'll be pitching in. 
Exactly what you'd be doing will be a function of your skills and our needs, and will probably vary a bit over the six month period. You write tests, sometimes even before you write code. You're not afraid of a little SQL and are comfortable collaborating with folks who were born speaking it. You're familiar with things like Moose, Rose::DB::Object, CGI::Application, NYTProf, and their ilk (or brethren) and more importantly are excited about learning more about them and using them in real-world work. Smoothing out our in-house DPAN, setting up an automated build/smoke system (we have Hudson handling Java builds already) and helping with some other infrastructure stuff is also on the table. You'll be working more-or-less full time in South San Fransisco, there's the potential for a bit of telecommuting once things get running smoothly but the bulk of the job is onsite. Things that you should be comfortable with include: Perl ("modern") SQL, object relational mappers Web application (CGI::Application, or similar) CPAN, Module::Build, Dist::Zilla, etc.... Linux Software engineering in a professional environment. Experience in bioinformatics, biology, or supporting scientists would be helpful but is not required. Please send cover letters and resumes to my work address: georgewh at gene.com (the ability to follow directions is important). Bonus points for easy formats (PDF is great!), demerits for sending me stuff in DOS specific archive formats. g. From qqq2395 at gmail.com Thu May 6 14:51:13 2010 From: qqq2395 at gmail.com (visitor555) Date: Thu, 6 May 2010 11:51:13 -0700 (PDT) Subject: [Bioperl-l] Bio::Align - alignment by position? Message-ID: <28478022.post@talk.nabble.com> Hi, I have a list alignment positions and I want to get each column them from the alignment. If I slice the alignment the sequence with gaps in these positions disappear. I can rotate on each seq and then split the sequence. Is there better way to go over the alignment position by position? 
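One way to walk an alignment column by column without slicing, sketched in plain Perl under the assumption that the gapped sequence strings have already been pulled out of the alignment (in BioPerl they could come from $seq->seq for each $seq returned by $aln->each_seq). The sub name, the ids and the toy alignment are hypothetical. Gap characters are kept, so a sequence with a gap at the requested position still appears in the result.

```perl
use strict;
use warnings;

# Return the characters found at one alignment column (1-based), keyed by
# sequence id. Gaps are returned as-is, so no sequence drops out.
sub column_at {
    my ($aligned, $pos) = @_;
    my %column;
    for my $id (keys %$aligned) {
        $column{$id} = substr($aligned->{$id}, $pos - 1, 1);
    }
    return \%column;
}

# Hypothetical toy alignment: id => gapped sequence string.
my %aligned = (
    seq1 => 'ACG-T',
    seq2 => 'A-GCT',
    seq3 => 'ACGCT',
);

# The list of positions of interest is also made up for the example.
for my $pos (2, 4) {
    my $col = column_at(\%aligned, $pos);
    print "$pos: ", join(' ', map { "$_=$col->{$_}" } sort keys %$col), "\n";
}
```

This uses substr rather than splitting every sequence into characters, which keeps the cost proportional to the number of positions requested.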
thanks ! -- View this message in context: http://old.nabble.com/Bio%3A%3AAlign---alignment-by-position--tp28478022p28478022.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jillianrowe91286 at gmail.com Mon May 3 08:42:56 2010 From: jillianrowe91286 at gmail.com (mindlessbrain) Date: Mon, 3 May 2010 05:42:56 -0700 (PDT) Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall Message-ID: <28434717.post@talk.nabble.com> Hey all, I'm trying to run some code for StandAloneBLast in Windows Vista: [code] #!/usr/bin/perl use Bio::DB::SwissProt; use Bio::Tools::Run::StandAloneBlast; BEGIN { $ENV{PATH}="D:/blast-2.2.23+/bin/:"; } my $database = new Bio::DB::SwissProt; my $query = $database->get_Seq_by_id('TAUD_ECOLI'); my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastp', 'database' => 'swissprot', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); my $result = $blast_report->next_result; while( my $hit = $result->next_hit()) { print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n"; } [/code] I installed BLAST from the NCBI website. I get this when I run dir on the bin: D:\blast-2.2.23+\bin>dir Volume in drive D has no label. Volume Serial Number is 224C-0190 Directory of D:\blast-2.2.23+\bin 05/03/2010 03:02 PM . 05/03/2010 03:02 PM .. 
03/08/2010 11:09 PM 2,789,376 blastdbcheck.exe 03/08/2010 11:09 PM 4,009,984 blastdbcmd.exe 03/08/2010 11:09 PM 1,810,432 blastdb_aliastool.exe 03/08/2010 11:09 PM 6,225,920 blastn.exe 03/08/2010 11:09 PM 6,221,824 blastp.exe 03/08/2010 11:09 PM 6,213,632 blastx.exe 03/08/2010 11:09 PM 5,316,608 blast_formatter.exe 03/08/2010 11:09 PM 3,215,360 convert2blastmask.exe 03/08/2010 11:09 PM 3,211,264 dustmasker.exe 03/08/2010 11:09 PM 51,178 legacy_blast.pl 03/08/2010 11:09 PM 3,866,624 makeblastdb.exe 03/08/2010 11:09 PM 3,612,672 makembindex.exe 03/08/2010 11:09 PM 6,344,704 psiblast.exe 03/08/2010 11:09 PM 6,201,344 rpsblast.exe 03/08/2010 11:09 PM 6,205,440 rpstblastn.exe 03/08/2010 11:09 PM 3,608,576 segmasker.exe 03/08/2010 11:09 PM 6,320,128 tblastn.exe 03/08/2010 11:09 PM 6,209,536 tblastx.exe 03/08/2010 11:09 PM 10,010 update_blastdb.pl 03/08/2010 11:09 PM 3,530,752 windowmasker.exe 20 File(s) 84,975,364 bytes 2 Dir(s) 122,390,626,304 bytes free I have an ncbi.ini file in my windows directory that contains: [NCBI] DATA=D:\blast-2.2.23+\data [BLAST] BLASTDB=D:\blast-2.2.23+\db Here's what my environmental variables looks like: http://old.nabble.com/file/p28434717/environmental%2Bvariables.jpg Help would be very, very appreciated! -- View this message in context: http://old.nabble.com/Bio%3A%3ATools%3A%3ARun%3A%3AStandAloneBlast-can%27t-find-path-to-blastall-tp28434717p28434717.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Fri May 7 16:07:58 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 16:07:58 -0400 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall In-Reply-To: <28434717.post@talk.nabble.com> References: <28434717.post@talk.nabble.com> Message-ID: <670B2E492D9E4D158618EC4750C595AF@NewLife> You've got blast+, so have a look at Bio::Tools::Run::StandAloneBlastPlus, should solve it. 
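A side note on the script itself: the directory listing shows only BLAST+ programs (blastp.exe, blastn.exe and so on) and no blastall, and the BEGIN block replaces PATH outright with a value ending in ':', whereas Windows separates PATH entries with ';'. A minimal core-Perl sketch of prepending a directory instead of overwriting, using the separator Perl reports for the current platform; the sub name is hypothetical and the directory is the one from the post:

```perl
use strict;
use warnings;
use Config;

# Prepend a directory to an existing PATH value using the platform's own
# separator (';' on Windows, ':' on Unix-like systems).
sub prepend_to_path {
    my ($dir, $path) = @_;
    return $dir unless defined $path && length $path;
    return join($Config{path_sep}, $dir, $path);
}

BEGIN {
    # Directory taken from the original post; adjust to the local install.
    $ENV{PATH} = prepend_to_path('D:/blast-2.2.23+/bin', $ENV{PATH});
}

print "$ENV{PATH}\n";
```

Keeping the rest of PATH intact matters because the wrapper modules run external programs, and those may in turn rely on other entries being present.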
MAJ ----- Original Message ----- From: "mindlessbrain" To: Sent: Monday, May 03, 2010 8:42 AM Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall > > Hey all, > > I'm trying to run some code for StandAloneBLast in Windows Vista: > > [code] > #!/usr/bin/perl > > use Bio::DB::SwissProt; > use Bio::Tools::Run::StandAloneBlast; > > BEGIN > { > $ENV{PATH}="D:/blast-2.2.23+/bin/:"; > } > > my $database = new Bio::DB::SwissProt; > my $query = $database->get_Seq_by_id('TAUD_ECOLI'); > > my $factory = Bio::Tools::Run::StandAloneBlast->new( > 'program' => 'blastp', > 'database' => 'swissprot', > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), > " significance: ", $hit->significance(), "\n"; > } > [/code] > > I installed BLAST from the NCBI website. I get this when I run dir on the > bin: > > D:\blast-2.2.23+\bin>dir > Volume in drive D has no label. > Volume Serial Number is 224C-0190 > > Directory of D:\blast-2.2.23+\bin > > 05/03/2010 03:02 PM . > 05/03/2010 03:02 PM .. 
> 03/08/2010 11:09 PM 2,789,376 blastdbcheck.exe > 03/08/2010 11:09 PM 4,009,984 blastdbcmd.exe > 03/08/2010 11:09 PM 1,810,432 blastdb_aliastool.exe > 03/08/2010 11:09 PM 6,225,920 blastn.exe > 03/08/2010 11:09 PM 6,221,824 blastp.exe > 03/08/2010 11:09 PM 6,213,632 blastx.exe > 03/08/2010 11:09 PM 5,316,608 blast_formatter.exe > 03/08/2010 11:09 PM 3,215,360 convert2blastmask.exe > 03/08/2010 11:09 PM 3,211,264 dustmasker.exe > 03/08/2010 11:09 PM 51,178 legacy_blast.pl > 03/08/2010 11:09 PM 3,866,624 makeblastdb.exe > 03/08/2010 11:09 PM 3,612,672 makembindex.exe > 03/08/2010 11:09 PM 6,344,704 psiblast.exe > 03/08/2010 11:09 PM 6,201,344 rpsblast.exe > 03/08/2010 11:09 PM 6,205,440 rpstblastn.exe > 03/08/2010 11:09 PM 3,608,576 segmasker.exe > 03/08/2010 11:09 PM 6,320,128 tblastn.exe > 03/08/2010 11:09 PM 6,209,536 tblastx.exe > 03/08/2010 11:09 PM 10,010 update_blastdb.pl > 03/08/2010 11:09 PM 3,530,752 windowmasker.exe > 20 File(s) 84,975,364 bytes > 2 Dir(s) 122,390,626,304 bytes free > > I have an ncbi.ini file in my windows directory that contains: > [NCBI] > DATA=D:\blast-2.2.23+\data > [BLAST] > BLASTDB=D:\blast-2.2.23+\db > > Here's what my environmental variables looks like: > > http://old.nabble.com/file/p28434717/environmental%2Bvariables.jpg > > Help would be very, very appreciated! > > > -- > View this message in context: > http://old.nabble.com/Bio%3A%3ATools%3A%3ARun%3A%3AStandAloneBlast-can%27t-find-path-to-blastall-tp28434717p28434717.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From manchunjohn-ma at uiowa.edu Fri May 7 16:17:52 2010 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Fri, 7 May 2010 15:17:52 -0500 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> Hi, Right now I'm migrating some of my bioperl scripts from remote to stand-alone BLAST, and stumbled at how RemoteBlast->submit_blast and the StandAloneNCBIBlast->blastall deal with an array parameter. Common code for both versions: my $p3_machine=Bio::Tools::Run::Primer3->new(@p3_parameters); [...] my $primer3_results=$p3_machine->run($seq); my $p3_results=$primer3_results->next_primer(); my @temp_primer_info=$p3_results->get_primer; my %primer_info; $primer_info{primer}[0]=$temp_primer_info[0]->seq; $primer_info{primer}[1]=$temp_primer_info[1]->seq; $primer_info{primer}[0]->display_id('F'); $primer_info{primer}[1]->display_id('R'); Code using RemoteBlast: my $remote_blast_machine=Bio::Tools::Run::RemoteBlast->new(@remote_blast_params); [Parameter setting skipped] my $r=$remote_blast_machine->submit_blast(@primer_info{primer}); [etc, etc for iteration] Using this code, I have been able to put both sequences forth to the NCBI server and obtain results accordingly; each result object contains hits from an input sequence. However, when I switched to StandAloneBlast this way: my $Stand_alone_blast_machine=Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); my $blast_report=$Stand_alone_blast_machine->blastall(@primer_info{primer}); while (my $result=$blast_report->next_result()){ [etc, etc for iteration] } There is only one result object for sequence "F"-- and even so the loop went through twice. I would first suspect I made a mistake-- but where? 
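One detail of the code above that is easy to misread, shown here as a plain-Perl illustration with placeholder strings instead of sequence objects: @primer_info{primer} is a hash slice with a single key, so it evaluates to one element, the array reference stored under that key, not to the two sequences inside it. Whether that accounts for the difference between the two modules is not established here; the sketch only shows what each spelling passes along.

```perl
use strict;
use warnings;

# Placeholder values standing in for the two primer sequence objects.
my %primer_info = ( primer => [ 'F_seq', 'R_seq' ] );

# A one-key hash slice yields a single element: the array reference itself.
# (Under "use warnings" Perl notes that this is better written with a $.)
my @as_slice = @primer_info{'primer'};

# Dereferencing the stored reference yields the two elements inside it.
my @as_list = @{ $primer_info{primer} };

printf "slice: %d item(s), first item is of type %s\n",
    scalar @as_slice, ref $as_slice[0];
printf "deref: %d item(s): %s\n", scalar @as_list, join(', ', @as_list);
```

So a call written with the slice hands the method one argument (an array reference), while the dereferenced form hands it two separate arguments.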
John MC Ma
Graduate Assistant
Kwitek Lab
Department of Internal Medicine
3125E MERF
375 Newton Road
Iowa City IA 52242

From sumanth41277 at yahoo.com Fri May 7 17:34:53 2010
From: sumanth41277 at yahoo.com (polsum)
Date: Fri, 7 May 2010 14:34:53 -0700 (PDT)
Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU
Message-ID: <28491725.post@talk.nabble.com>

Hi - We have a pretty powerful computer with a dual quad-core Intel Xeon W5580 processor and 24 GB RAM. When I use BioPerl programs for routine operations like blastn and BLAST parsing, the programs don't seem to utilize the computer's power to the fullest: they use just one of the 8 cores and only 8 GB of RAM. Is there a way to ask Perl to use all the available power? I have 64-bit Windows and 64-bit Ubuntu; Ubuntu is definitely faster, but it still doesn't use all the cores of the CPU.

thanks in advance
--
View this message in context: http://old.nabble.com/Bio-Perl-and-multiple-cores-of-CPU-tp28491725p28491725.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at illinois.edu Fri May 7 17:46:24 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 7 May 2010 16:46:24 -0500
Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU
In-Reply-To: <28491725.post@talk.nabble.com>
References: <28491725.post@talk.nabble.com>
Message-ID:

You can specify the number of processors to use. With legacy BLAST this is -a 8, with BLAST+ I think this is -num_threads 8 (with the explicit caveat I haven't tried the latter much, so no guarantees, we're not liable for explosions and such).

chris

From David.Messina at sbc.su.se Fri May 7 18:14:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 00:14:24 +0200
Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU
In-Reply-To:
References: <28491725.post@talk.nabble.com>
Message-ID:

On May 7, 2010, at 11:46 PM, Chris Fields wrote:

> With legacy BLAST this is -a 8, with BLAST+ I think this is -num_threads 8 (with the explicit caveat I haven't tried the latter much, so no guarantees, we're not liable for explosions and such).

One other caveat if you use BLAST+: be sure you have the latest version, 2.2.23. In my informal testing, the num_threads option wasn't working correctly in 2.2.22.

Blast parsing will still be single-threaded, by the way. BioPerl programs, like everything else unfortunately, need to explicitly spawn multiple threads or forks to take advantage of multiple cores.
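A minimal sketch of the fork approach, assuming a Unix-like system and one report file per child process. The names count_queries and parse_in_parallel are hypothetical, and the line-counting worker only stands in for whatever per-report parsing is really needed (for example a Bio::SearchIO loop); it is not taken from BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical worker: a stand-in for real per-report parsing.
# Here it simply counts the lines that start with "Query=".
sub count_queries {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $n = grep { /^Query=/ } <$fh>;
    close $fh;
    return $n;
}

# Fork one child per report file. Each child writes its result to
# "$file.count", because a child cannot change the parent's variables.
sub parse_in_parallel {
    my (@files) = @_;
    my @pids;
    for my $file (@files) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                      # child process
            open my $out, '>', "$file.count"
                or die "Cannot write $file.count: $!";
            print {$out} count_queries($file), "\n";
            close $out;
            exit 0;
        }
        push @pids, $pid;                     # parent remembers the child
    }
    waitpid($_, 0) for @pids;                 # wait for every child
    return;
}

# Usage: perl this_script.pl report1.txt report2.txt ...
parse_in_parallel(@ARGV) if @ARGV;
```

This starts as many children as there are files, which is fine for a handful of reports; for hundreds of files one would want to cap the number of simultaneous children at roughly the number of cores.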
While I've never done it myself, I ran across this post which may be helpful in case you want to try it:

http://computationalbiologynews.blogspot.com/2008/07/harnessing-power-of-multicore.html

Dave

From David.Messina at sbc.su.se Fri May 7 18:34:10 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 00:34:10 +0200
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu>
Message-ID: <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se>

Hi John,

You're right that passing parameters should work similarly for both RemoteBlast and StandAloneBlast, but without seeing exactly the parameter array you're passing, it's not possible to identify the problem.

Could you perhaps post a small, but complete test program that demonstrates the problem?

Dave

PS - is this the actual code you ran?

> My $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_blast_params);
> My $blast_report=$Stand_alone_blast_machine(@primer_info{primer});
> While (my $result=$blast_report->next_result()){
> [etc, etc for iteration]
> }

I'm guessing you were paraphrasing, but I ask because My, with a capital "M", will generate an error, you're calling Tools::Run::StandAloneBlast instead of Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), i.e. it should be:

my $Stand_alone_blast_machine = Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params);

From florent.angly at gmail.com Sat May 8 00:42:18 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Sat, 08 May 2010 14:42:18 +1000
Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally
In-Reply-To:
References: <28491725.post@talk.nabble.com>
Message-ID: <4BE4EBAA.5010709@gmail.com>

Hi all,

I am working on updating some of the Bio::Assembly::* modules right now.
I need to sort a list of IDs. These IDs could be numbers, "words" or a mix of the two, for example:

@arr = ('singlet1', 'contig10', 'contig2', '101', '3');

I cannot sort them with the numerical sort:

sort { $a <=> $b } @arr

This would generate warnings because some of the IDs (e.g. 'singlet1') are not numbers.

I cannot sort them lexically:

sort @arr

Lexical sorting would not take numbers into account properly and would result in:

101 3 contig10 contig2 singlet1

So, what I really need is natural sorting, which is not in any core function of Perl. I'd like to use the CPAN module Sort::Naturally for this purpose:

nsort @arr

The results would be what we expect, i.e.:

3 101 contig2 contig10 singlet1

Can I add this module as an additional dependency of BioPerl? I imagine that some other modules might want to use this. On the assembly side, it would be used by the writing methods of Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around my problem that doesn't require any external module?

Florent

From manchunjohn-ma at uiowa.edu Sat May 8 17:37:13 2010
From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John)
Date: Sat, 8 May 2010 16:37:13 -0500
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se>
Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu>

Hi,

And that's my problem here: I checked the BLAST output, and the two sequences did get aligned; it's just that SearchIO, in whatever flavour (I tried blast, blasttable and blastxml), didn't seem to go to the next result when next_result() is called. It knows there are two results, but I'm still getting the first result on the second call.
Cheers,

John MC Ma
Graduate Assistant
Kwitek Lab
Department of Internal Medicine
3125E MERF
375 Newton Road
Iowa City IA 52242

From David.Messina at sbc.su.se Sat May 8 17:32:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 23:32:42 +0200
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu>
Message-ID: <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se>

Hi John,

Please remember to keep Cc'ing the mailing list so that everyone can participate in the discussion.

If I understand your question correctly, yes, you can iterate through the blast results in a report called $blast_report using next_result.

If you haven't already, you may want to look at the SearchIO HOWTO:

http://www.bioperl.org/wiki/HOWTO:SearchIO

(although the BioPerl website appears to be temporarily offline, so check back a little later.)

Dave

On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote:

> Hi,
>
> I did some more investigation and found that the issue is probably
> that of SearchIO rather than StandAloneBlast. In case I made a mistake:
> if I pass a standard @array of Bio::Seq objects into
> StandAloneBlast (blastn with SearchIO output), the result for each of
> the seqs in the array can be accessed by $blast_report->next_result,
> right?
>
> John MC Ma
> Graduate Assistant
> Kwitek Lab
> Department of Internal Medicine
> 3125E MERF
> 375 Newton Road
> Iowa City IA 52242

From cjfields at illinois.edu Sat May 8 15:41:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 May 2010 14:41:58 -0500
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To:
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
Message-ID: <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu>

Lincoln,

Just an update, I've added you, as well as Dave and Florent. Still not sure about the bioperl.org address myself, but it seems to work for Dave and others. We posted to root-l and Chris D. to make sure that's correct or if we should be using open-bio.org instead, but I believe it is.

chris

On May 6, 2010, at 7:01 AM, Lincoln Stein wrote:

> My github username is lstein and I've just added lstein at bioperl.org to my
> linked email addresses. I hope I have a bioperl.org address; I never use it!
>
> Lincoln
>
> On Wed, May 5, 2010 at 10:46 AM, Chris Fields wrote:
>
>> All,
>>
>> I would like to finalize moving over to git/github very soon. We're sort
>> of in limbo on this, so it needs to progress forward. We'll need to do some
>> initial cleanup after the move (Heikki is already doing a few things on the
>> test repo, which we'll need to diff over to the new one).
>>
>> So with that in mind, here are my thoughts. This is copied over to this
>> wiki page, in case you don't want to reply here:
>>
>> http://www.bioperl.org/wiki/From_SVN_to_Git
>>
>> (thanks Mark!)
>>
>> 1) Timeline
>>
>> When? Sooner the better (weeks as opposed to months). Our anon. svn is
>> down, likely permanently (
>> http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live).
>> >> 2) Migration strategy >> >> Now mainly worked out using svn2git, which is very fast. We would need to >> make the svn repo on dev read-only during this transition. My guess is it >> would take very little time. Do we want to retain the git-SVN metadata on >> commits? This is viewable with our current read-only mirror on github: >> >> >> http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca >> >> 3) Developers >> >> Not everyone has a github account. Recent ones who I couldn't find on >> github: dmessina, fangly >> >> The current authors file used for mapping commit authors to emails used >> their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I >> think, once one has signed up with github, you can add that same address to >> your current ones, and it should map to your github account. If we use >> dev.open-bio.org as our central git repo, we won't need to go through with >> that, but we will need a viewable version of dev available somehow (mirrored >> on github or otherwise). Speaking of... >> >> 4) Development strategy >> >> Are we sticking with a single centralized repo (SVN-like)? Will that be >> github, or will github be a downstream repo to our work on dev? We could >> feasibly have github be an active, forkable repo that could be >> bidirectionally synced with dev, but I'm not sure of the logistics on this >> (this popped up before with svn migration and was rejected b/c it was >> considered too difficult to maintain). >> >> Git makes it very easy to make branches and merge in code to trunk. With >> that in mind, I would highly suggest we start working on branches for almost >> everything and merge over to trunk. There is very little to no overhead in >> doing so with git. >> >> I like this strategy (Mark Jensen pointed this out): >> http://nvie.com/git-model >> >> Also, several points were raised in a related project (Parrot) considering >> a move to git/github from svn. 
One in particular was that git allows >> destructive commits. Jonathan Leto indicated we can set up specific >> branches that don't allow this, using commit hooks, so my guess is the >> master branch and release branches wouldn't allow rewinds. >> >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? >> >> http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod >> >> This is easy with github and forks. >> >> 6) SVN Read/Write to GitHub >> >> It was recently announced that one can access a github repo using >> subversion as read-only, and just yesterday experimental write to github is >> allowed: >> >> http://github.com/blog/644-subversion-write-support >> >> I can see allowing read-only svn, but write support is still experimental. >> Do we want to allow that? >> >> 7) Others? >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat May 8 15:23:35 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 14:23:35 -0500 Subject: [Bioperl-l] GitHub migration Wednesday Message-ID: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> Seems like we're all pretty much in agreement that this needs to happen sooner than later. So, I'm scheduling the git/github migration aggressively, for this Wednesday. Key steps: 1) Notify the list prior to locking the svn repo and/or making it read-only. 
2) We need to set up post-commit hooks to forward commit messages on to bioperl-guts and elsewhere. I have tried this out off github and so far it's a little problematic (not working off bioperl-test, but working off my own github commits).

3) The current bioperl github repos will all be replaced with their live counterparts (branches and all), generated off the latest SVN via svn2git (including metadata). I'll have to reinstate collaborators at that time, but the author mapping should be the same as before (DEVACCOUNT at bioperl.org, where DEVACCOUNT is one's user name on dev.open-bio.org).

4) Update the wiki pages as needed to point to the github repo instead of the code.open-bio.org one.

Also, I'm sure this will catch many devs not paying attention to the list by surprise, so we'll need a developer migration page set up. Anything else?

chris

From cjfields at illinois.edu Sat May 8 16:33:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 May 2010 15:33:36 -0500
Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally
In-Reply-To: <4BE4EBAA.5010709@gmail.com>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com>
Message-ID: <7EC12A62-249D-4816-9FDD-6D321095AA4B@illinois.edu>

I don't have a problem with this personally, seeing how complex the code can get for natural sorting. It would become a recommended module, though, not a full dependency.

chris

From David.Messina at sbc.su.se Sat May 8 17:47:07 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 23:47:07 +0200
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu>
Message-ID: <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se>

There was a report last week of a possible problem with BLAST parsing introduced in the last few days. I don't know what the status of that is, but it's possible that it's related.

In any case, if you post your code and the blast report you're parsing, we might be able to diagnose the problem.

Also, what version of BioPerl are you using?
Dave

From cjfields at illinois.edu Sat May 8 14:59:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 May 2010 13:59:13 -0500
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To:
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se>
Message-ID: <73BDDA86-F487-484F-A87C-1DF37CDEA7D8@illinois.edu>

On May 7, 2010, at 3:51 AM, Peter wrote:

> On Thu, May 6, 2010 at 10:59 PM, Dave Messina wrote:
>>> * I don't think we should allow any svn write support. Anybody that
>>> truly cannot get over the hump can send patches to the list.
>>
>> Unless svn commits are somehow problematic, is there another reason to disallow it?
>
> From my reading of the github blog post, svn merges are potentially problematic.
> http://github.com/blog/644-subversion-write-support
>
> Peter

Yes, they're still working out the kinks. I think we would only support read until the bugs get worked out of write.

chris

From David.Messina at sbc.su.se Sat May 8 17:33:53 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 23:33:53 +0200
Subject: [Bioperl-l] wiki offline?
Message-ID: <064068F0-FF78-4557-9356-54CB1DB1783B@sbc.su.se>

Hi,

The BioPerl website appears to be down, at least from my spot on the net - could someone please look into it?

Thanks,
Dave

From David.Messina at sbc.su.se Sat May 8 16:07:02 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 22:07:02 +0200
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu>
Message-ID: <9A27A797-027E-445D-A8C3-6A7B6FBF4F13@sbc.su.se>

Thanks, Chris.
It took a few days for github to "notice" my @bioperl.org address and connect it to my commits. Since Lincoln added his @bioperl.org email to github a little later than I did, it may just be still trickling through the github pipes.

Dave

From florent.angly at gmail.com Sat May 8 07:34:15 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Sat, 08 May 2010 21:34:15 +1000
Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally
In-Reply-To: <4BE4EBAA.5010709@gmail.com>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com>
Message-ID: <4BE54C37.7020304@gmail.com>

Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. It looks like the Bio::SeqIO module tests could use it as well.

Cheers,

Florent

From David.Messina at sbc.su.se Sat May 8 18:40:22 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 9 May 2010 00:40:22 +0200
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu>
Message-ID:

Hi John,

Your blast report works fine for me with the following code taken from the Bio::SearchIO HOWTO:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

my $in = Bio::SearchIO->new('-file'   => 'blastout',
                            '-format' => 'blast');
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print "Query=",       $result->query_name,
                  " Hit=",        $hit->name,
                  " Length=",     $hsp->length('total'),
                  " Percent_id=", $hsp->percent_identity, "\n";
        }
    }
}

## Here is the output:

Query=F Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100
Query=F Hit=ref|NC_005117.2|NC_005117 Length=18 Percent_id=100
Query=F Hit=ref|NC_005105.2|NC_005105 Length=18 Percent_id=100
Query=R Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100

Dave

From manchunjohn-ma at uiowa.edu Sat May 8 18:43:11 2010
From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John)
Date: Sat, 8 May 2010 17:43:11 -0500
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To:
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu>
Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu>

Hi Dave,

Yes, I tried writing a separate script to parse all those files, and they came out fine. The problem only happens when I run the entire target script; and if I replace the StandAloneBlast part with the standard RemoteBlast code, it's fine, too.

Cheers,

John MC Ma
Graduate Assistant
Kwitek Lab
Department of Internal Medicine
3125E MERF
375 Newton Road
Iowa City IA 52242

From David.Messina at sbc.su.se Sat May 8 18:58:41 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 9 May 2010 00:58:41 +0200
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu>
Message-ID: <41281436-08D3-46F9-BDD0-A8D5306DB412@sbc.su.se> I cannot help you without seeing the code. It sounds like you've already tested the parsing part in a script by itself and that works. If you haven't already, you can test the running Blast part in its own script and see if that works. If both parts work separately, then there's something wrong with the way they have been put together. Dave From jason at bioperl.org Sat May 8 12:06:28 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 08 May 2010 09:06:28 -0700 Subject: [Bioperl-l] Bio::Align - alignment by position? In-Reply-To: <28478022.post@talk.nabble.com> References: <28478022.post@talk.nabble.com> Message-ID: <4BE58C04.8090901@bioperl.org> Not clear what you want to make. You want a new alignment that only contains the columns in your list or You want to extract each column in your list one by one? visitor555 wrote, On 5/6/10 11:51 AM: > Hi, > > I have a list alignment positions and I want to get each column them from > the alignment. If I slice the alignment the sequence with gaps in these > positions disappear. I can rotate on each seq and then split the sequence. > Is there better way to go over the alignment position by position? > > thanks ! > From jason at bioperl.org Sat May 8 12:12:26 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 08 May 2010 09:12:26 -0700 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE4EBAA.5010709@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> Message-ID: <4BE58D6A.9080601@bioperl.org> Unless necessary I don't know if adding yet another dependency is warranted here. I don't know how complicated the words will be but can't you just strip out the numbers and do this in a schwartzian transformation? 

#!/usr/bin/perl -w
use strict;
my @arr = qw(single1 contig10 101 contig2 3);
# Schwartzian transform: pair each ID with its first run of digits,
# sort on that number, then unwrap the original ID.
my @sorted = map  { $_->[1] }
             sort { $a->[0] <=> $b->[0] }
             map  { [ /(\d+)/, $_ ] } @arr;
print join("\n", @sorted), "\n";

But I'm not sure how you want to sort
10 vs contig10 vs singlet10 reliably.

-jason

Florent Angly wrote, On 5/7/10 9:42 PM:
> Hi all,
>
> I am working on updating some of the Bio::Assembly::* modules right now.
> I need to sort a list of IDs. These IDs could be numbers, "words" or a
> mix of the two, for example:
>     @arr = ('singlet1', 'contig10', 'contig2', '101', '3');
>
> I cannot sort them with the numerical sort:
>     sort { $a <=> $b } @array
> This would generate warnings because some of the IDs are not numbers.
>
> I cannot sort them lexically:
>     sort @array
> Lexical sorting would not take the numbers into account properly and
> would result in:
>     101 3 contig10 contig2 singlet1
>
> So, what I really need is natural sorting, which is not in any core
> function of Perl. I'd like to use the CPAN module Sort::Naturally for
> this purpose:
>     nsort @arr
> The results would be what we expect, i.e.:
>     3 101 contig2 contig10 singlet1
>
> Can I add this module as an additional dependency of BioPerl? I
> imagine that some other modules might want to use this. On the
> assembly side, it would be used by the writing methods of
> Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around
> my problem that doesn't require any external module?
>
> Florent
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Sat May 8 19:47:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 May 2010 18:47:58 -0500
Subject: [Bioperl-l] New Bioperl dependency?
Sort::Naturally In-Reply-To: <4BE54C37.7020304@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> Message-ID: To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. chris On May 8, 2010, at 6:34 AM, Florent Angly wrote: > Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. > > It looks like the Bio::SeqIO modules tests could use it as well. > > Cheers, > > Florent > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat May 8 20:02:28 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 19:02:28 -0500 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> Message-ID: <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. chris On May 8, 2010, at 6:47 PM, Chris Fields wrote: > To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. 
None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. > > chris > > On May 8, 2010, at 6:34 AM, Florent Angly wrote: > >> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. >> >> It looks like the Bio::SeqIO modules tests could use it as well. >> >> Cheers, >> >> Florent >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sat May 8 19:30:48 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 8 May 2010 19:30:48 -0400 Subject: [Bioperl-l] GitHub migration Wednesday In-Reply-To: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> References: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> Message-ID: <9B5043D308B942AEB4F9AA199470812B@NewLife> Sail on, great Ship of State. ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Saturday, May 08, 2010 3:23 PM Subject: [Bioperl-l] GitHub migration Wednesday > Seems like we're all pretty much in agreement that this needs to happen sooner > than later. So, I'm scheduling the git/github migration aggressively, for > this Wednesday. Key steps: > > 1) Notify the list prior to locking the svn repo and/or making it read-only. > > 2) We need to set up post-commit hooks to forward commit messages on to > bioperl-guts and elsewhere. 
I have tried this out off github and so far it's
> a little problematic (not working off bioperl-test, but working off my own
> github commits).
>
> 3) The current bioperl github repos will all be replaced with their live
> counterparts (branches and all), generated off the latest SVN via svn2git
> (including metadata). I'll have to reinstate collaborators at that time, but
> the author mapping should be the same as before (DEVACCOUNT at bioperl.org, where
> DEVACCOUNT is one's user name on dev.open-bio.org).
>
> 4) Update the wiki pages as needed to point to the github repo instead of the
> code.open-bio.org one. Also, I'm sure this will catch many devs not paying
> attention to the list by surprise, so we'll need a developer migration page
> set up.
>
> Anything else?
>
> chris

From manchunjohn-ma at uiowa.edu Sat May 8 17:59:08 2010
From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John)
Date: Sat, 8 May 2010 16:59:08 -0500
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se>
Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu>

Hi,

I use bioperl-live 16950 with blast 2.2.23.

I haven't been able to put together a simpler script that shows the problem at this time, so I am attaching the BLASTn outputs (in blast, blasttable and blastxml formats) here. They look perfectly normal, except that they look like 2 separate output
files appended together. Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 -----Original Message----- From: Dave Messina [mailto:David.Messina at sbc.su.se] Sent: Saturday, May 08, 2010 4:47 PM To: Ma, Man Chun John Cc: BioPerl List Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast There was a report last week of a possible problem with BLAST parsing introduced in the last few days. I don't know what the status of that is, but it's possible that it's related. In any case, if you post your code and the blast report you're parsing, we might be able to diagnose the problem. Also, what version of BioPerl are you using? Dave On May 8, 2010, at 11:37 PM, Ma, Man Chun John wrote: > Hi, > > And that's my problem here: I checked the BLAST output, and the two > sequences did get aligned-- just that SearchIO, in whatever flavour (I > tried blast, blasttable and blastxml) didn't see to do to the next > result when next_result() is called. It knows there're two results, > but still getting the first result on the second call. > > Cheers, > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Saturday, May 08, 2010 4:33 PM > To: Ma, Man Chun John > Cc: BioPerl List > Subject: Re: [Bioperl-l] Array Handling Differences between > RemoteBlast and StandAloneBlast > > Hi John, > > Please remember to keep Cc'ing the mailing list so that everyone can > participate in the discussion. > > If I understand your question correctly, yes, you can iterate through > the blast results in a report called $blast_report using next_result. 
> > If you haven't already, you may want to look at the SearchIO HOWTO: > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > (although the BioPerl website appears to be temporarily offline, so > check back a little later.) > > > Dave > > > > On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > >> Hi, >> >> I have did some more investigation and found that the issue is >> probably that of SearchIO rather than StandAloneBlast--in case I made >> a mistake, so if I parsed a standard @array of Bio::Seq objects into >> StandAloneBlast (blastn with SearchIO output), the result for each of >> the seqs in the array can be assessed by $blast_report->next_result, >> right? >> >> >> John MC Ma >> Graduate Assistant >> Kwitek Lab >> Department of Internal Medicine >> 3125E MERF >> 375 Newton Road >> Iowa City IA 52242 >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Friday, May 07, 2010 5:34 PM >> To: Ma, Man Chun John >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Array Handling Differences between >> RemoteBlast and StandAloneBlast >> >> Hi John, >> >> You're right that passing parameters should work similarly for both >> RemoteBlast and StandAloneBlast, but without seeing exactly the >> parameter array you're passing, it's not possible to identify the >> problem. >> >> Could you perhaps post a small, but complete test program that >> demonstrates the problem? >> >> >> Dave >> >> >> PS - is this the actual code you ran? 
>>
>>> My
>>> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_blast_params);
>>> My $blast_report=$Stand_alone_blast_machine(@primer_info{primer});
>>> While (my $result=$blast_report->next_result()){
>>>       [etc, etc for iteration]
>>> }
>>
>> I'm guessing you were paraphrasing, but I ask because My, with a
>> capital "M", will generate an error, you're calling
>> Tools::Run::StandAloneBlast instead of
>> Bio::Tools::Run::StandAloneBlast, and there's no method call to
>> new(), i.e. it should be:
>>
>> my $Stand_alone_blast_machine =
>> Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params);
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blasttable
Type: application/octet-stream
Size: 842 bytes
Desc: blasttable
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast.xml
Type: text/xml
Size: 7598 bytes
Desc: blast.xml
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastout
Type: application/octet-stream
Size: 3576 bytes
Desc: blastout
URL:

From florent.angly at gmail.com Sun May 9 01:12:03 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Sun, 09 May 2010 15:12:03 +1000
Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally
In-Reply-To: <4BE58D6A.9080601@bioperl.org>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE58D6A.9080601@bioperl.org>
Message-ID: <4BE64423.1040104@gmail.com>

Within one assembly file, contig IDs typically follow one formatting convention. The two most popular ones are a numerical ID, or an alphanumeric ID, such as 'contig13'. The latter case already requires natural sorting.
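
For illustration, here is a minimal sketch of what natural sorting means for such IDs, written in plain Perl with no CPAN modules. It is not how Sort::Naturally is implemented; the helper names chunks() and natural_cmp() are made up for this example, and it assumes IDs are simple ASCII strings made of alternating runs of digits and non-digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an ID into alternating runs of digits
# and non-digits, e.g. 'contig10' -> ('contig', '10').
sub chunks {
    my ($id) = @_;
    return $id =~ /(\d+|\D+)/g;
}

# Hypothetical helper: compare two IDs chunk by chunk. Two digit runs
# are compared as numbers; anything else is compared as strings, so
# purely numeric IDs end up before alphabetic ones (ASCII order).
sub natural_cmp {
    my ($x, $y) = @_;
    my @x = chunks($x);
    my @y = chunks($y);
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $c = ($p =~ /^\d/ && $q =~ /^\d/)
              ? ($p <=> $q || $p cmp $q)
              : ($p cmp $q);
        return $c if $c;
    }
    # One ID is a prefix of the other: the shorter one sorts first.
    return @x <=> @y;
}

my @ids    = qw(singlet1 contig10 contig2 101 3);
my @sorted = sort { natural_cmp($a, $b) } @ids;
print join(" ", @sorted), "\n";    # prints: 3 101 contig2 contig10 singlet1
```

A comparator like this is only a fallback idea; a maintained module handles more cases (case folding, locale) than this sketch attempts.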
There is no way to know in advance what format to expect. In fact, since the format is specified by the user, it could be arbitrarily complicated, although the IDs would probably still be expected to sort naturally.

I will follow Chris's recommendation of using Sort::Naturally as a recommended package. Users who don't have this dependency will have their IDs sorted in a safe way, lexically.

Florent

On 09/05/10 02:12, Jason Stajich wrote:
> Unless necessary I don't know if adding yet another dependency is
> warranted here.
>
> I don't know how complicated the words will be but can't you just
> strip out the numbers and do this in a schwartzian transformation?
>
> #!/usr/bin/perl -w
> use strict;
> my @arr = qw(single1 contig10 101 contig2 3);
> my @sorted = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [
> /(\d+)/, $_] } @arr;
> print join("\n", @sorted),"\n";
>
> But I'm not sure how you want to sort
> 10 vs contig10 vs singlet10 reliably.
>
> -jason
>
> Florent Angly wrote, On 5/7/10 9:42 PM:
>> Hi all,
>>
>> I am working on updating some of the Bio::Assembly::* modules right now.
>> I need to sort a list of IDs. These IDs could be numbers, "words" or
>> a mix of the two, for example:
>>     @arr = ('singlet1', 'contig10', 'contig2', '101', '3');
>>
>> I cannot sort them with the numerical sort:
>>     sort { $a <=> $b } @array
>> This would generate warnings because some of the IDs are not numbers.
>>
>> I cannot sort them lexically:
>>     sort @array
>> Lexical sorting would not take the numbers into account properly and
>> would result in:
>>     101 3 contig10 contig2 singlet1
>>
>> So, what I really need is natural sorting, which is not in any core
>> function of Perl. I'd like to use the CPAN module Sort::Naturally for
>> this purpose:
>>     nsort @arr
>> The results would be what we expect, i.e.:
>>     3 101 contig2 contig10 singlet1
>>
>> Can I add this module as an additional dependency of BioPerl? I
>> imagine that some other modules might want to use this.
On the
>> assembly side, it would be used by the writing methods of
>> Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around
>> my problem that doesn't require any external module?
>>
>> Florent

From florent.angly at gmail.com Sun May 9 03:26:19 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Sun, 09 May 2010 17:26:19 +1000
Subject: [Bioperl-l] Read/write round-tripping Was: Re: New Bioperl dependency? Sort::Naturally
In-Reply-To: <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu>
Message-ID: <4BE6639B.6060004@gmail.com>

Chris,

I've thought some more about the problem and I now agree with you that round-tripping at the object level is more powerful. One problem is that some objects are given IDs dynamically on every run, which means that identical input files do not yield identical objects.

> is_deeply( $obj_out , $obj_in , 'deep compare' );
> not ok 1 - deep compare
> # Failed test 'deep compare'
> # at ./test_roundtrip.pl line 33.
> # Structures begin differing at:
> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '56438592'
> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '54980512'
> 1..1
> # Looks like you failed 1 test of 1.

And when I re-run it:

> not ok 1 - deep compare
> # Failed test 'deep compare'
> # at ./test_roundtrip.pl line 33.
> # Structures begin differing at:
> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '47763264'
> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '46305184'
> 1..1
> # Looks like you failed 1 test of 1.

Note how the value of _btree changes every time.
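
One possible workaround, sketched here with core modules only, is to copy both structures while dropping the fields known to vary between runs, and then compare the copies. The key name _btree comes from the failure shown in this message; the helper strip_volatile() and the toy data (including the _reads key) are hypothetical, and the sketch assumes the objects are hash- or array-based.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(reftype);
use Test::More;

# Hypothetical helper: return a deep, unblessed copy of $data in which
# every hash key listed in %skip has been left out.
sub strip_volatile {
    my ($data, %skip) = @_;
    my $type = reftype($data) // '';
    if ($type eq 'HASH') {
        my %copy;
        for my $key (keys %$data) {
            next if $skip{$key};
            $copy{$key} = strip_volatile($data->{$key}, %skip);
        }
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        return [ map { strip_volatile($_, %skip) } @$data ];
    }
    return $data;    # plain scalars and other references are kept as-is
}

# Toy stand-ins for the objects read from the input and output files;
# only the _btree value differs between them.
my $obj_in = { _contigs => { Contig35 => { _sfc => {
    _btree => 54980512, _reads => [qw(r1 r2)] } } } };
my $obj_out = { _contigs => { Contig35 => { _sfc => {
    _btree => 56438592, _reads => [qw(r1 r2)] } } } };

is_deeply(
    strip_volatile($obj_out, _btree => 1),
    strip_volatile($obj_in,  _btree => 1),
    'deep compare ignoring _btree'
);
done_testing();
```

The cost of this design is that a real difference hidden under a skipped key would go unnoticed, so the skip list should stay as short as possible.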
Maybe using Test::Deep would be a good approach (http://search.cpan.org/~fdaly/Test-Deep-0.106/lib/Test/Deep.pod): > Where it becomes more interesting is in allowing you to do something > besides simple exact comparisons. With strings, the |eq| operator > checks that 2 strings are exactly equal but sometimes that's not what > you want. When you don't know exactly what the string should be but > you do know some things about how it should look, |eq| is no good and > you must use pattern matching instead. Test::Deep provides pattern > matching for complex data structures Florent On 09/05/10 10:02, Chris Fields wrote: > Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. > > chris > > On May 8, 2010, at 6:47 PM, Chris Fields wrote: > > >> To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. >> >> chris >> >> On May 8, 2010, at 6:34 AM, Florent Angly wrote: >> >> >>> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. >>> >>> It looks like the Bio::SeqIO modules tests could use it as well. 
>>> >>> Cheers, >>> >>> Florent >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From ibi2008006 at iiita.ac.in Sun May 9 10:46:28 2010 From: ibi2008006 at iiita.ac.in (roserp) Date: Sun, 9 May 2010 07:46:28 -0700 (PDT) Subject: [Bioperl-l] where to find standard substitution matrices Message-ID: <28503204.post@talk.nabble.com> hi , I want blosum62, blosum80 , pam30, and pam70 matrices. I am getting different values in different sites for these matrices. can anyone suggest some authenticated site for getting these ?? thanks in advance -- View this message in context: http://old.nabble.com/where-to-find-standard-substitution-matrices-tp28503204p28503204.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From razi.khaja at gmail.com Sun May 9 15:23:47 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Sun, 9 May 2010 15:23:47 -0400 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: Attached (blast.pm.diff) is a patch that fixes Heikki's problem. Can someone advise an appropriate way to have this patch applied, given that it is an amendment to a previous patch? Thanks Razi ---------- Forwarded message ---------- From: Heikki Lehvaslaiho Date: Wed, May 5, 2010 at 2:11 AM Subject: Re: [Bioperl-l] BLAST parsing broken To: Razi Khaja Hi Raja, Thanks for trying to fix this. I am attaching an example output file to this message. I just tested again that master from git repository fails to get query ID, but the previous version works. bala ~/src/bioperl-live> git checkout master Previous HEAD position was 5e278f5... 
Robson's patch for buggy blastpgp output Switched to branch 'master' When I started using the latest mpiBLAST code a few months ago I did compare the 0 output from it to standard NCBI blast and they were identical. Also, I've noticed a discrepancy between within bioperl blast parsing that I have not had time to work on. Would you be interested in having a look? I am creating output from mpiBLAST in 0 format and then converting it into tab-delimited 8 format. I am unable to get 100% similarity for all cases when I compare the conversion to the output straight from mpiBLAST in format 8. Sometimes the mismatch and gap values are off by one. I am attaching a script that does the conversion. It is the same one I was using when I noticed the problem above. I was going to put the code into bioperl but that got delayed when I noticed the discrepancies. Cheers, -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 4 May 2010 20:55, Razi Khaja wrote: > That is odd. Heikki, do you have a blast output file that produces this > error? > Could you attach the file and either send to the list or myself (if the > list > does not accept attachments). > Thanks, > Razi > > > On Mon, May 3, 2010 at 8:08 AM, Chris Fields > wrote: > > > Odd, I ran tests on that prior to commit. I'll work on fixing that (in > svn > > of course, until the migration is complete). > > > > chris > > > > On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > > > > > Chris, > > > > > > latest additions to Bio::SearchIO::blast.pm broke the parsing of > normal > > > blast output. $result->query_name returns now undef. > > > > > > (Using the anonymous git now). 
This change still works: > > > > > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > > Author: cjfields > > > Date: Sun Dec 20 04:39:58 2009 +0000 > > > > > > Robson's patch for buggy blastpgp output > > > > > > But this does not: > > > > > > commit 9a89c3434597104dd50553e3562983d78d14a544 > > > Author: cjfields > > > Date: Thu Apr 15 04:21:17 2010 +0000 > > > > > > [bug 3031] > > > > > > patches for catching algorithm ref, courtesy Razi Khaja. > > > > > > That makes it easy to find the diffs: > > > > > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > > > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > > > index 378023a..6f7eeeb 100644 > > > --- a/Bio/SearchIO/blast.pm > > > +++ b/Bio/SearchIO/blast.pm > > > @@ -209,6 +209,7 @@ BEGIN { > > > > > > 'BlastOutput_program' => 'RESULT-algorithm_name', > > > 'BlastOutput_version' => > 'RESULT-algorithm_version', > > > + 'BlastOutput_algorithm-reference' => > > 'RESULT-algorithm_reference', > > > 'BlastOutput_query-def' => 'RESULT-query_name', > > > 'BlastOutput_query-len' => 'RESULT-query_length', > > > 'BlastOutput_query-acc' => 'RESULT-query_accession', > > > @@ -504,6 +505,26 @@ sub next_result { > > > } > > > ); > > > } > > > + # parse the BLAST algorithm reference > > > + elsif(/^Reference:\s+(.*)$/) { > > > + # want to preserve newlines for the BLAST algorithm > > reference > > > + my $algorithm_reference = "$1\n"; > > > + $_ = $self->_readline; > > > + # while the current line, does not match an empty line, a > > RID:, > > > or a Database:, we are still looking at the > > > + # algorithm_reference, append it to what we parsed so far > > > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > > > + $algorithm_reference .= "$_"; > > > + $_ = $self->_readline; > > > + } > > > + # if we exited the while loop, we saw an empty line, a > RID:, > > or > > > a Database:, so push it back > > > + $self->_pushback($_); > > > + 
$self->element( > > > + { > > > + 'Name' => 'BlastOutput_algorithm-reference', > > > + 'Data' => $algorithm_reference > > > + } > > > + ); > > > + } > > > # added Windows workaround for bug 1985 > > > elsif (/^(Searching|Results from round)/) { > > > next unless $1 =~ /Results from round/; > > > > > > > > > I am not sure why reference parsing messes things up. Maybe it eats too > > many > > > lines from the result file. > > > > > > Yours, > > > > > > -Heikki > > > > > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > > > cell: +966 545 595 849 office: +966 2 808 2429 > > > > > > Computational Bioscience Research Centre (CBRC), Building #2, Office > > #4216 > > > 4700 King Abdullah University of Science and Technology (KAUST) > > > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: mpiblast.out Type: application/octet-stream Size: 34844 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastparser028.pl Type: application/x-perl Size: 2024 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: blast.pm.diff Type: text/x-patch Size: 994 bytes Desc: not available URL: From cjfields at illinois.edu Sun May 9 16:43:29 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 May 2010 15:43:29 -0500 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> If the patch is against main trunk it isn't a problem, otherwise the diff should be vs. that code. chris On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > Can someone advise an appropriate way to have this patch applied, given that > it is an amendment to a previous patch? > Thanks > Razi > > > ---------- Forwarded message ---------- > From: Heikki Lehvaslaiho > Date: Wed, May 5, 2010 at 2:11 AM > Subject: Re: [Bioperl-l] BLAST parsing broken > To: Razi Khaja > > > Hi Raja, > > Thanks for trying to fix this. > > I am attaching an example output file to this message. I just tested again > that master from git repository fails to get query ID, but the previous > version works. > > bala ~/src/bioperl-live> git checkout master > Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp > output > Switched to branch 'master' > > When I started using the latest mpiBLAST code a few months ago I did compare > the 0 output from it to standard NCBI blast and they were identical. > > > > > Also, I've noticed a discrepancy between within bioperl blast parsing that > I have not had time to work on. Would you be interested in having a look? > > I am creating output from mpiBLAST in 0 format and then converting it into > tab-delimited 8 format. I am unable to get 100% similarity for all cases > when I compare the conversion to the output straight from mpiBLAST in format > 8. Sometimes the mismatch and gap values are off by one. > > I am attaching a script that does the conversion. 
It is the same one I was > using when I noticed the problem above. I was going to put the code into > bioperl but that got delayed when I noticed the discrepancies. > > > Cheers, > > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > On 4 May 2010 20:55, Razi Khaja wrote: > >> That is odd. Heikki, do you have a blast output file that produces this >> error? >> Could you attach the file and either send to the list or myself (if the >> list >> does not accept attachments). >> Thanks, >> Razi >> >> >> On Mon, May 3, 2010 at 8:08 AM, Chris Fields >> wrote: >> >>> Odd, I ran tests on that prior to commit. I'll work on fixing that (in >> svn >>> of course, until the migration is complete). >>> >>> chris >>> >>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: >>> >>>> Chris, >>>> >>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of >> normal >>>> blast output. $result->query_name returns now undef. >>>> >>>> (Using the anonymous git now). This change still works: >>>> >>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>> Author: cjfields >>>> Date: Sun Dec 20 04:39:58 2009 +0000 >>>> >>>> Robson's patch for buggy blastpgp output >>>> >>>> But this does not: >>>> >>>> commit 9a89c3434597104dd50553e3562983d78d14a544 >>>> Author: cjfields >>>> Date: Thu Apr 15 04:21:17 2010 +0000 >>>> >>>> [bug 3031] >>>> >>>> patches for catching algorithm ref, courtesy Razi Khaja. 
>>>> >>>> That makes it easy to find the diffs: >>>> >>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm >>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm >>>> index 378023a..6f7eeeb 100644 >>>> --- a/Bio/SearchIO/blast.pm >>>> +++ b/Bio/SearchIO/blast.pm >>>> @@ -209,6 +209,7 @@ BEGIN { >>>> >>>> 'BlastOutput_program' => 'RESULT-algorithm_name', >>>> 'BlastOutput_version' => >> 'RESULT-algorithm_version', >>>> + 'BlastOutput_algorithm-reference' => >>> 'RESULT-algorithm_reference', >>>> 'BlastOutput_query-def' => 'RESULT-query_name', >>>> 'BlastOutput_query-len' => 'RESULT-query_length', >>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', >>>> @@ -504,6 +505,26 @@ sub next_result { >>>> } >>>> ); >>>> } >>>> + # parse the BLAST algorithm reference >>>> + elsif(/^Reference:\s+(.*)$/) { >>>> + # want to preserve newlines for the BLAST algorithm >>> reference >>>> + my $algorithm_reference = "$1\n"; >>>> + $_ = $self->_readline; >>>> + # while the current line, does not match an empty line, a >>> RID:, >>>> or a Database:, we are still looking at the >>>> + # algorithm_reference, append it to what we parsed so far >>>> + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { >>>> + $algorithm_reference .= "$_"; >>>> + $_ = $self->_readline; >>>> + } >>>> + # if we exited the while loop, we saw an empty line, a >> RID:, >>> or >>>> a Database:, so push it back >>>> + $self->_pushback($_); >>>> + $self->element( >>>> + { >>>> + 'Name' => 'BlastOutput_algorithm-reference', >>>> + 'Data' => $algorithm_reference >>>> + } >>>> + ); >>>> + } >>>> # added Windows workaround for bug 1985 >>>> elsif (/^(Searching|Results from round)/) { >>>> next unless $1 =~ /Results from round/; >>>> >>>> >>>> I am not sure why reference parsing messes things up. Maybe it eats too >>> many >>>> lines from the result file. 
>>>>
>>>> Yours,
>>>>
>>>> -Heikki
>>>>
>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>> cell: +966 545 595 849 office: +966 2 808 2429
>>>>
>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
>>>> 4700 King Abdullah University of Science and Technology (KAUST)
>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From razi.khaja at gmail.com Sun May 9 17:15:38 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Sun, 9 May 2010 17:15:38 -0400
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To: <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID:

Hi Chris,
The patch is against the main trunk. I checked out version 11326 of the repository today.
Razi

On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote:

> If the patch is against main trunk it isn't a problem, otherwise the diff
> should be vs. that code.
>
> chris
>
> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
>
> > Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
> > Can someone advise an appropriate way to have this patch applied, given that
> > it is an amendment to a previous patch?
> > Thanks
> > Razi

From cjfields at illinois.edu Sun May 9 17:30:52 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 9 May 2010 16:30:52 -0500
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To:
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID:

Then something is wrong, as current trunk is at r16969. Where are you pulling your code from? Our only working anon. server is the sync'ed github one.

chris

On May 9, 2010, at 4:15 PM, Razi Khaja wrote:

> Hi Chris,
> The patch is against the main trunk. I checked out version 11326 of the
> repository today.
> Razi
>
>
> On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote:
>
>> If the patch is against main trunk it isn't a problem, otherwise the diff
>> should be vs. that code.
>>
>> chris
>>
>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
>>
>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
>>> Can someone advise an appropriate way to have this patch applied, given that
>>> it is an amendment to a previous patch?
>>> Thanks
>>> Razi

From razi.khaja at gmail.com Sun May 9 19:48:28 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Sun, 9 May 2010 19:48:28 -0400
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To:
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID:

I checked out bioperl-live from github:
svn checkout http://svn.github.com/bioperl/bioperl-live.git

I just checked it out again, a few seconds ago and by default I got revision 11326.
Razi

On Sun, May 9, 2010 at 5:30 PM, Chris Fields wrote:

> Then something is wrong, as current trunk is at r16969.
> Where are you pulling your code from? Our only working anon. server is the
> sync'ed github one.
>
> chris
>
> On May 9, 2010, at 4:15 PM, Razi Khaja wrote:
>
> > Hi Chris,
> > The patch is against the main trunk. I checked out version 11326 of the
> > repository today.
> > Razi
> >
> >
> > On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote:
> >
> >> If the patch is against main trunk it isn't a problem, otherwise the diff
> >> should be vs. that code.
> >>
> >> chris
> >>
> >> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
> >>
> >>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
> >>> Can someone advise an appropriate way to have this patch applied, given that
> >>> it is an amendment to a previous patch?
> >>> Thanks
> >>> Razi

From cjfields at illinois.edu Sun May 9 20:39:33 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 9 May 2010 19:39:33 -0500
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To:
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID: <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu>

Ok, that's fine. It may be something off with revision numbers when using svn with github (git doesn't have incremental revisions, but a SHA). Committed the patch to dev svn, in r16970.

chris

On May 9, 2010, at 6:48 PM, Razi Khaja wrote:

> I checked out bioperl-live from github:
> svn checkout http://svn.github.com/bioperl/bioperl-live.git
>
> I just checked it out again, a few seconds ago and by default I got revision
> 11326.
> Razi
>
>
> On Sun, May 9, 2010 at 5:30 PM, Chris Fields wrote:
>
>> Then something is wrong, as current trunk is at r16969. Where are you
>> pulling your code from? Our only working anon. server is the sync'ed github
>> one.
>>
>> chris
>>
>> On May 9, 2010, at 4:15 PM, Razi Khaja wrote:
>>
>>> Hi Chris,
>>> The patch is against the main trunk. I checked out version 11326 of the
>>> repository today.
>>> Razi
>>>
>>>
>>> On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote:
>>>
>>>> If the patch is against main trunk it isn't a problem, otherwise the diff
>>>> should be vs. that code.
>>>>
>>>> chris
>>>>
>>>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
>>>>
>>>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
>>>>> Can someone advise an appropriate way to have this patch applied, given that
>>>>> it is an amendment to a previous patch?
>>>>> Thanks
>>>>> Razi

From cmb433 at nyu.edu Sun May 9 22:22:52 2010
From: cmb433 at nyu.edu (bergeycm)
Date: Sun, 9 May 2010 19:22:52 -0700 (PDT)
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
Message-ID: <28506482.post@talk.nabble.com>

Hi all,

I'm attempting to query GenBank for all sequences'
lengths for a given taxon. I'm using get_Stream_by_query(), but only to grab the species, length, and accession. The genus of interest has almost 500,000 GB entries, though, and my code hangs up at odd points in the info-gathering loop. (Often after only 300 or 400 iterations.) The problem is that $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back undefined.

I've tried wrapping the next_seq portion of the code in an eval block, but to no avail. Is there a way to split a query into a bunch of small streams that aren't too much to ask? Or is there a way to pick up a dropped SeqIO stream? I think the connection is timing out and the stream is being lost. Any advice is greatly appreciated, as I'm fairly new to BioPerl.

- bergeycm

    use Bio::DB::GenBank;
    use Bio::DB::Query::GenBank;

    # Get general things ready to go for querying GenBank
    my %options;
    $options{'-maxids'} = '500000';       # There are presently 460,184 sequences
    $options{'-db'}     = 'nucleotide';
    $options{'-query'}  = "Pongo [ORGN]"; # Orangutans

    my $query_obj = Bio::DB::Query::GenBank->new(%options);
    my $total     = $query_obj->count;

    my $gb_obj     = Bio::DB::GenBank->new();
    my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

    # Restrict info to just what I'll be using. No sequence necessary.
    my $builder = $stream_obj->sequence_builder();
    $builder->want_none();
    $builder->add_wanted_slot('species','length','accession');

    my $c = 0;

    for (1 .. $total) {
        eval {
            my $seq_obj = $stream_obj->next_seq;
            my $flavor  = $seq_obj->species;
            print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t",
                  $seq_obj->length, "\t", $seq_obj->accession, "\n";
        };

        if ($@) {
            print $!, '\n';
        }

        # Pause for a little over a third of a second
        select(undef, undef, undef, 0.35);

        $c++;
    }

--
View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
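
One detail of the loop above may be worth isolating. Once next_seq returns undef, every remaining pass of the `for (1 .. $total)` loop calls a method on undef inside the eval, prints `$!` (the OS error, not the exception in `$@`) followed by a literal backslash-n, and then sleeps, which could look like a hang. What follows is only a minimal sketch of the usual alternative, stopping as soon as the stream is exhausted. MockStream and drain_stream are hypothetical names used for illustration; no BioPerl call is made, and nothing here addresses why the real stream ends early.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SeqIO stream: hands out its records, then undef.
package MockStream;
sub new      { my ($class, @recs) = @_; return bless { recs => [@recs] }, $class }
sub next_seq { my $self = shift; return shift @{ $self->{recs} } }

package main;

# Loop until the stream is exhausted instead of counting up to $total.
sub drain_stream {
    my ($stream) = @_;
    my @seen;
    while (defined(my $rec = $stream->next_seq)) {
        push @seen, $rec;
    }
    return @seen;
}

my @got = drain_stream(MockStream->new(qw(A B C)));
print scalar(@got), " records read\n";

# Calling a method on undef dies; the message lands in $@, not in $!.
my $err = '';
eval { my $gone; $gone->species; 1 } or $err = $@;
print "caught: $err";
```

With a real stream, the same `while (defined(...))` shape would end the loop at the first undef rather than spinning through the remaining iterations.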
From robert.bradbury at gmail.com Mon May 10 01:38:09 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 10 May 2010 01:38:09 -0400
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
In-Reply-To: <28506482.post@talk.nabble.com>
References: <28506482.post@talk.nabble.com>
Message-ID:

I don't know whether this is related or not. But the last time I tried to fetch a moderately large genome (NS_000198 for *Podospora anserina*) it failed [1]. It takes a *very* long time and eventually ends in an "Out of Memory" error. This is on a Pentium IV Prescott which only has a 4GB address space (configured for 3GB for user programs). After running a long strace on the perl process, it seemed that memory from the sequence chunks being fetched was never properly returned and merged. The final program address was brk(0xafd8c000), or 2,950,217,728, which is probably the maximum amount of data space a user program can have considering that one needs room for the stack. After that the mmap2() calls started failing with ENOMEM.

If Bio::DB::Query::GenBank is intelligent enough to fetch only the requested fields, you should be OK. But if it fetches the entire GenBank record and simply throws away the sequence information, and you are running into large sequences (say a big chunk of a chromosome), you could end up hitting the memory/swap space limits on your machine.

If the program is running for a long time, I'd be inclined to check the system logs to see whether it is running out of memory/swap. You can also watch the process using ps to determine if the VSZ grows continuously.

I think I mentioned this before on the BioPerl list but never had a clear understanding of what was going on and may not have filed a bug report. I think I eventually worked around it, perhaps by fetching the offending (large) sequence using wget or a browser.

Robert

1.
Given that NS_000198 is only ~7MB (4.6 million actual bases) the BioPerl memory management has to be really poor in merging/reusing if the fetch uses ~3GB. From bhakti.dwivedi at gmail.com Mon May 10 11:22:41 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Mon, 10 May 2010 11:22:41 -0400 Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface Message-ID: Does anyone know why the blast results vary for a query sequence when search is conducted using a web-based interface versus a Command line interface? For example, my web-based blast top hits do not match the top hits of the command line blast (blastcl3). I am using the default settings in both. not sure why the results are different Even if the hit is there, the e-value, bit score etc are different for the same hsp regions identified within the hit. is there a difference in the blast algorithm? or is it the database? Thanks! From cjfields at illinois.edu Mon May 10 12:28:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 11:28:15 -0500 Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface In-Reply-To: References: Message-ID: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu> The default web-based parameters differ than those via blastcl3, so if you are using the defaults for both they may differ somewhat. chris On May 10, 2010, at 10:22 AM, Bhakti Dwivedi wrote: > Does anyone know why the blast results vary for a query sequence when search > is conducted using a web-based interface versus a Command line interface? > > For example, my web-based blast top hits do not match the top hits of the > command line blast (blastcl3). I am using the default settings in both. > not sure why the results are different Even if the hit is there, the > e-value, bit score etc are different for the same hsp regions identified > within the hit. is there a difference in the blast algorithm? 
or is it the > database? > > Thanks! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 10 12:31:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 11:31:15 -0500 Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely In-Reply-To: References: <28506482.post@talk.nabble.com> Message-ID: On May 10, 2010, at 12:38 AM, Robert Bradbury wrote: > I don't know whether this is related or not. But the last time I tried to > fetch a moderately large genome (NS_000198 for *Podospera anserina*) it > failed [1]. It takes a *very* long time and eventually springs an "Out of > Memory" error. This is on a Pentium IV Prescott which only has a 4GB > address space (configured for 3GB for user programs) and after running a > long strace on the perl process it seemed that what was happening was that > it was never properly returning and merging memory from the sequence chunks > which were being fetched. The final program address was brk(0xafd8c000) or > 2,950,217,728 which is probably the maximum amount of data space a user > program can have considering that one needs room for the stack. After that > the mmap2() calls started failing with ENOMEM. That's odd. What OS? > If Bio::DB::GenBank::Query is intelligent enough to only fetch just the > requested fields you should be ok. But if it fetches the entire GenBank > record and simply throws away the sequence information and you are running > into large sequences (say a big chunk of a chromosome) and this ends up > hitting the memory/swap space limits on your machine that could be a > problem. Yes, that may happen, as (at the moment) we push everything into memory; there are no lazy or DB-linked Seq instances, at least not yet. Very large sequences take a lot of time (object instantiation) and a lot of memory. 
To tell the truth, that seems to be the default of most toolkits, but we have recently talked about possible ways to deal with it; we just need the tuits for it (as with anything).

The other alternative is to pull the sequences down locally as a raw text file. This can still be done within BioPerl, just using Bio::DB::EUtilities:

    use Bio::DB::EUtilities;

    my $id = 'NS_000198';   # accession to fetch; also used for the output file name
    my $in = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                      -db      => 'nuccore',
                                      -email   => 'cjfields at bioperl.org',
                                      -rettype => 'gbwithparts',
                                      -id      => $id);
    # write the raw GenBank record straight to disk
    $in->get_Response(-file => "$id.gb");

> If the program is running for a long time I'd be inclined to check my system
> logs to see if one is running out of memory/swap. You can also watch the
> process using ps to determine if the VSZ grows continuously.
>
> I think I mentioned this before on the BioPerl list but never had a clear
> understanding of what was going on and may not have filed a bug report. I
> think I eventually worked around it, perhaps by fetching the offending
> (large) sequence using wget or a browser.

You can still file a bug on it; it does help with keeping track (just reporting it here doesn't help much, it gets lost in the shuffle).

> Robert
>
> 1. Given that NS_000198 is only ~7MB (4.6 million actual bases) the BioPerl
> memory management has to be really poor in merging/reusing if the fetch uses
> ~3GB.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

BioPerl stores everything in memory, but I've worked with 4.6Mbp genomes quite a bit on my MB Pro. However, the default mode for Bio::DB::GenBank is to pull down everything using 'gbwithparts'. The file is much larger that way (sequence is ~34Mbp, file is ~51 MB). Maybe that's the problem?

If you can, please file a bug report, along with the relevant information. That helps us determine the best course of action.
chris

From cjfields at illinois.edu Mon May 10 12:32:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 11:32:43 -0500
Subject: [Bioperl-l] Read/write round-tripping Was: Re: New Bioperl dependency? Sort::Naturally
In-Reply-To: <4BE6639B.6060004@gmail.com>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> <4BE6639B.6060004@gmail.com>
Message-ID: <4B47AB3F-3190-4ACC-8235-8F5D6DBE7DC6@illinois.edu>

If there is dynamic ID assignment I would assume you can't compare them between runs, so using is_deeply() won't work as advertised: we already know the ID will change between runs anyway, so it's a self-fulfilling prophecy. Also, is_deeply() here is inspecting the SF::Collection blessed hash directly (the _btree is a tied DB_File hash); I'm not sure that's what you want either.

So at this point I would have to ask myself:

1) Is the dynamic ID assignment a bug (e.g. should we be using a fixed ID of some sort)? If not, we can't expect these to match across runs, so is_deeply won't work.

2) Would it make more sense to explicitly inspect the handled objects (SF::Collection) directly via method calls? For instance, if I want to see whether a set of features falls within a region, is that reproducible between runs?

Either way, I'm not sure what using Test::Deep would gain you, as it's still meant to inspect complex data structures, just with a bit more sugar than Test::More and is_deeply(). Per #2 above, I would be more explicit in inspecting the SF::Collection:

    my $collection = $contig->get_features_collection;

    # check that IDs in SF::Collection conform to a regex using like()

    # inspect other things about the collection...

chris

On May 9, 2010, at 2:26 AM, Florent Angly wrote:

> Chris,
>
> I've thought some more on the problem and I now agree with you that round-tripping at the object-level is more powerful.
> > It has the problem that some objects are given IDs dynamically every time, which means that identical input files won't have an identical object. > >> is_deeply( $obj_out , $obj_in , 'deep compare' ); > >> not ok 1 - deep compare >> # Failed test 'deep compare' >> # at ./test_roundtrip.pl line 33. >> # Structures begin differing at: >> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '56438592' >> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '54980512' >> 1..1 >> # Looks like you failed 1 test of 1. > > > And when I re-run this again: > >> not ok 1 - deep compare >> # Failed test 'deep compare' >> # at ./test_roundtrip.pl line 33. >> # Structures begin differing at: >> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '47763264' >> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '46305184' >> 1..1 >> # Looks like you failed 1 test of 1. > > Note how the value of _btree changes everytime. > > Maybe using Test::Deep would be a good approach (http://search.cpan.org/~fdaly/Test-Deep-0.106/lib/Test/Deep.pod): >> Where it becomes more interesting is in allowing you to do something besides simple exact comparisons. With strings, the |eq| operator checks that 2 strings are exactly equal but sometimes that's not what you want. When you don't know exactly what the string should be but you do know some things about how it should look, |eq| is no good and you must use pattern matching instead. Test::Deep provides pattern matching for complex data structures > > Florent > > > > > On 09/05/10 10:02, Chris Fields wrote: >> Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. >> >> chris >> >> On May 8, 2010, at 6:47 PM, Chris Fields wrote: >> >> >>> To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. 
None of the SeqIO modules make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority.
>>>
>>> chris
>>>
>>> On May 8, 2010, at 6:34 AM, Florent Angly wrote:
>>>
>>>> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files.
>>>>
>>>> It looks like the Bio::SeqIO modules tests could use it as well.
>>>>
>>>> Cheers,
>>>>
>>>> Florent
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon May 10 12:58:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 11:58:07 -0500
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
In-Reply-To: <28506482.post@talk.nabble.com>
References: <28506482.post@talk.nabble.com>
Message-ID: <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu>

500000 sequences is way too many to request, even in a loop. Under most circumstances this is breaking NCBI's eutils policies:

http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements

so don't be too surprised this is failing (this would be around 1000 queries of 500 sequences per query).

You could try pulling down the raw sequence via batch entrez or using Bio::DB::EUtilities (which should die if an error occurs).
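
To put rough numbers on the batching arithmetic above, here is a minimal sketch in plain Perl with no BioPerl calls. batch_windows is a hypothetical helper; the batch size of 500 comes from the message above and the total of 460,184 from the original post. How each offset and count would be handed to a particular module is deliberately not shown (the E-utilities themselves call these parameters retstart and retmax).

```perl
use strict;
use warnings;

# Split $total records into consecutive windows of at most $size records.
# Each window is [offset, count]; offsets are zero-based.
sub batch_windows {
    my ($total, $size) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @windows;
    for (my $start = 0; $start < $total; $start += $size) {
        my $left  = $total - $start;
        my $count = $left < $size ? $left : $size;
        push @windows, [$start, $count];
    }
    return @windows;
}

my @windows = batch_windows(460_184, 500);
printf "%d requests; last window starts at %d and holds %d records\n",
    scalar(@windows), $windows[-1][0], $windows[-1][1];
```

For the Pongo query this works out to 921 requests, which is the scale of traffic the usage policy is concerned with.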
chris On May 9, 2010, at 9:22 PM, bergeycm wrote: > > Hi all, > > I'm attempting to query GenBank for all sequences' lengths for a given > taxon. I'm using get_Stream_by_query(), but only to grab the species, > length, and accession. The genus of interest has almost 500,000 GB entries, > though, and my code hangs up at odd points in the info-gathering loop. > (Often after only 300 or 400 iterations.) The problem is that > $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back > undefined. > > I've tried wrapping the next_seq portion of the code in an eval block, but > to no avail. Is there a way to split a query into a bunch of small streams > that aren't too much to ask? Or is there a way to pick up a dropped SeqIO > stream? I think the connection is timing out and the stream is being lost. > Any advice is greatly appreciated, as I'm fairly new to BioPerl. > > - bergeycm > > > > use Bio::DB::GenBank; > use Bio::DB::Query::GenBank; > > > # Get general things ready to go for querying GenBank > my %options; > $options{'-maxids'} = '500000'; # There are presently 460,184 sequences > $options{'-db'} = 'nucleotide'; > $options{'-query'} = "Pongo [ORGN]"; # Orangutans > > > my $query_obj = Bio::DB::Query::GenBank->new(%options); > my $total = $query_obj->count; > > my $gb_obj = Bio::DB::GenBank->new(); > my $stream_obj = $gb_obj->get_Stream_by_query($query_obj); > > # Restrict info to just what I'll be using. No sequence necessary. > my $builder = $stream_obj->sequence_builder(); > $builder->want_none(); > $builder->add_wanted_slot('species','length','accession'); > > my $c = 0; > > for (1 .. 
$total) { > eval { > my $seq_obj = $stream_obj->next_seq; > my $flavor = $seq_obj->species; > print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t", > $seq_obj->length, "\t", $seq_obj->accession, "\n"; > }; > > if ($@) { > print $!, '\n'; > } > > # Pause for a little over a third of a second > select(undef, undef, undef, 0.35); > > $c++; > } > > > > -- > View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 10 13:07:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 12:07:00 -0500 Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely In-Reply-To: <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu> References: <28506482.post@talk.nabble.com> <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu> Message-ID: <58E399D4-A884-4DC1-A5C6-8B0CBDDB173A@illinois.edu> (addendum added, sent too early) On May 10, 2010, at 11:58 AM, Chris Fields wrote: > 500000 sequences is way too many to request, even in a loop. Under most circumstances this is breaking NCBI's eutils policies: > > http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements > > so don't be too surprised this is failing (this would be around 1000 queried of 500 sequences per query). > > You could try pulling down the raw sequence via batch entrez or using Bio::DB::EUtilities (which should die if an error occurs). But you may still run into issues with eutils at some point, particularly if running this at peak times. > > chris > > On May 9, 2010, at 9:22 PM, bergeycm wrote: > >> >> Hi all, >> >> I'm attempting to query GenBank for all sequences' lengths for a given >> taxon. 
I'm using get_Stream_by_query(), but only to grab the species, >> length, and accession. The genus of interest has almost 500,000 GB entries, >> though, and my code hangs up at odd points in the info-gathering loop. >> (Often after only 300 or 400 iterations.) The problem is that >> $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back >> undefined. >> >> I've tried wrapping the next_seq portion of the code in an eval block, but >> to no avail. Is there a way to split a query into a bunch of small streams >> that aren't too much to ask? Or is there a way to pick up a dropped SeqIO >> stream? I think the connection is timing out and the stream is being lost. >> Any advice is greatly appreciated, as I'm fairly new to BioPerl. >> >> - bergeycm >> >> >> >> use Bio::DB::GenBank; >> use Bio::DB::Query::GenBank; >> >> >> # Get general things ready to go for querying GenBank >> my %options; >> $options{'-maxids'} = '500000'; # There are presently 460,184 sequences >> $options{'-db'} = 'nucleotide'; >> $options{'-query'} = "Pongo [ORGN]"; # Orangutans >> >> >> my $query_obj = Bio::DB::Query::GenBank->new(%options); >> my $total = $query_obj->count; >> >> my $gb_obj = Bio::DB::GenBank->new(); >> my $stream_obj = $gb_obj->get_Stream_by_query($query_obj); >> >> # Restrict info to just what I'll be using. No sequence necessary. >> my $builder = $stream_obj->sequence_builder(); >> $builder->want_none(); >> $builder->add_wanted_slot('species','length','accession'); >> >> my $c = 0; >> >> for (1 .. 
$total) { >> eval { >> my $seq_obj = $stream_obj->next_seq; >> my $flavor = $seq_obj->species; >> print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t", >> $seq_obj->length, "\t", $seq_obj->accession, "\n"; >> }; >> >> if ($@) { >> print $!, '\n'; >> } >> >> # Pause for a little over a third of a second >> select(undef, undef, undef, 0.35); >> >> $c++; >> } >> >> >> >> -- >> View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jun.yin at ucd.ie Mon May 10 13:14:36 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Mon, 10 May 2010 18:14:36 +0100 Subject: [Bioperl-l] Bio::Align - alignment by position? In-Reply-To: References: Message-ID: <003701caf064$441c4660$cc54d320$%yin@ucd.ie> Hi, When you use $aln->slice(), there is a third optional parameter to keep gap-only columns in newly created slice, e.g. $aln2=$aln->slice(20,30,1); By defining the third parameter, you can keep gap-only sub sequences. Cheers, Jun Yin Ph.D.?student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin __________ Information from ESET Smart Security, version of virus signature database 5099 (20100509) __________ The message was checked by ESET Smart Security. 
http://www.eset.com From bhakti.dwivedi at gmail.com Mon May 10 14:35:37 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Mon, 10 May 2010 14:35:37 -0400 Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface In-Reply-To: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu> References: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu> Message-ID: Thanks Chris! I changed few parameter values in blastcl3 and now the results are same. Any particular reason to set the default differently in web-based and command-line blast search? Bhakti On Mon, May 10, 2010 at 12:28 PM, Chris Fields wrote: > The default web-based parameters differ than those via blastcl3, so if you > are using the defaults for both they may differ somewhat. > > chris > > On May 10, 2010, at 10:22 AM, Bhakti Dwivedi wrote: > > > Does anyone know why the blast results vary for a query sequence when > search > > is conducted using a web-based interface versus a Command line interface? > > > > For example, my web-based blast top hits do not match the top hits of > the > > command line blast (blastcl3). I am using the default settings in both. > > not sure why the results are different Even if the hit is there, the > > e-value, bit score etc are different for the same hsp regions identified > > within the hit. is there a difference in the blast algorithm? or is it > the > > database? > > > > Thanks! > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon May 10 15:47:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 14:47:56 -0500 Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface In-Reply-To: References: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu> Message-ID: you would need to ask NCBI that. 
chris On May 10, 2010, at 1:35 PM, Bhakti Dwivedi wrote: > Thanks Chris! I changed few parameter values in blastcl3 and now the > results are same. Any particular reason to set the default differently in > web-based and command-line blast search? > > Bhakti > > > > On Mon, May 10, 2010 at 12:28 PM, Chris Fields wrote: > >> The default web-based parameters differ than those via blastcl3, so if you >> are using the defaults for both they may differ somewhat. >> >> chris >> >> On May 10, 2010, at 10:22 AM, Bhakti Dwivedi wrote: >> >>> Does anyone know why the blast results vary for a query sequence when >> search >>> is conducted using a web-based interface versus a Command line interface? >>> >>> For example, my web-based blast top hits do not match the top hits of >> the >>> command line blast (blastcl3). I am using the default settings in both. >>> not sure why the results are different Even if the hit is there, the >>> e-value, bit score etc are different for the same hsp regions identified >>> within the hit. is there a difference in the blast algorithm? or is it >> the >>> database? >>> >>> Thanks! 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dimitark at bii.a-star.edu.sg Mon May 10 22:03:51 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Tue, 11 May 2010 10:03:51 +0800
Subject: [Bioperl-l] StandAloneFasta and Too many open files
Message-ID: <4BE8BB07.3040407@bii.a-star.edu.sg>

Hi guys,
yesterday I got the following error:

'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380'

from the following code:
------------
    my $ssout="my_seq_out.txt";
    print "SS:$tquery:\n:$tseq:\n";
    my @sargs=(
        'q' => '',
        'E' => '1',
        'w' => '100',
        'O' => "$ssout",
        'program' => "ssearch36",
    );
    my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs);
    $fac_ss->library($tmpseq);
    my @sreport=$fac_ss->run($tqtmp);

    foreach my $sr (@sreport){
        while(my $result=$sr->next_result){
            while(my $hit=$result->next_hit){
                while(my $hsp=$hit->next_hsp){
                    my $iden=$hsp->frac_identical;
                    $rv3=$iden;
                    # print "IDEN:$iden:$rv1\n";
                }
            }
        }
    }
--------------------
I am running that code over several thousand HSPs: for each one I get the sequence and then run 'ssearch36' with it against another sequence. I was digging around the StandAloneFasta module but couldn't work out where the problem is. Somewhere there must be many open filehandles, but I do not know where. I checked the module but couldn't find such filehandles. Maybe the problem is in the base modules.
I also checked my script for filehandles left open and found none. I found only that I can actually close SeqIO streams with '$stream->close', which I didn't see in the web documentation.
So something positive out of this :) So I closed all my SeqIO streams and I still had the same problem.
Next I commented out the above code and rewrote my script into the following:
--------------
    my $ssout="my_seq_out.txt";
    # single-string form so that the shell handles the '>' redirection
    my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq > $ssout");
    # after system(), the child's exit status is in $?
    system(@sargs) == 0 or die "system @sargs failed: $?";

    my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta');
    while(my $result=$sreport->next_result){
        # print Dumper($result);
        while(my $hit=$result->next_hit){
            while(my $hsp=$hit->next_hsp){
                my $iden=$hsp->frac_identical;
                $rv3=$iden;
                # print "IDEN:$iden:$rv1\n";
            }
        }
    }
---------------
Fortunately this code got rid of the 'Too many open files' error. So the problem was indeed coming from the module or the modules behind it.

I have also read that one can raise the limit on how many files can be open on the system, but I didn't want to mess with that for now because I do not know what the implications could be.

OK, that is it. I just wanted to share my experience and report the problem.
Cheers Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From cjfields at illinois.edu Mon May 10 23:04:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 22:04:12 -0500 Subject: [Bioperl-l] StandAloneFasta and Too many open files In-Reply-To: <4BE8BB07.3040407@bii.a-star.edu.sg> References: <4BE8BB07.3040407@bii.a-star.edu.sg> Message-ID: <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> On May 10, 2010, at 9:03 PM, Dimitar Kenanov wrote: > Hi guys, > yesterday i got the following error: > > 'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380' > > from the following code: > ------------ > my $ssout="my_seq_out.txt"; > print "SS:$tquery:\n:$tseq:\n"; > my @sargs=( > 'q' => '', > 'E' => '1', > 'w' => '100', > 'O' => "$ssout", > 'program' => "ssearch36", > ); > my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs); > $fac_ss->library($tmpseq); > my @sreport=$fac_ss->run($tqtmp); > > foreach my $sr (@sreport){ > while(my $result=$sr->next_result){ > while(my $hit=$result->next_hit){ > while(my $hsp=$hit->next_hsp){ > my $iden=$hsp->frac_identical; > $rv3=$iden; > # print "IDEN:$iden:$rv1\n"; > } > } > } > } > -------------------- > I am using that code over several thousands of HSPs for which i get the sequence and then 'ssearch36' with it against another sequence. I was digging around the module StandAloneFasta but couldnt get where the problem is. There should be somewhere many opened filehandles but do not know where. I checked the module but couldnt find such filehandles. May be the problem is in the base modules. > I also checked and my script for left open filehandles and i have not. I found only that i can actually close SeqIO streams with '$stream->close' which i didnt see on the web documentation. 
So something positive out of this :) So i closed all my SeqIO streams and i still had the same problem. > Next i commented out the above code and rewrote my script into the following: > -------------- > my $ssout="my_seq_out.txt"; > my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq > $ssout"); > system(@sargs) == 0 or die "system @sargs failed: $!"; > > my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta'); > while(my $result=$sreport->next_result){ > # print Dumper($result); > while(my $hit=$result->next_hit){ > while(my $hsp=$hit->next_hsp){ > > my $iden=$hsp->frac_identical; > $rv3=$iden; > # print "IDEN:$iden:$rv1\n"; > } > } > } > --------------- > Fortunately this code overcame the error message with too many filehandles. So the problem was indeed coming from the module or the modules behind it. > > I have also read that one can change the number of how many files can be opened on the system but i didnt want to mess with that for now because i do not know what could be the implications of that. > > Ok that is it. I just wanted to inform about my experience and to report the problem. > > Cheers > Dimitar Seems this is hitting the system ulimit somehow, but it's not immediately apparent how that's happening unless you are caching the IO objects somehow. Can you file this as a bug, maybe with a fuller test script? Might give us something to check against. chris From cjfields at illinois.edu Mon May 10 23:57:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 22:57:18 -0500 Subject: [Bioperl-l] StandAloneFasta and Too many open files In-Reply-To: <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> References: <4BE8BB07.3040407@bii.a-star.edu.sg> <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> Message-ID: <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu> Addendum to that last post. 

On May 10, 2010, at 10:04 PM, Chris Fields wrote:

> On May 10, 2010, at 9:03 PM, Dimitar Kenanov wrote:
>
>> [...]
>
> Seems this is hitting the system ulimit somehow, but it's not immediately apparent how that's happening unless you are caching the IO objects somehow.  Can you file this as a bug, maybe with a fuller test script?  Might give us something to check against.
>
> chris

Dimitar,

I think Peter had answered this before, might indicate the problem is actually using the 'O' option in output.  We can look at possibly just capturing STDOUT instead, but we may not support the use of 'O' if it's as buggy as indicated.
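
A minimal sketch of the "capture STDOUT" idea mentioned above, in plain Perl and independent of BioPerl. `capture_stdout` is a hypothetical helper, not part of StandAloneFasta, and a child perl process stands in for ssearch36. It only illustrates reading a program's output through a pipe and closing the handle explicitly; it is not a diagnosis of where the module runs out of descriptors.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): run a command, return its
# standard output as a list of chomped lines, and close the pipe
# before returning so no descriptor outlives the call.
sub capture_stdout {
    my @cmd = @_;

    # List-form pipe open: no shell is involved, so arguments need no quoting.
    open(my $fh, '-|', @cmd)
        or die "cannot start '$cmd[0]': $!";
    my @lines = <$fh>;

    # close() on a pipe waits for the child and sets $?.
    close($fh)
        or die "'$cmd[0]' failed: " . ($! ? $! : "exit status " . ($? >> 8));

    chomp @lines;
    return @lines;
}

# Stand-in for ssearch36: a child perl that prints two lines.
my @out = capture_stdout($^X, '-e', 'print "hit1\nhit2\n"');
print scalar(@out), " lines captured\n";
```

The captured lines could then be written to a file or handed to a parser; whether a parser should receive the open handle or a finished file is a design choice the thread leaves open.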

http://groups.google.com/group/bioperl-l/msg/25c17748d1ac6ef4

chris

From dimitark at bii.a-star.edu.sg  Tue May 11 00:24:13 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Tue, 11 May 2010 12:24:13 +0800
Subject: [Bioperl-l] StandAloneFasta and Too many open files
In-Reply-To: <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu>
References: <4BE8BB07.3040407@bii.a-star.edu.sg> <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu>
Message-ID: <4BE8DBED.2000209@bii.a-star.edu.sg>

Hi Chris,
thank you for the information. I checked it out. I wrote to you and to the list about that as well: to you on 16.04.2010 and to the list on 23.04.2010. There I explained that I modified the module. Now I pass it the 'O' option, but this option is not passed to the actual program executed by system. I just add my desired output with "> $output" to the parameter line passed to system. In the email mentioned above I attached the modified version of the module.

I was digging a bit more in the module. I found this at line 359:
-----------
unless( $outfile ) {
    open(FASTARUN, "$para |") || $self->throw($@);#original
    $object=Bio::SearchIO->new(-fh=>\*FASTARUN, #original
                               -format=>"fasta");#original
} else {
------------
And here is another one, used when 'O' is set, at line 371:
---------
$object = Bio::SearchIO->new(-file=>$self->O, -format=>"fasta");
----------
Maybe the problem is here, because I didn't see a 'close' anywhere for these filehandles. I can test and tell you whether I was right.

Cheers
Dimitar

On 05/11/2010 11:57 AM, Chris Fields wrote:
> Addendum to that last post.
> [...]
> > http://groups.google.com/group/bioperl-l/msg/25c17748d1ac6ef4 > > chris > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From heikki.lehvaslaiho at gmail.com Tue May 11 01:40:14 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Tue, 11 May 2010 08:40:14 +0300 Subject: [Bioperl-l] Github possibilities Message-ID: FYI http://chem-bla-ics.blogspot.com/2010/05/github-simplifies-code-review-and.html -Heikki From heikki.lehvaslaiho at gmail.com Tue May 11 01:43:42 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Tue, 11 May 2010 08:43:42 +0300 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu> Message-ID: Thanks Razi and Chris, Blast parsing works again beautifully. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 10 May 2010 03:39, Chris Fields wrote: > Ok, that's fine. It may be something off with revision numbers when using > svn with github (git doesn't have incremental revisions, but a SHA). > Committed the patch to dev svn, in r16970. > > chris > > On May 9, 2010, at 6:48 PM, Razi Khaja wrote: > > > I checked out bioperl-live from github: > > svn checkout http://svn.github.com/bioperl/bioperl-live.git > > > > I just checked it out again, a few seconds ago and by default I got > revision > > 11326. > > Razi > > > > > > On Sun, May 9, 2010 at 5:30 PM, Chris Fields > wrote: > > > >> Then something is wrong, as current trunk is at r16969. 
Where are you > >> pulling your code from? Our only working anon. server is the sync'ed > github > >> one. > >> > >> chris > >> > >> On May 9, 2010, at 4:15 PM, Razi Khaja wrote: > >> > >>> Hi Chris, > >>> The patch is against the main trunk. I checked out version 11326 of > the > >>> repository today. > >>> Razi > >>> > >>> > >>> On Sun, May 9, 2010 at 4:43 PM, Chris Fields > >> wrote: > >>> > >>>> If the patch is against main trunk it isn't a problem, otherwise the > >> diff > >>>> should be vs. that code. > >>>> > >>>> chris > >>>> > >>>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > >>>> > >>>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > >>>>> Can someone advise an appropriate way to have this patch applied, > given > >>>> that > >>>>> it is an amendment to a previous patch? > >>>>> Thanks > >>>>> Razi > >>>>> > >>>>> > >>>>> ---------- Forwarded message ---------- > >>>>> From: Heikki Lehvaslaiho > >>>>> Date: Wed, May 5, 2010 at 2:11 AM > >>>>> Subject: Re: [Bioperl-l] BLAST parsing broken > >>>>> To: Razi Khaja > >>>>> > >>>>> > >>>>> Hi Raja, > >>>>> > >>>>> Thanks for trying to fix this. > >>>>> > >>>>> I am attaching an example output file to this message. I just tested > >>>> again > >>>>> that master from git repository fails to get query ID, but the > previous > >>>>> version works. > >>>>> > >>>>> bala ~/src/bioperl-live> git checkout master > >>>>> Previous HEAD position was 5e278f5... Robson's patch for buggy > blastpgp > >>>>> output > >>>>> Switched to branch 'master' > >>>>> > >>>>> When I started using the latest mpiBLAST code a few months ago I did > >>>> compare > >>>>> the 0 output from it to standard NCBI blast and they were identical. > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> Also, I've noticed a discrepancy between within bioperl blast > parsing > >>>> that > >>>>> I have not had time to work on. Would you be interested in having a > >> look? 
> >>>>> > >>>>> I am creating output from mpiBLAST in 0 format and then converting it > >>>> into > >>>>> tab-delimited 8 format. I am unable to get 100% similarity for all > >> cases > >>>>> when I compare the conversion to the output straight from mpiBLAST in > >>>> format > >>>>> 8. Sometimes the mismatch and gap values are off by one. > >>>>> > >>>>> I am attaching a script that does the conversion. It is the same one > I > >>>> was > >>>>> using when I noticed the problem above. I was going to put the code > >> into > >>>>> bioperl but that got delayed when I noticed the discrepancies. > >>>>> > >>>>> > >>>>> Cheers, > >>>>> > >>>>> > >>>>> -Heikki > >>>>> > >>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>>> > >>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office > >>>> #4216 > >>>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>>> > >>>>> > >>>>> > >>>>> On 4 May 2010 20:55, Razi Khaja wrote: > >>>>> > >>>>>> That is odd. Heikki, do you have a blast output file that produces > >> this > >>>>>> error? > >>>>>> Could you attach the file and either send to the list or myself (if > >> the > >>>>>> list > >>>>>> does not accept attachments). > >>>>>> Thanks, > >>>>>> Razi > >>>>>> > >>>>>> > >>>>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields > > >>>>>> wrote: > >>>>>> > >>>>>>> Odd, I ran tests on that prior to commit. I'll work on fixing that > >> (in > >>>>>> svn > >>>>>>> of course, until the migration is complete). > >>>>>>> > >>>>>>> chris > >>>>>>> > >>>>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > >>>>>>> > >>>>>>>> Chris, > >>>>>>>> > >>>>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of > >>>>>> normal > >>>>>>>> blast output. $result->query_name returns now undef. > >>>>>>>> > >>>>>>>> (Using the anonymous git now). 
This change still works: > >>>>>>>> > >>>>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>>>> Author: cjfields > >>>>>>>> Date: Sun Dec 20 04:39:58 2009 +0000 > >>>>>>>> > >>>>>>>> Robson's patch for buggy blastpgp output > >>>>>>>> > >>>>>>>> But this does not: > >>>>>>>> > >>>>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544 > >>>>>>>> Author: cjfields > >>>>>>>> Date: Thu Apr 15 04:21:17 2010 +0000 > >>>>>>>> > >>>>>>>> [bug 3031] > >>>>>>>> > >>>>>>>> patches for catching algorithm ref, courtesy Razi Khaja. > >>>>>>>> > >>>>>>>> That makes it easy to find the diffs: > >>>>>>>> > >>>>>>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > >>>>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > >>>>>>>> index 378023a..6f7eeeb 100644 > >>>>>>>> --- a/Bio/SearchIO/blast.pm > >>>>>>>> +++ b/Bio/SearchIO/blast.pm > >>>>>>>> @@ -209,6 +209,7 @@ BEGIN { > >>>>>>>> > >>>>>>>> 'BlastOutput_program' => 'RESULT-algorithm_name', > >>>>>>>> 'BlastOutput_version' => > >>>>>> 'RESULT-algorithm_version', > >>>>>>>> + 'BlastOutput_algorithm-reference' => > >>>>>>> 'RESULT-algorithm_reference', > >>>>>>>> 'BlastOutput_query-def' => 'RESULT-query_name', > >>>>>>>> 'BlastOutput_query-len' => 'RESULT-query_length', > >>>>>>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', > >>>>>>>> @@ -504,6 +505,26 @@ sub next_result { > >>>>>>>> } > >>>>>>>> ); > >>>>>>>> } > >>>>>>>> + # parse the BLAST algorithm reference > >>>>>>>> + elsif(/^Reference:\s+(.*)$/) { > >>>>>>>> + # want to preserve newlines for the BLAST algorithm > >>>>>>> reference > >>>>>>>> + my $algorithm_reference = "$1\n"; > >>>>>>>> + $_ = $self->_readline; > >>>>>>>> + # while the current line, does not match an empty > line, > >> a > >>>>>>> RID:, > >>>>>>>> or a Database:, we are still looking at the > >>>>>>>> + # algorithm_reference, append it to what we parsed so > >> far > >>>>>>>> + while($_ !~ /^$/ && 
$_ !~ /^RID:/ && $_ !~ > >> /^Database:/) > >>>> { > >>>>>>>> + $algorithm_reference .= "$_"; > >>>>>>>> + $_ = $self->_readline; > >>>>>>>> + } > >>>>>>>> + # if we exited the while loop, we saw an empty line, > a > >>>>>> RID:, > >>>>>>> or > >>>>>>>> a Database:, so push it back > >>>>>>>> + $self->_pushback($_); > >>>>>>>> + $self->element( > >>>>>>>> + { > >>>>>>>> + 'Name' => 'BlastOutput_algorithm-reference', > >>>>>>>> + 'Data' => $algorithm_reference > >>>>>>>> + } > >>>>>>>> + ); > >>>>>>>> + } > >>>>>>>> # added Windows workaround for bug 1985 > >>>>>>>> elsif (/^(Searching|Results from round)/) { > >>>>>>>> next unless $1 =~ /Results from round/; > >>>>>>>> > >>>>>>>> > >>>>>>>> I am not sure why reference parsing messes things up. Maybe it > eats > >>>> too > >>>>>>> many > >>>>>>>> lines from the result file. > >>>>>>>> > >>>>>>>> Yours, > >>>>>>>> > >>>>>>>> -Heikki > >>>>>>>> > >>>>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>>>>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>>>>>> > >>>>>>>> Computational Bioscience Research Centre (CBRC), Building #2, > Office > >>>>>>> #4216 > >>>>>>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>>>>>> _______________________________________________ > >>>>>>>> Bioperl-l mailing list > >>>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>> >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at 
lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cmb433 at nyu.edu  Sun May  9 19:40:48 2010
From: cmb433 at nyu.edu (bergeycm)
Date: Sun, 9 May 2010 16:40:48 -0700 (PDT)
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
Message-ID: <28506482.post@talk.nabble.com>

Hi all,

I'm attempting to query GenBank for all sequences' lengths for a given taxon. I'm using get_Stream_by_query(), but only to grab the species, length, and accession. The genus of interest has almost 500,000 GB entries, though, and my code hangs up at odd points in the info-gathering loop. (Often after only 300 or 400 iterations.)

The problem is that $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back undefined. I've tried wrapping the next_seq portion of the code in an eval block, but to no avail.

Is there a way to split a query into a bunch of small streams that aren't too much to ask? Or is there a way to pick up a dropped SeqIO stream? I think the connection is timing out and the stream is being lost.

Any advice is greatly appreciated, as I'm fairly new to BioPerl.

- bergeycm

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Get general things ready to go for querying GenBank
my %options;
$options{'-maxids'} = '500000';          # There are presently 460,184 sequences
$options{'-db'}     = 'nucleotide';
$options{'-query'}  = "Pongo [ORGN]";    # Orangutans

my $query_obj  = Bio::DB::Query::GenBank->new(%options);
my $total      = $query_obj->count;
my $gb_obj     = Bio::DB::GenBank->new();
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# Restrict info to just what I'll be using. No sequence necessary.
my $builder = $stream_obj->sequence_builder();
$builder->want_none();
$builder->add_wanted_slot('species','length','accession');

my $c = 0;
for (1 .. $total) {
    eval {
        my $seq_obj = $stream_obj->next_seq;
        my $flavor  = $seq_obj->species;
        print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t",
              $seq_obj->length, "\t", $seq_obj->accession, "\n";
    };
    # The eval error is in $@ (not $!), and "\n" needs double quotes
    # to print a newline rather than a literal backslash-n.
    if ($@) { print $@, "\n"; }

    # Pause for a little over a third of a second
    select(undef, undef, undef, 0.35);
    $c++;
}

-- 
View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From sudeep.mehrotra at mail.mcgill.ca  Tue May 11 09:40:07 2010
From: sudeep.mehrotra at mail.mcgill.ca (Sudeep Mehrotra)
Date: Tue, 11 May 2010 09:40:07 -0400
Subject: [Bioperl-l] [Fwd: Re: Modules in Bio:Tree]
Message-ID: <4BE95E37.3060702@mail.mcgill.ca>

Hello Jason,
Your suggestion worked. Thanks.

I have two formats (NEXUS and NEWICK) for the same tree. I want to obtain a "clade list"; in other words, is there a way to obtain the leaves which are members of a clade? For example, part of the NEXUS file has the following entries:

other entries .......
655 Deinococcus_geothermalis,
656 Deinococcus_radiodurans,
657 Thermus_thermophilus,
658 Thermus_sp.
;
other entries........
(((((655,656)[])[])[],(((657,658)[])[])[])[])[])[])[]);

From the tree I can observe that 657 and 658 are members of a subclade, that 655 and 656 are members of another subclade, and that both of these belong to one clade. I want to get this membership information. I tried looking for a module in Bio::Tree but was not able to find any. In the Bio::NEXUS package there is a module "walk" which I thought would work for me, but it does not. Also, the Bio::NEXUS package is just not working for me. From the documentation, the input file they are using is different from what I have.

Is there any way I can get the membership information as shown earlier?

Cheers
-- 
Sudeep Mehrotra (Ph.D. Candidate)
McGill University and Genome Quebec Innovation Center
-------------- next part --------------
An embedded message was scrubbed...
From: Jason Stajich
Subject: Re: Modules in Bio:Tree
Date: Wed, 5 May 2010 18:45:41 -0400
Size: 5420
URL:

From amackey at virginia.edu  Tue May 11 17:26:50 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Tue, 11 May 2010 17:26:50 -0400
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
Message-ID:

Hi Zerui (and others),

I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, specifically in this code:

lines:
1170:                (-start => int ($loc->start / 3 ) +1,
1171:                 -end => int ($loc->end / 3 ) +1,

both of those lines should look like: int (($loc->start - 1) / 3) + 1

otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2)

There is also a problem when mapping exon coordinates that are outside/after the CDS region (instead of getting undefined locations, you continue to get peptide coordinates, but they are invalid, larger than the protein length).

Dennis and fringy -- this may affect the SNPtab.pl script I wrote for you, as it uses this module to calculate codons for SNPs.

-Aaron

P.S. a script that demonstrates the problem:

use Bio::Coordinate::GeneMapper;

my $mapper =
  Bio::Coordinate::GeneMapper
  ->new( -in    => "chr",
         -out   => "propeptide",
         -exons => [ Bio::Location::Simple
                     ->new( -start => 101,
                            -end   => 109 ),
                     Bio::Location::Simple
                     ->new( -start => 201,
                            -end   => 221 ),
                   ],
         -cds   => Bio::Location::Simple
                   ->new(-start => 101, -end => 209),
       );


print join("\t", "chr", "aa"), "\n";
for my $pos (99..111,199..211) {
  my $res = $mapper->map(
    Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => 1));
  my $start = $res->start; $start = "NA" unless defined $start;
  my $end = $res->end; $end = "NA" unless defined $end;
  print join("\t", $pos, $start), "\n";
}

From cjfields at illinois.edu  Tue May 11 18:31:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 11 May 2010 17:31:17 -0500
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
In-Reply-To:
References:
Message-ID: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>

Aaron,

Do we want to write this up as a set of tests to add to the bioperl test suite?  We can probably add it after the github migration tomorrow.

chris

On May 11, 2010, at 4:26 PM, Aaron Mackey wrote:

> Hi Zerui (and others),
>
> I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper,
> specifically in this code:
>
> lines:
> 1170:                (-start => int ($loc->start / 3 ) +1,
> 1171:                 -end => int ($loc->end / 3 ) +1,
>
> both of those lines should look like: int (($loc->start - 1) / 3) + 1
>
> otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide
> positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2)
>
> There is also a problem when mapping exon coordinates that are outside/after
> the CDS region (instead of getting undefined locations, you continue to get
> peptide coordinates, but they are invalid, larger than the protein length).
>
> Dennis and fringy -- this may affect the SNPtab.pl script I wrote for you,
> as it uses this module to calculate codons for SNPs.
> > -Aaron > > P.S. a script the demonstrates the problem: > > use Bio::Coordinate::GeneMapper; > > my $mapper = > Bio::Coordinate::GeneMapper > ->new( -in => "chr", > -out => "propeptide", > -exons => [ Bio::Location::Simple > ->new( -start => 101, > -end => 109 ), > Bio::Location::Simple > ->new( -start => 201, > -end => 221 ), > ], > -cds => Bio::Location::Simple > ->new(-start => 101, -end => 209), > ); > > > print join("\t", "chr", "aa"), "\n"; > for my $pos (99..111,199..211) { > my $res = $mapper->map( > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => 1)); > my $start = $res->start; $start = "NA" unless defined $start; > my $end = $res->end; $end = "NA" unless defined $end; > print join("\t", $pos, $start), "\n"; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From amackey at virginia.edu Tue May 11 18:40:11 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Tue, 11 May 2010 18:40:11 -0400 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: Hi Chris, I was hoping Heikki might take up the cause and investigate further -- let's give him a chance to respond. But it's a frightening bug if it's really been that way for all this time ... -Aaron On Tue, May 11, 2010 at 6:31 PM, Chris Fields wrote: > Aaron, > > Do we want to write this up as a set of tests to add to the bioperl test > suite? We can probably add it after the github migration tomorrow. 
>
> chris
>
> [...]

From cjfields at
illinois.edu (Chris Fields) Date: Tue, 11 May 2010 23:15:54 -0500 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow Message-ID: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> Just a friendly reminder that we'll freeze the dev subversion repository tomorrow prior to migration to github. The migration will take about an hour, during which all bioperl github repos will be replaced with the full repos, and devs added. The test repos will be removed around that time (Heikki, will that be a problem?). chris From heikki.lehvaslaiho at gmail.com Wed May 12 00:23:07 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Wed, 12 May 2010 07:23:07 +0300 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> Message-ID: No problem at all. Go ahead. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 12 May 2010 07:15, Chris Fields wrote: > Just a friendly reminder that we'll freeze the dev subversion repository > tomorrow prior to migration to github. The migration will take about an > hour, during which all bioperl github repos will be replaced with the full > repos, and devs added. The test repos will be removed around that time > (Heikki, will that be a problem?). > > chris From heikki.lehvaslaiho at gmail.com Wed May 12 06:23:03 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Wed, 12 May 2010 13:23:03 +0300 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: Outch. I'll definitely have a look. Strange that none of the tests have picked this up... 
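
The off-by-one in Aaron's report can be reproduced with plain integer arithmetic, without BioPerl. The sketch below is only an illustration: `peptide_pos_reported` and `peptide_pos_proposed` are hypothetical names for the two formulas quoted from lines 1170-1171, applied to 1-based CDS positions. It says nothing about how GeneMapper treats positions outside the CDS.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers, not BioPerl API. Both map a 1-based CDS
# nucleotide position to a 1-based peptide position.

# Formula as quoted from the module: int(pos / 3) + 1
sub peptide_pos_reported {
    my ($cds_pos) = @_;
    return int($cds_pos / 3) + 1;
}

# Formula proposed in the report: int((pos - 1) / 3) + 1
sub peptide_pos_proposed {
    my ($cds_pos) = @_;
    return int(($cds_pos - 1) / 3) + 1;
}

# Print CDS position, reported result, proposed result.
print join("\t", "cds", "reported", "proposed"), "\n";
for my $pos (1 .. 6) {
    print join("\t", $pos,
               peptide_pos_reported($pos),
               peptide_pos_proposed($pos)), "\n";
}
# Reported column: 1 1 2 2 2 3; proposed column: 1 1 1 2 2 2
```

A regression test along the lines Chris suggests could assert the proposed column for a short CDS, so that the third base of each codon is checked explicitly.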

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849  office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 12 May 2010 01:40, Aaron Mackey wrote:

> Hi Chris,
>
> I was hoping Heikki might take up the cause and investigate further --
> let's give him a chance to respond.  But it's a frightening bug if it's
> really been that way for all this time ...
>
> -Aaron
>
> [...]

From cjfields at illinois.edu  Wed May 12 12:24:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 12 May 2010 11:24:49 -0500
Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow
In-Reply-To: <4BEAD562.1010702@cornell.edu>
References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu>
Message-ID: <97B3DF77-C657-4E7C-8298-529F474E1FA5@illinois.edu>

Yup, haven't started the migration yet (I'm taking down some crontab scripts used for prior github updates, nightly builds).  Then I'll announce before freezing the repo.

chris

On May 12, 2010, at 11:20 AM, Robert Buels wrote:

> The SVN repository is not frozen yet, driveby_bot just say 16984 go into svn from Thomas Sharpton.
>
> R
>
> Heikki Lehvaslaiho wrote:
>> No problem at all. Go ahead.
>> -Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +966 545 595 849 office: +966 2 808 2429 >> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 >> 4700 King Abdullah University of Science and Technology (KAUST) >> Thuwal 23955-6900, Kingdom of Saudi Arabia >> On 12 May 2010 07:15, Chris Fields wrote: >>> Just a friendly reminder that we'll freeze the dev subversion repository >>> tomorrow prior to migration to github. The migration will take about an >>> hour, during which all bioperl github repos will be replaced with the full >>> repos, and devs added. The test repos will be removed around that time >>> (Heikki, will that be a problem?). >>> >>> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Wed May 12 12:20:50 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 12 May 2010 09:20:50 -0700 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> Message-ID: <4BEAD562.1010702@cornell.edu> The SVN repository is not frozen yet, driveby_bot just say 16984 go into svn from Thomas Sharpton. R Heikki Lehvaslaiho wrote: > No problem at all. Go ahead. > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > On 12 May 2010 07:15, Chris Fields wrote: > >> Just a friendly reminder that we'll freeze the dev subversion repository >> tomorrow prior to migration to github. The migration will take about an >> hour, during which all bioperl github repos will be replaced with the full >> repos, and devs added. 
The test repos will be removed around that time >> (Heikki, will that be a problem?). >> >> chris From cjfields at illinois.edu Wed May 12 12:43:42 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 11:43:42 -0500 Subject: [Bioperl-l] dev.open-bio.org SVN is now read-only Message-ID: Just like the subject says, I switched the repo to read-only status. I'm starting the github migration now. chris From thomas.sharpton at gmail.com Wed May 12 12:45:22 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 12 May 2010 09:45:22 -0700 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: <4BEAD562.1010702@cornell.edu> References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu> Message-ID: Sorry if I screwed things up - updated before checking this email thread. -T On May 12, 2010, at 9:20 AM, Robert Buels wrote: > The SVN repository is not frozen yet, driveby_bot just say 16984 go > into svn from Thomas Sharpton. > > R > > Heikki Lehvaslaiho wrote: >> No problem at all. Go ahead. >> -Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +966 545 595 849 office: +966 2 808 2429 >> Computational Bioscience Research Centre (CBRC), Building #2, >> Office #4216 >> 4700 King Abdullah University of Science and Technology (KAUST) >> Thuwal 23955-6900, Kingdom of Saudi Arabia >> On 12 May 2010 07:15, Chris Fields wrote: >>> Just a friendly reminder that we'll freeze the dev subversion >>> repository >>> tomorrow prior to migration to github. The migration will take >>> about an >>> hour, during which all bioperl github repos will be replaced with >>> the full >>> repos, and devs added. The test repos will be removed around that >>> time >>> (Heikki, will that be a problem?).
>>> >>> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed May 12 12:47:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 11:47:36 -0500 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu> Message-ID: <08E7C628-D914-43C0-AB3D-E8FC41A144DC@illinois.edu> No problem, just froze the repo and rsynced to my local machine, so your commit made it just under the wire. chris On May 12, 2010, at 11:45 AM, Thomas Sharpton wrote: > Sorry if I screwed things up - updated before checking this email tread. > > -T > > On May 12, 2010, at 9:20 AM, Robert Buels wrote: > >> The SVN repository is not frozen yet, driveby_bot just say 16984 go into svn from Thomas Sharpton. >> >> R >> >> Heikki Lehvaslaiho wrote: >>> No problem at all. Go ahead. >>> -Heikki >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +966 545 595 849 office: +966 2 808 2429 >>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 >>> 4700 King Abdullah University of Science and Technology (KAUST) >>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>> On 12 May 2010 07:15, Chris Fields wrote: >>>> Just a friendly reminder that we'll freeze the dev subversion repository >>>> tomorrow prior to migration to github. The migration will take about an >>>> hour, during which all bioperl github repos will be replaced with the full >>>> repos, and devs added. The test repos will be removed around that time >>>> (Heikki, will that be a problem?). 
>>>> >>>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maizemu at gmail.com Wed May 12 13:12:28 2010 From: maizemu at gmail.com (Christopher Bottoms) Date: Wed, 12 May 2010 12:12:28 -0500 Subject: [Bioperl-l] Citing CPAN modules in scientific publications Message-ID: Dear BioPerlers, I am working on a publication which would be impossible without the use of several CPAN modules. I appreciate the work authors and maintainers have put into these modules and would like to acknowledge them by citing their work. I was thinking of a format such as Author(s), Maintainer(s) *Module::Name* [ http://search.cpan.org/dist/Module-Name] A reference for File::Slurp would appear thus: Uri Guttman, Dave Rolsky *File::Slurp* [ http://search.cpan.org/dist/File-Slurp] I guess that I could instead mention authors in an acknowledgment section. I noticed a large acknowledgment section in the BioPerl paper ( http://genome.cshlp.org/content/12/10/1611.full). Thanks for your time, Christopher Bottoms (molecules) From greg at ebi.ac.uk Wed May 12 14:16:53 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Wed, 12 May 2010 19:16:53 +0100 Subject: [Bioperl-l] BioPerl for indexing quality score files Message-ID: Hi all, I'm wondering if anyone has tried using BioPerl to index sequence quality score files? The files I'm looking at tend to look like Fasta files, but with numbers (between 0 and 99) and spaces instead of sequence strings. 
Something like: --- >chr1 0 20 20 20 50 99 99 99 99 30 30 20 20 10 10 0 0 0 0 --- (An example for Chimpanzee can be found here, as the file 'panTro2.quals.fa.gz': http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ ) I'm currently using a home-brewed file indexing system to access subsets of these quality scores, but it's kind of slow and (probably) buggy. I'd much rather use something like Bio::DB::Fasta, but (without having actually tried it) I expect it wouldn't be too happy with these not-quite-fasta format quality files. Has anyone run into a similar situation and found a solution using Bioperl (or something else)? I'd be happy to hack around a bit to get this to work, if necessary; if anyone could provide pointers on where to start, I'd be much obliged. Cheers, Greg PS - it's great to see the GitHub migration moving along so swiftly! I'll be *much* more likely to start bug-hunting and patch-submitting with the code living there now. :) From greg at ebi.ac.uk Wed May 12 14:26:26 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Wed, 12 May 2010 19:26:26 +0100 Subject: [Bioperl-l] BioPerl for indexing quality score files In-Reply-To: References: Message-ID: Ok, I need to shame myself with a huge "RTFM" for this one -- http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/DB/Qual.pm Sorry for the spam. Still happy about the GitHub, though! greg On 12 May 2010 19:16, Gregory Jordan wrote: > Hi all, > > I'm wondering if anyone has tried using BioPerl to index sequence quality > score files? The files I'm looking at tend to look like Fasta files, but > with numbers (between 0 and 99) and spaces instead of sequence strings. 
> Something like: > --- > >chr1 > 0 20 20 20 50 99 99 99 99 30 30 20 20 10 10 0 0 0 0 > --- > (An example for Chimpanzee can be found here, as the file > 'panTro2.quals.fa.gz': > http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ ) > > I'm currently using a home-brewed file indexing system to access subsets of > these quality scores, but it's kind of slow and (probably) buggy. I'd much > rather use something like Bio::DB::Fasta, but (without having actually tried > it) I expect it wouldn't be too happy with these not-quite-fasta format > quality files. > > Has anyone run into a similar situation and found a solution using Bioperl > (or something else)? > > I'd be happy to hack around a bit to get this to work, if necessary; if > anyone could provide pointers on where to start, I'd be much obliged. > > Cheers, > Greg > > PS - it's great to see the GitHub migration moving along so swiftly! I'll > be *much* more likely to start bug-hunting and patch-submitting with the > code living there now. :) > From cjfields at illinois.edu Wed May 12 14:48:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 13:48:53 -0500 Subject: [Bioperl-l] GitHub migration complete Message-ID: All, The migration to github is now essentially complete, minus a few small house-keeping details. Please let me know if there are problems with checkouts. I've added collaborators to almost all repositories; unfortunately, GitHub decided to remove 'copy permissions' for adding collaborators just recently, so we'll have to manually add each in to each repo until that is resolved (from what I hear, should be soon). In the meantime, if you are a bioperl developer and aren't listed as a github collaborator please sign up for a github account, add SSH keys, and let me know your github user name. I'll add you to bioperl-live and any other repos you want (please let me know which ones!). 
I'll be doing a few last-minute house-cleaning bits (adding post-receive hooks, set up a mirror, etc), but it shouldn't interfere with checkouts. Let me know how it goes! chris From David.Messina at sbc.su.se Wed May 12 15:59:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 12 May 2010 21:59:14 +0200 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: References: Message-ID: Thanks, Chris! Clone and commit are working here. Dave From Kevin.M.Brown at asu.edu Wed May 12 16:06:38 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 12 May 2010 13:06:38 -0700 Subject: [Bioperl-l] Citing CPAN modules in scientific publications In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu> Wouldn't the format of the citation actually be dictated by the publication the paper was going to be in? E.g. the APA guide sets the format to be: Jones, D. F. (2002). The Mental Measurement Tester (Version 3.2) [Computer software]. Fort Lauderdale, FL: Nova Southeastern University. Retrieved July 22, 2007. Available from http://www.buros.com/ Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Christopher Bottoms > Sent: Wednesday, May 12, 2010 10:12 AM > To: bioperl-l List > Subject: [Bioperl-l] Citing CPAN modules in scientific publications > > Dear BioPerlers, > > I am working on a publication which would be impossible > without the use of > several CPAN modules. I appreciate the work authors and > maintainers have put > into these modules and would like to acknowledge them by > citing their work. 
> > I was thinking of a format such as > Author(s), Maintainer(s) *Module::Name* [ > http://search.cpan.org/dist/Module-Name] > > > A reference for File::Slurp would appear thus: > > Uri Guttman, Dave Rolsky *File::Slurp* [ > http://search.cpan.org/dist/File-Slurp] > > > I guess that I could instead mention authors in an > acknowledgment section. I > noticed a large acknowledgment section in the BioPerl paper ( > http://genome.cshlp.org/content/12/10/1611.full). > > Thanks for your time, > Christopher Bottoms (molecules) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jay at jays.net Wed May 12 16:35:27 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 12 May 2010 15:35:27 -0500 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: References: Message-ID: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net> On May 12, 2010, at 1:48 PM, Chris Fields wrote: > The migration to github is now essentially complete, minus a few small house-keeping details. Please let me know if there are problems with checkouts. You mean clones? ;) Thanks Chris!! This is *awesome*. I'm really glad we're in git now and very much appreciate all your work on this. Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Wed May 12 17:34:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 16:34:39 -0500 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net> References: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net> Message-ID: On May 12, 2010, at 3:35 PM, Jay Hannah wrote: > On May 12, 2010, at 1:48 PM, Chris Fields wrote: >> The migration to github is now essentially complete, minus a few small house-keeping details. Please let me know if there are problems with checkouts. > > You mean clones? ;) > > Thanks Chris!! This is *awesome*. 
I'm really glad we're in git now and very much appreciate all your work on this. > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah Yes, that was svn slipping in there... chris From maj at fortinbras.us Wed May 12 21:44:09 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 12 May 2010 21:44:09 -0400 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: References: Message-ID: <77C82E975CC24860AA16EE537E270FBD@NewLife> awesome job, Chris- MAJ (what's git again? Oh never mind...) ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Wednesday, May 12, 2010 2:48 PM Subject: [Bioperl-l] GitHub migration complete > All, > > The migration to github is now essentially complete, minus a few small > house-keeping details. Please let me know if there are problems with > checkouts. > > I've added collaborators to almost all repositories; unfortunately, GitHub > decided to remove 'copy permissions' for adding collaborators just recently, > so we'll have to manually add each in to each repo until that is resolved > (from what I hear, should be soon). In the meantime, if you are a bioperl > developer and aren't listed as a github collaborator please sign up for a > github account, add SSH keys, and let me know your github user name. I'll add > you to bioperl-live and any other repos you want (please let me know which > ones!). > > I'll be doing a few last-minute house-cleaning bits (adding post-receive > hooks, set up a mirror, etc), but it shouldn't interfere with checkouts. Let > me know how it goes! 
> > chris > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maizemu at gmail.com Wed May 12 23:27:47 2010 From: maizemu at gmail.com (Christopher Bottoms) Date: Wed, 12 May 2010 22:27:47 -0500 Subject: [Bioperl-l] Citing CPAN modules in scientific publications In-Reply-To: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu> Message-ID: Thanks. I was also wondering about listing the maintainer. I'm guessing not, since the maintainer can add herself (or himself) to the list of authors if she felt that she had contributed enough to warrant it. On Wed, May 12, 2010 at 3:06 PM, Kevin Brown wrote: > Wouldn't the format of the citation actually be dictated by the > publication the paper was going to be in? E.g. the APA guide sets the > format to be: > > Jones, D. F. (2002). The Mental Measurement Tester (Version 3.2) > [Computer software]. > Fort Lauderdale, FL: Nova Southeastern University. Retrieved > July 22, 2007. > Available from http://www.buros.com/ > > > Kevin Brown > Center for Innovations in Medicine > Biodesign Institute > Arizona State University > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > Christopher Bottoms > > Sent: Wednesday, May 12, 2010 10:12 AM > > To: bioperl-l List > > Subject: [Bioperl-l] Citing CPAN modules in scientific publications > > > > Dear BioPerlers, > > > > I am working on a publication which would be impossible > > without the use of > > several CPAN modules. I appreciate the work authors and > > maintainers have put > > into these modules and would like to acknowledge them by > > citing their work. 
> > > > I was thinking of a format such as > > Author(s), Maintainer(s) *Module::Name* [ > > http://search.cpan.org/dist/Module-Name] > > > > > > A reference for File::Slurp would appear thus: > > > > Uri Guttman, Dave Rolsky *File::Slurp* [ > > http://search.cpan.org/dist/File-Slurp] > > > > > > I guess that I could instead mention authors in an > > acknowledgment section. I > > noticed a large acknowledgment section in the BioPerl paper ( > > http://genome.cshlp.org/content/12/10/1611.full). > > > > Thanks for your time, > > Christopher Bottoms (molecules) From heikki.lehvaslaiho at gmail.com Thu May 13 02:11:40 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Thu, 13 May 2010 09:11:40 +0300 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: <77C82E975CC24860AA16EE537E270FBD@NewLife> References: <77C82E975CC24860AA16EE537E270FBD@NewLife> Message-ID: It works. Bliss. Worth mentioning now on the list that the latest instructions are in http://www.bioperl.org/wiki/Using_Git I've recommitted the two changes I did on the experimental repo. I had a small problem when editing the README text file: git was not showing differences between the original file and my edits. It kept saying: bala ~/src/bioperl-live> git diff README diff --git a/README b/README index 03685a8..8e20592 100644 Binary files a/README and b/README differ The reason, of course, was that a hard-to-detect binary character had slipped into my edit. Just so that you know when this happens to you...
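A stray byte of this kind can be located with a few lines of Perl. This is only a minimal sketch, not something from the thread: the helper name find_control_bytes is hypothetical, and the assumption that control bytes (NUL in particular) are what makes git report "Binary files ... differ" is my understanding of git's default heuristic rather than something confirmed here. The sketch reports every control byte other than tab, LF and CR, with its line and column:

```perl
use strict;
use warnings;

# Hypothetical helper: return [line, column, byte value] for every control
# byte other than tab, LF and CR found in the given file.
sub find_control_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my @hits;
    while ( my $line = <$fh> ) {
        # pos() is the offset just past the one-byte match, which equals
        # the 1-based column of the offending byte.
        while ( $line =~ /([\x00-\x08\x0B\x0C\x0E-\x1F\x7F])/g ) {
            push @hits, [ $., pos($line), ord $1 ];
        }
    }
    close $fh;
    return @hits;
}

# Command-line use: perl find_control_bytes.pl README
if (@ARGV) {
    printf "%s line %d, column %d: byte 0x%02X\n", $ARGV[0], @$_
        for find_control_bytes( $ARGV[0] );
}
```

Run against the edited README, it would list the position of each suspicious byte so it can be removed before committing.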
-Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 13 May 2010 04:44, Mark A. Jensen wrote: > awesome job, Chris- MAJ > (what's git again? Oh never mind...) > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Wednesday, May 12, 2010 2:48 PM > Subject: [Bioperl-l] GitHub migration complete > > > > All, >> >> The migration to github is now essentially complete, minus a few small >> house-keeping details. Please let me know if there are problems with >> checkouts. >> >> I've added collaborators to almost all repositories; unfortunately, GitHub >> decided to remove 'copy permissions' for adding collaborators just recently, >> so we'll have to manually add each in to each repo until that is resolved >> (from what I hear, should be soon). In the meantime, if you are a bioperl >> developer and aren't listed as a github collaborator please sign up for a >> github account, add SSH keys, and let me know your github user name. I'll >> add you to bioperl-live and any other repos you want (please let me know >> which ones!). >> >> I'll be doing a few last-minute house-cleaning bits (adding post-receive >> hooks, set up a mirror, etc), but it shouldn't interfere with checkouts. >> Let me know how it goes! 
>> >> chris From heikki.lehvaslaiho at gmail.com Thu May 13 02:20:51 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Thu, 13 May 2010 09:20:51 +0300 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: Just a thumbs up. Aaron's fix works. The problem seems to be limited to where he spotted it. I am working on refreshing my memory of how the code works (it has been quite a few years since I wrote it) and will commit better tests. As for getting values outside the defined region, that is a feature rather than a bug. The idea was to be able to ask what the new coordinate would be if the feature extended beyond the known limits. This is the capability of Bio::Coordinate::ExtrapolatingPair that is used here. That class also has a method, strict, that can be used to prevent extrapolating, but the code to access that has not been written into GeneMapper. I'll see if I can get it to work. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 12 May 2010 13:23, Heikki Lehvaslaiho wrote: > Ouch. I'll definitely have a look. > > Strange that none of the tests have picked this up...
> > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > > On 12 May 2010 01:40, Aaron Mackey wrote: > >> Hi Chris, >> >> I was hoping Heikki might take up the cause and investigate further -- >> let's >> give him a chance to respond. But it's a frightening bug if it's really >> been that way for all this time ... >> >> -Aaron >> >> On Tue, May 11, 2010 at 6:31 PM, Chris Fields >> wrote: >> >> > Aaron, >> > >> > Do we want to write this up as a set of tests to add to the bioperl test >> > suite? We can probably add it after the github migration tomorrow. >> > >> > chris >> > >> > On May 11, 2010, at 4:26 PM, Aaron Mackey wrote: >> > >> > > Hi Zerui (and others), >> > > >> > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, >> > > specifically in this code: >> > > >> > > lines: >> > > 1170: (-start => int ($loc->start / 3 ) +1, >> > > 1171: -end => int ($loc->end / 3 ) +1, >> > > >> > > both of those lines should look like: int (($loc->start - 1) / 3) + 1 >> > > >> > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect >> peptide >> > > positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2) >> > > >> > > There is also a problem when mapping exon coordinates that are >> > outside/after >> > > the CDS region (instead of getting undefined locations, you continue >> to >> > get >> > > peptide coordinates, but they are invalid, larger than the protein >> > length). >> > > >> > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for >> > you, >> > > as it uses this module to calculate codons for SNPs. >> > > >> > > -Aaron >> > > >> > > P.S. 
a script the demonstrates the problem: >> > > >> > > use Bio::Coordinate::GeneMapper; >> > > >> > > my $mapper = >> > > Bio::Coordinate::GeneMapper >> > > ->new( -in => "chr", >> > > -out => "propeptide", >> > > -exons => [ Bio::Location::Simple >> > > ->new( -start => 101, >> > > -end => 109 ), >> > > Bio::Location::Simple >> > > ->new( -start => 201, >> > > -end => 221 ), >> > > ], >> > > -cds => Bio::Location::Simple >> > > ->new(-start => 101, -end => 209), >> > > ); >> > > >> > > >> > > print join("\t", "chr", "aa"), "\n"; >> > > for my $pos (99..111,199..211) { >> > > my $res = $mapper->map( >> > > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => >> > 1)); >> > > my $start = $res->start; $start = "NA" unless defined $start; >> > > my $end = $res->end; $end = "NA" unless defined $end; >> > > print join("\t", $pos, $start), "\n"; >> > > } >> > > _______________________________________________ >> > > Bioperl-l mailing list >> > > Bioperl-l at lists.open-bio.org >> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From remi.planel at free.fr Thu May 13 05:08:58 2010 From: remi.planel at free.fr (Remi) Date: Thu, 13 May 2010 11:08:58 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast Message-ID: <4BEBC1AA.2020908@free.fr> Hi all, I'm using Bio::Tools::Run::StandAloneBlastPlus and trying to run a remote blast using this code : /my $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'nr', -remote => '1', ); my $result = $fac->blastp( -query => 'P12996.fasta', -outfile => 'out.bls', ); /but I got an error message : "BLAST Database error: Protein BLAST database './nr' does not exist in the NCBI servers". But if I'm modifying directly the value of $fac->{'_db_path'} like : /$fac->{'_db_path'} = 'nr';/ it's working. 
Is that a bug or am I missing something? Thanks, Rémi From maj at fortinbras.us Thu May 13 07:17:55 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 13 May 2010 07:17:55 -0400 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast In-Reply-To: <4BEBC1AA.2020908@free.fr> References: <4BEBC1AA.2020908@free.fr> Message-ID: <1A1631149DEF4B9080E5D4D5851F4587@NewLife> Hi Rémi, Looks like a bug. Can you report it via http://bugzilla.bioperl.org? Just enter what you've written here. I appreciate it. Mark ----- Original Message ----- From: "Remi" To: "BioPerl List" Sent: Thursday, May 13, 2010 5:08 AM Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast Hi all, I'm using Bio::Tools::Run::StandAloneBlastPlus and trying to run a remote blast using this code: my $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'nr', -remote => '1', ); my $result = $fac->blastp( -query => 'P12996.fasta', -outfile => 'out.bls', ); but I got an error message: "BLAST Database error: Protein BLAST database './nr' does not exist in the NCBI servers". But if I modify the value of $fac->{'_db_path'} directly, like: $fac->{'_db_path'} = 'nr'; it works. Is that a bug or am I missing something? Thanks, Rémi From David.Messina at sbc.su.se Wed May 12 16:10:36 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 12 May 2010 22:10:36 +0200 Subject: [Bioperl-l] Ohloh update Message-ID: <32ED5B44-061D-4634-9E5C-72E313E1A58C@sbc.su.se> Hi everyone, The Ohloh account probably needs to be changed to point to our GitHub repo. I'd be happy to do it if someone adds me on there. Otherwise, could one of the admins check into that when they get a chance? Also, I notice it hasn't registered any commits since March 15th;
hopefully the repo change will wake it up or we may need to contact one of their admins again. Can anyone think of other external sites pointing to BioPerl which need updating, too? Dave From jay at jays.net Thu May 13 08:42:41 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 07:42:41 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <201005130328.o4D3S8Fs011865@portal.open-bio.org> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> Message-ID: ------- Comment #3 from cjfields at bioperl.org 2010-05-12 23:28 EST ------- > Ouch, that's a bit nasty. Taking advantage of git move and doing this on a > topic branch (topic/bug_3077) on github. I plan on cleaning up the 'jhannah' branch (renaming it 'topic/bug_2515', asking people for their input, merging to master). I plan on cleaning up the 'yapc10hackathon' branch. I can't remember what Robert and I left in there after YAPC last year. Should most of the other branches be deleted? If a branch hasn't been changed in more than a year and no one intends to jump into it in the coming year what purpose does it serve? Old tags can hang out forever, but shouldn't our branch list be tidy? (Specifically I would argue that old release number tags should hang out forever, but I don't see the point in any other ancient tags continuing to exist if their purpose isn't documented anywhere.) Are we serious about emulating this branching model? http://nvie.com/git-model If so then we need to create a 'develop' branch and only the release manager should touch 'master' and yahoos like me should be branching off of 'develop' instead, right? Counter argument: Since 'master' is the default branch and we want to encourage doc patches and typo corrections from the world making trivial contributions as easy as possible for everyone, I would think that using 'master' as the daily headstream would be better. 
So 'topic/bug_####' for each non-trivial Bugzilla ticket, and release managers can work their magic in 'release-#-#' branches. (Release branches old enough that there's no way we're going to patch them any more are deleted, and only the tag remains). Thoughts? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah P.S. My "up to date" OS X 10.6.3 machines both had git 1.5.3.1 on them. Upgrading to git 1.7.1 makes branch checkouts simpler. jhannah at minijaysnet~/src/bioperl-live$ git branch -a * master remotes/origin/HEAD -> origin/master remotes/origin/TRY_featureio_refactor remotes/origin/TRY_gff_refactor remotes/origin/TRY_locatableseq_refactor remotes/origin/anydbm-branch remotes/origin/bioperl remotes/origin/bioperl-branch-1-5-1 remotes/origin/bioperl-live remotes/origin/branch-06 remotes/origin/branch-07 remotes/origin/branch-07-ensembl-120 remotes/origin/branch-1-0-0 remotes/origin/branch-1-2 remotes/origin/branch-1-2-collection remotes/origin/branch-1-4 remotes/origin/branch-1-5-2 remotes/origin/branch-1-6 remotes/origin/branch-ensembl-m1 remotes/origin/branch-experimental remotes/origin/featann_rollback remotes/origin/internal-branch-pre-delete-06-tag remotes/origin/jhannah remotes/origin/lightweight_feature_branch remotes/origin/master remotes/origin/ontology-cache remotes/origin/release-0-04-bug remotes/origin/restriction-refactor remotes/origin/stable-0-05 remotes/origin/stable-0-05-new remotes/origin/steve_chervitz remotes/origin/topic/bug_3077 remotes/origin/yapc10hackathon jhannah at minijaysnet~/src/bioperl-live$ git tag after-05-06-merge after-05-06-merge-2 after004 before-05-to-06-merge before-05-to-06-trunk bioperl-06-1 bioperl-061-pre1 bioperl-1-0-0 bioperl-1-0-alpha bioperl-1-0-alpha2-rc bioperl-1-2-1-rc1 bioperl-1-6-0_001 bioperl-1-6-0_002 bioperl-1-6-0_003 bioperl-1-6-0_004 bioperl-1-6-0_005 bioperl-1-6-0_006 bioperl-1-6-RC1 bioperl-1-6-RC2 bioperl-1-6-RC2_15306 bioperl-1-6-RC3 bioperl-1-6-RC3_15392 bioperl-1-6-RC4 bioperl-devel-1-1-1 
bioperl-devel-1-3-01 bioperl-devel-1-3-02 bioperl-devel-1-3-03 bioperl-devel-1-3-04 bioperl-release-1-0-0 bioperl-release-1-0-1 bioperl-release-1-0-2 bioperl-release-1-1-0 bioperl-release-1-2-0 bioperl-release-1-2-1 bioperl-release-1-2-2 bioperl-release-1-2-3 bioperl-release-1-4-0 bioperl-release-1-5-0 bioperl-release-1-5-0-rc1 bioperl-release-1-5-0-rc2 bioperl-release-1-5-1 bioperl-release-1-5-1-rc4 bioperl-release-1-5-2 bioperl-release-1-5-2-patch1 bioperl-release-1-5-2-patch2 bioperl-release-1-6 bioperl-release-1-6-1 bioperl-run-release-1-2-0 for_gmod_0_003 gbrowse_1_65 join-0-04-to-0-05 lightweight_feature ontology-fix1 ontology-overhaul-end ontology-overhaul-start prerelease-06 release-0-04-1 release-0-04-2 release-0-04-3 release-0-04-4 release-0-05 release-0-05-1 release-0-7-0 release-0-7-1 release-0-7-2 release-0-9-0 release-0-9-2 release-0-9-3 release-06 release-06-2 release-1_01 release-ensembl-06 snapshot-at-head-of-07-branch start tag-ensembl-stable-061 From cjfields at illinois.edu Thu May 13 09:49:19 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 08:49:19 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> Message-ID: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> On May 13, 2010, at 7:42 AM, Jay Hannah wrote: > ------- Comment #3 from cjfields at bioperl.org 2010-05-12 23:28 EST ------- >> Ouch, that's a bit nasty. Taking advantage of git move and doing this on a >> topic branch (topic/bug_3077) on github. > > I plan on cleaning up the 'jhannah' branch (renaming it 'topic/bug_2515', asking people for their input, merging to master). > > I plan on cleaning up the 'yapc10hackathon' branch. I can't remember what Robert and I left in there after YAPC last year. > > Should most of the other branches be deleted? 
If a branch hasn't been changed in more than a year and no one intends to jump into it in the coming year what purpose does it serve? Old tags can hang out forever, but shouldn't our branch list be tidy? (Specifically I would argue that old release number tags should hang out forever, but I don't see the point in any other ancient tags continuing to exist if their purpose isn't documented anywhere.) I would say err on the safe side and keep the ones we're unsure of, but a cleanup would be nice. We could adopt what Moose has done and move branches we're unsure of to something like 'attic'. > Are we serious about emulating this branching model? > > http://nvie.com/git-model > > If so then we need to create a 'develop' branch and only the release manager should touch 'master' and yahoos like me should be branching off of 'develop' instead, right? > > Counter argument: Since 'master' is the default branch and we want to encourage doc patches and typo corrections from the world making trivial contributions as easy as possible for everyone, I would think that using 'master' as the daily headstream would be better. So 'topic/bug_####' for each non-trivial Bugzilla ticket, and release managers can work their magic in 'release-#-#' branches. (Release branches old enough that there's no way we're going to patch them any more are deleted, and only the tag remains). ... > Thoughts? > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > P.S. My "up to date" OS X 10.6.3 machines both had git 1.5.3.1 on them. Upgrading to git 1.7.1 makes branch checkouts simpler. Moose has a 'stable' branch that release managers (the cabal) pull into from 'master' for releases. It's just a matter of semantics, what name we use for active development branches and what to use for stable releases; for us, the 'develop' and 'master' from that link could be (respectively) 'master' and 'stable'. 
'hotfixes' would be bug fixes, and 'feature branches' would be just that, new features to be added. As for bug fixes, it would be much nicer to have most changes beyond very simple ones (including all bug fixes) relegated to branches that can be merged in. This sequesters any changes to the branch, where they can be tested prior to a merge. Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. chris From jay at jays.net Thu May 13 10:38:20 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 09:38:20 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: So, like this? Flow diagram: http://biodoc.ist.unomaha.edu/~jhannah/tmp/branches.png master (git and github default) Trivial changes committed directly here. 
topic/bug_#### One branch per non-trivial Bugzilla ticket topic/jhannah_crazy_idea Branches for unstable/unfinished work stable Release manager pulls from master to stable periodically (all tests are passing, etc.) release-#-#-# Pulled from stable, pushed to CPAN attic/* Any branch with no activity for 1 year I like it. > Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? I'm fine with attic/ and just leaving stuff in there until 2050. Then we should probably delete them. :) My understanding is that by default commits that have no pointers to them (branches or tags or subsequent commits) are subject to cleanup/prune. I think this means that if someone, 10 years ago, committed 3 times to the branch "jhannah_crazy_idea" and that branch is deleted, then those 3 commits may be removed (gone forever) by git cleanup/prune. This is a feature or a crime against humanity depending on who you ask. It can be disabled in a normal repo, I don't know about github. > Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. As I collect clues I'll be brain dumping everything I think I know onto the wiki. This is a crazy busy week for me though. 
:( Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Thu May 13 11:00:05 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 10:00:05 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: On May 13, 2010, at 8:49 AM, Chris Fields wrote: > Saying that, we could adopt a workflow policy that allows deletion of any merged branch. Right. Except for release-* branches, which are never merged anywhere. A release is a branch while it's being prepared and tweaked. Once perfect, it is tagged and pushed to CPAN. At that point the branch can be deleted since we can never push that release number to CPAN again (even if we wanted to). The tag remains forever. Or am I mistaken? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From shalabh.sharma7 at gmail.com Thu May 13 11:07:26 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 13 May 2010 11:07:26 -0400 Subject: [Bioperl-l] parsing blast report with long description Message-ID: Hi All, I need some help in parsing blast output. I have a inhouse database that contain sequences with really long description. >SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV So my blast report looks like this: ..... ..... 
>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821 6887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 Length = 213 Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix adjust. Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%) ..... ..... (note that the tag "TI_1000008216887" is splitting in two lines). I am using SeqIO to parse this report. What i am doing is parsing the description field again to get all the tags. like .... .... my $desc = $hit->description; my @f = split('/',$desc); for(my $i = 0;$i < scalar @f;$i++){ print OUT "$f[$i]\t";} ..... ..... *I am getting the perfect parsed report but the field with TI_1000008216887 has a space **TI_100000821 6887 *. I would really appreciate if anyone can help me out. Thanks Shalabh Sharma From joshpk105 at gmail.com Thu May 13 10:42:28 2010 From: joshpk105 at gmail.com (Katz) Date: Thu, 13 May 2010 07:42:28 -0700 (PDT) Subject: [Bioperl-l] RemoteBlast Message-ID: <54674635-db43-413c-8c96-0d214f1b978d@l31g2000yqm.googlegroups.com> Is there anyway to differentiate between the three different ncbi blastn? Right now I'm using RemoteBlast as follows: Bio::Tools::Run::RemoteBlast->new(-prog => 'blastn', -data => 'nr', - expect => '1e-5', -readmethod => 'SearchIO'); then blasting my files. However, this is auto using megablastn and i need to use regular blastn. 
Thx, Josh From hlapp at drycafe.net Thu May 13 11:43:47 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 11:43:47 -0400 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> On May 13, 2010, at 9:49 AM, Chris Fields wrote: > Re: deletion of branches, I'm only really in support of deleting > feature branches that have been merged back to 'master' or another > branch (e.g. only removed using 'git branch -d foo'). I agree. > Older subversion release branches don't tend to fall into that > category, in that we had merged or cherry-picked changes from svn > trunk to them, not vice versa; they were never merged back to > trunk. Deletion in this case would be somewhat history-revising, > correct? I wouldn't call it history-revising. I also think it's OK to delete release branches that are no longer supported, iff we have a tag for the release itself. That's different from counting inactivity. A branch may lie dormant for a year or longer until someone has time to pick it back up again - I don't see the harm in keeping those around. > Saying that, we could adopt a workflow policy that allows deletion > of any merged branch. All this suggests coming up with a good > 'Contributing' document. That would be highly useful. I'll also voice a word of caution here though - I find it kind of ironic that the switch to git, which is supposed to make contribution *easier*, very often leads subsequently to complex commit/pull/push/branching workflows being instituted for projects that take pages and pages to document, a lot of time to ingest, and discipline to follow - it seems to be very easy and tempting to go overboard with this. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Thu May 13 12:01:05 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 11:01:05 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: On May 13, 2010, at 10:43 AM, Hilmar Lapp wrote: > On May 13, 2010, at 9:49 AM, Chris Fields wrote: >> Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. > > That would be highly useful. I'll also voice a word of caution here though - I find it kind of ironic that the switch to git, which is supposed to make contribution *easier*, very often leads subsequently to complex commit/pull/push/branching workflows being instituted for projects that take pages and pages to document, a lot of time to ingest, and discipline to follow - it seems to be very easy and tempting to go overboard with this. I'm happy to comply with whatever the policy is. If that policy is "everything trivial in master, non-trivial in topic/FOO, release manager will figure out everything else" that's fine with me. A branch cleanup would be nice. Or I'll just close my eyes. :) I'm embarrassed that I left unfinished business in branches in 2009. I'm fishing for a consensus on a contribution policy. 
Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From heikki.lehvaslaiho at gmail.com Thu May 13 12:48:14 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Thu, 13 May 2010 19:48:14 +0300 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: I second Hilmar. Let's try to keep this simple. While for most people just beginning to use git this discussion seems confusing and the structures complex, things really are pretty simple. I expect most of the branches to live only in developers copies of the repo. They are created when work starts on the new bug or a feature, merged to master when work is done, and removed immediately or soon after that. Most of the work is done in the master and only the release managers touch the stable and release branches. See Jay's flow diagram. Work flow for this is (while calling 'git status' all the time): git branch $new git checkout $new # work git commit git commit ... git checkout master git merge $new git push ... git branch -d $new -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 13 May 2010 18:43, Hilmar Lapp wrote: > > On May 13, 2010, at 9:49 AM, Chris Fields wrote: > > Re: deletion of branches, I'm only really in support of deleting feature >> branches that have been merged back to 'master' or another branch (e.g. only >> removed using 'git branch -d foo'). >> > > I agree. 
> > > Older subversion release branches don't tend to fall into that category, >> in that we had merged or cherry-picked changes from svn trunk to them, not >> vice versa; they were never merged back to trunk. Deletion in this case >> would be somewhat history-revising, correct? >> > > I wouldn't call it history-revising. I also think it's OK to delete release > branches that are no longer supported, iff we have a tag for the release > itself. > > That's different from counting inactivity. A branch may lie dormant for a > year or longer until someone has time to pick it back up again - I don't see > the harm in keeping those around. > > > Saying that, we could adopt a workflow policy that allows deletion of any >> merged branch. All this suggests coming up with a good 'Contributing' >> document. >> > > That would be highly useful. I'll also voice a word of caution here though > - I find it kind of ironic that the switch to git, which is supposed to make > contribution *easier*, very often leads subsequently to complex > commit/pull/push/branching workflows being instituted for projects that take > pages and pages to document, a lot of time to ingest, and discipline to > follow - it seems to be very easy and tempting to go overboard with this. 
> > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu May 13 17:41:35 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 16:41:35 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: On May 13, 2010, at 11:48 AM, Heikki Lehvaslaiho wrote: > I second Hilmar. Let's try to keep this simple. > > While for most people just beginning to use git this discussion seems > confusing and the structures complex, things really are pretty simple. > > I expect most of the branches to live only in developers copies of the repo. > They are created when work starts on the new bug or a feature, merged to > master when work is done, and removed immediately or soon after that. Most > of the work is done in the master and only the release managers touch the > stable and release branches. See Jay's flow diagram. Right, many branches will occur locally. And I'm not suggesting that we strictly follow a particular pattern; I would rather not enforce that upon devs who already have a productive pattern set. I think this would act more as a suggested method of development, something that has been demonstrated to work well for other large projects (and something I'll be following). What I would really like to promote is using branches for making code changes, even ones that are only a few commits or so (and even if they are only local ones not pushed to github). Branches are cheap. 
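A minimal sketch of the branch-per-change habit suggested above, run in a throwaway repository so that nothing touches bioperl-live. The branch name topic/bug_0000, the README file and the committer identity are placeholders invented for illustration, not project conventions.

```shell
#!/bin/sh
# Sketch only: a small fix done on a topic branch in a scratch repository.
# 'topic/bug_0000', README and the committer identity are placeholders.

topic_branch_demo() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git symbolic-ref HEAD refs/heads/master    # first commit lands on 'master'
    git config user.name  "Example Dev"
    git config user.email "dev@example.org"

    echo "first line" > README
    git add README
    git commit -q -m "initial commit"

    # Branch off master and do the work there.
    git checkout -q -b topic/bug_0000
    echo "the fix" >> README
    git commit -q -a -m "fix for hypothetical bug 0000"

    # Merge back, then delete the branch; -d succeeds because it is merged.
    git checkout -q master
    git merge -q topic/bug_0000
    git branch -d topic/bug_0000 >/dev/null

    # Print the local branches that are left.
    git for-each-ref --format='%(refname:short)' refs/heads

    cd /
    rm -rf "$repo"
)

topic_branch_demo
```

After the merge only 'master' is left, so the topic branch costs nothing once the work is done. In a real clone the merge would normally be followed by a 'git push'.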
> Work flow for this is (while calling 'git status' all the time): > > git branch $new > git checkout $new > # work > git commit > git commit > ... > git checkout master > git merge $new > git push > ... > git branch -d $new > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia Yes, that's essentially the basic workflow, maybe with a preliminary 'git pull' to sync to the latest. chris > On 13 May 2010 18:43, Hilmar Lapp wrote: > >> >> On May 13, 2010, at 9:49 AM, Chris Fields wrote: >> >> Re: deletion of branches, I'm only really in support of deleting feature >>> branches that have been merged back to 'master' or another branch (e.g. only >>> removed using 'git branch -d foo'). >>> >> >> I agree. >> >> >> Older subversion release branches don't tend to fall into that category, >>> in that we had merged or cherry-picked changes from svn trunk to them, not >>> vice versa; they were never merged back to trunk. Deletion in this case >>> would be somewhat history-revising, correct? >>> >> >> I wouldn't call it history-revising. I also think it's OK to delete release >> branches that are no longer supported, iff we have a tag for the release >> itself. >> >> That's different from counting inactivity. A branch may lie dormant for a >> year or longer until someone has time to pick it back up again - I don't see >> the harm in keeping those around. >> >> >> Saying that, we could adopt a workflow policy that allows deletion of any >>> merged branch. All this suggests coming up with a good 'Contributing' >>> document. >>> >> >> That would be highly useful. 
I'll also voice a word of caution here though >> - I find it kind of ironic that the switch to git, which is supposed to make >> contribution *easier*, very often leads subsequently to complex >> commit/pull/push/branching workflows being instituted for projects that take >> pages and pages to document, a lot of time to ingest, and discipline to >> follow - it seems to be very easy and tempting to go overboard with this. >> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu May 13 17:56:11 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 14:56:11 -0700 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: <4BEC757B.5030407@cornell.edu> OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes. I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that. 
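A rough sketch of what "releases constructed in branches off of master" could look like, combined with the tag-then-delete idea raised earlier in this thread. It runs in a scratch repository; the branch name release-9-9-9, the tag name bioperl-release-9-9-9 and the file Module.pm are hypothetical and only mimic the naming seen in the existing branch and tag lists.

```shell
#!/bin/sh
# Sketch only: cut a release branch from master, tag it, delete the branch.
# All names (release-9-9-9, bioperl-release-9-9-9, Module.pm) are made up.

release_branch_demo() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git symbolic-ref HEAD refs/heads/master
    git config user.name  "Example Release Manager"
    git config user.email "rm@example.org"

    echo "code" > Module.pm
    git add Module.pm
    git commit -q -m "work on master"

    # Prepare the release on its own branch.
    git checkout -q -b release-9-9-9
    echo "version bump" >> Module.pm
    git commit -q -a -m "prepare hypothetical release 9.9.9"

    # Tag the finished release, then drop the branch.
    git tag -a -m "hypothetical release 9.9.9" bioperl-release-9-9-9
    git checkout -q master
    # The release branch was never merged into master, so 'git branch -d'
    # would refuse to delete it; -D forces the deletion.
    git branch -D release-9-9-9 >/dev/null

    # The branch is gone, but the tag still reaches the release commit.
    git for-each-ref --format='%(refname:short)' refs/heads
    git log -1 --format=%s bioperl-release-9-9-9

    cd /
    rm -rf "$repo"
)

release_branch_demo
```

The point of the sketch is that the tag keeps the release commit reachable after the branch is removed, which is why deleting an old release branch is less drastic than deleting an untagged one.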
Rob From jay at jays.net Thu May 13 18:00:21 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 17:00:21 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <4BEC757B.5030407@cornell.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu> Message-ID: <7BA7535D-AE97-4827-8B86-91C24842BAED@jays.net> On May 13, 2010, at 4:56 PM, Robert Buels wrote: > OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes. > > I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that. master++ Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From rmb32 at cornell.edu Thu May 13 18:13:52 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 15:13:52 -0700 Subject: [Bioperl-l] move ancient branches to attic Message-ID: <4BEC79A0.5000505@cornell.edu> To clean up branches, I propose deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. Note that there are still tags for all the old releases, so those won't be lost. Thoughts? Rob From jay at jays.net Thu May 13 18:22:30 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 17:22:30 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC79A0.5000505@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> Message-ID: On May 13, 2010, at 5:13 PM, Robert Buels wrote: > To clean up branches, I propose deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. > > Note that there are still tags for all the old releases, so those won't be lost. Sounds generous to me.
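One possible way to see which branches the proposed cutoffs would touch is to list each branch head with its commit date. The sketch below is only an illustration: the function names classify_branches and age_demo, the branch names and the scratch repository are invented here, and the two cutoff dates are the ones from the proposal above. For the shared repository one would presumably pass refs/remotes/origin instead of refs/heads.

```shell
#!/bin/sh
# Sketch only: label branches by the age of their head commit.
# Cutoffs follow the proposal in this thread; everything else is made up.

classify_branches() {
    # $1 = ref namespace to inspect, e.g. refs/heads or refs/remotes/origin
    git for-each-ref --sort=committerdate \
        --format='%(committerdate:short) %(refname:short)' "$1" |
    awk '$1 < "2006-01-01" { print "delete", $2; next }
         $1 < "2009-01-01" { print "attic",  $2; next }
                           { print "keep",   $2 }'
}

age_demo() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git symbolic-ref HEAD refs/heads/master
    git config user.name  "Example Dev"
    git config user.email "dev@example.org"

    commit_on() {   # $1 = commit date, $2 = commit message
        GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
            git commit -q --allow-empty -m "$2"
    }

    # Three branch heads with three different ages.
    commit_on "2005-06-01 12:00:00 +0000" "old work"
    git branch old-idea
    commit_on "2008-06-01 12:00:00 +0000" "newer work"
    git branch mid-idea
    commit_on "2010-05-13 12:00:00 +0000" "current work"

    classify_branches refs/heads

    cd /
    rm -rf "$repo"
)

age_demo
```

The dates are compared as plain strings, which works because the short date format is year-month-day. The script only prints labels; it does not delete or rename anything.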
proceed++ Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From hlapp at drycafe.net Thu May 13 18:46:00 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 18:46:00 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC79A0.5000505@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> Message-ID: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Why? What is the gain from deleting branches that you don't know whether they are dead or not? -hilmar On May 13, 2010, at 6:13 PM, Robert Buels wrote: > To clean up branches, I propose deleting branches (merged or not) > whose head is older than Jan 1, 2006, and moving branches to attic/ > whose head is older than Jan 1, 2009. > > Note that there are still tags for all the old releases, so those > won't be lost. > > Thoughts? > > Rob -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From rmb32 at cornell.edu Thu May 13 19:05:06 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 16:05:06 -0700 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <4BEC85A2.50401@cornell.edu> The gain is to avoid having useless things hanging around. Every time somebody has to read through a list of 50 branches to find the maybe 5 that are useful, it's time lost. In other words, it's the same gain that you get from cleaning off your desk, so that you can see where you put things. Rob Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether > they are dead or not?
> > -hilmar > > On May 13, 2010, at 6:13 PM, Robert Buels wrote: > >> To clean up branches, I propose deleting branches (merged or not) >> whose head is older than Jan 1, 2006, and moving branches to attic/ >> whose head is older than Jan 1, 2009. >> >> Note that there are still tags for all the old releases, so those >> won't be lost. >> >> Thoughts? >> >> Rob > From cjfields at illinois.edu Thu May 13 19:07:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 18:07:31 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <4BEC757B.5030407@cornell.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu> Message-ID: 'master'. That's more in line with other repos. chris On May 13, 2010, at 4:56 PM, Robert Buels wrote: > OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes. > > I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that.
> > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu May 13 20:27:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:27:22 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC85A2.50401@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> Message-ID: <77C06787-B381-43AA-8F5A-74331866C495@illinois.edu> Let's go through and check which branches are specifically merged back to trunk and delete those first, then list the ones that aren't or we're unsure of. If needed we can move those to an 'attic', like Moose. chris On May 13, 2010, at 6:05 PM, Robert Buels wrote: > The gain is to avoid having useless things hanging around. Every time somebody has to read through a list of 50 branches to find the maybe 5 that are useful, it's time lost. > > In other word, it's the same gain that you get from cleaning off your desk, so that you can see where you put things. > > Rob > > > Hilmar Lapp wrote: >> Why? What is the gain from deleting branches that you don't know whether they are dead or not? >> -hilmar >> On May 13, 2010, at 6:13 PM, Robert Buels wrote: >>> To clean up branches, I propose to deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. >>> >>> Note that there are still tags for all the old releases, so those won't be lost. >>> >>> Thoughts? 
>>> >>> Rob >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu May 13 20:28:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:28:30 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: <6757E1DD-5712-4894-8EAF-52F5F902D348@illinois.edu> On May 13, 2010, at 9:38 AM, Jay Hannah wrote: > So, like this? > > Flow diagram: > http://biodoc.ist.unomaha.edu/~jhannah/tmp/branches.png > > master > (git and github default) Trivial changes committed directly here. > topic/bug_#### > One branch per non-trivial Bugzilla ticket > topic/jhannah_crazy_idea > Branches for unstable/unfinished work > stable > Release manager pulls from master to stable periodically (all tests are passing, etc.) > release-#-#-# > Pulled from stable, pushed to CPAN > attic/* > Any branch with no activity for 1 year > > I like it. Yes, something along those lines. >> Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? > > I'm fine with attic/ and just leaving stuff in there until 2050. Then we should probably delete them. 
:) > > My understanding is that by default commits that have no pointers to them (branches or tags or subsequent commits) are subject to cleanup/prune. I think this means that if someone, 10 years ago, committed 3 times to the branch "jhannah_crazy_idea" and that branch is deleted, then those 3 commits may be removed (gone forever) by git cleanup/prune. > > This is a feature or a crime against humanity depending on who you ask. It can be disabled in a normal repo, I don't know about github. I don't think this is disabled in github (e.g. one can still delete branches). Duke Leto suggested the only real way to prevent history revising commits would be to do a pre-commit hook, which is not supported right now in github. >> Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. > > As I collect clues I'll be brain dumping everything I think I know onto the wiki. This is a crazy busy week for me though. :( > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah No problem. chris From cjfields at illinois.edu Thu May 13 20:41:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:41:57 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> It would be nice to at least designate them as outdated in some respect, and organize them along those lines. 
chris On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether they are dead or not? > > -hilmar > > On May 13, 2010, at 6:13 PM, Robert Buels wrote: > >> To clean up branches, I propose to deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. >> >> Note that there are still tags for all the old releases, so those won't be lost. >> >> Thoughts? >> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu May 13 20:55:01 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 20:55:01 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> Message-ID: On May 13, 2010, at 8:41 PM, Chris Fields wrote: > It would be nice to at least designate them as outdated in some > respect, and organize them along those lines. I agree. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu May 13 21:04:02 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 21:04:02 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC85A2.50401@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> Message-ID: <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net> On May 13, 2010, at 7:05 PM, Robert Buels wrote: > The gain is to avoid having useless things hanging around. Every > time somebody has to read through a list of 50 branches to find the > maybe 5 that are useful, it's time lost. > > In other word, it's the same gain that you get from cleaning off > your desk, so that you can see where you put things. Hold on - that's not a good comparison is it? First off, this being git, the "main" repo is not your desk. You can have your desk and wipe it clean of all branches and tags that have ever existed, without affecting, or imposing this on, anyone else. Second, why would you *want* to look through all those branches? This being git, you create branches all the time and merge them back, on your own repo, right? Where in this workflow are you browsing through the 50 branches of the "main" repo all the time? Third, and maybe I'm just too old, but moving to git because branching and having your own clone exactly the way you want it is so easy, only to subsequently delete most of the branches on the "main" repo for primarily aesthetic reasons just doesn't make much sense to me, honestly. 
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From heikki.lehvaslaiho at gmail.com Fri May 14 06:41:22 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Fri, 14 May 2010 13:41:22 +0300
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To:
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org>
	<73BB1389-3820-4035-A917-D064735E875B@illinois.edu>
	<157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net>
	<4BEC757B.5030407@cornell.edu>
Message-ID:

Yep. master.

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429
Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 14 May 2010 02:07, Chris Fields wrote:

> 'master'. That's more in line with other repos.
>
> chris
>
> On May 13, 2010, at 4:56 PM, Robert Buels wrote:
>
> > OK then, decision time, which is the main devel branch, 'master' or
> > 'develop'? I need to merge in a few small bugfixes.
> >
> > I vote for 'master', since it's slightly simpler for new devs, with
> > releases being constructed in branches off of that.
> >
> > Rob

From heikki.lehvaslaiho at gmail.com Fri May 14 06:45:50 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Fri, 14 May 2010 13:45:50 +0300
Subject: [Bioperl-l] move ancient branches to attic
In-Reply-To: <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net>
References: <4BEC79A0.5000505@cornell.edu>
	<47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net>
	<4BEC85A2.50401@cornell.edu>
	<7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net>
Message-ID:

Rob,

If you think it is important, do a survey and create a nice wiki page explaining these branches to everyone. Then we can discuss if some of them are best deleted.

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429
Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 14 May 2010 04:04, Hilmar Lapp wrote:

> On May 13, 2010, at 7:05 PM, Robert Buels wrote:
>
>> The gain is to avoid having useless things hanging around. Every time
>> somebody has to read through a list of 50 branches to find the maybe 5 that
>> are useful, it's time lost.
>>
>> In other words, it's the same gain that you get from cleaning off your
>> desk, so that you can see where you put things.
>
> Hold on - that's not a good comparison is it? First off, this being git,
> the "main" repo is not your desk. You can have your desk and wipe it clean
> of all branches and tags that have ever existed, without affecting, or
> imposing this on, anyone else.
>
> Second, why would you *want* to look through all those branches? This being
> git, you create branches all the time and merge them back, on your own repo,
> right?
> Where in this workflow are you browsing through the 50 branches of
> the "main" repo all the time?
>
> Third, and maybe I'm just too old, but moving to git because branching and
> having your own clone exactly the way you want it is so easy, only to
> subsequently delete most of the branches on the "main" repo for primarily
> aesthetic reasons just doesn't make much sense to me, honestly.
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================

From jay at jays.net Fri May 14 09:32:04 2010
From: jay at jays.net (Jay Hannah)
Date: Fri, 14 May 2010 08:32:04 -0500
Subject: [Bioperl-l] move ancient branches to attic
In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net>
References: <4BEC79A0.5000505@cornell.edu>
	<47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net>
Message-ID: <3B012988-D239-478D-8080-7721633A4AA5@jays.net>

On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote:
> Why? What is the gain from deleting branches that you don't know whether they are dead or not?

If our branch list was clean they wouldn't dupe up when I go to merge in other people's contributions.

You don't find large lists of probably dead things annoying?

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah


jhannah at cplreynoldslpt:~/src/bioperl-live$ git remote add vinanna git://github.com/vinanna/bioperl-live.git
jhannah at cplreynoldslpt:~/src/bioperl-live$ git fetch vinanna
remote: Counting objects: 18, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 10 (delta 8), reused 0 (delta 0)
Unpacking objects: 100% (10/10), done.
From git://github.com/vinanna/bioperl-live
 * [new branch] TRY_featureio_refactor -> vinanna/TRY_featureio_refactor
 * [new branch] TRY_gff_refactor -> vinanna/TRY_gff_refactor
 * [new branch] TRY_locatableseq_refactor -> vinanna/TRY_locatableseq_refactor
 * [new branch] anydbm-branch -> vinanna/anydbm-branch
 * [new branch] bioperl -> vinanna/bioperl
 * [new branch] bioperl-branch-1-5-1 -> vinanna/bioperl-branch-1-5-1
 * [new branch] bioperl-live -> vinanna/bioperl-live
 * [new branch] branch-06 -> vinanna/branch-06
 * [new branch] branch-07 -> vinanna/branch-07
 * [new branch] branch-07-ensembl-120 -> vinanna/branch-07-ensembl-120
 * [new branch] branch-1-0-0 -> vinanna/branch-1-0-0
 * [new branch] branch-1-2 -> vinanna/branch-1-2
 * [new branch] branch-1-2-collection -> vinanna/branch-1-2-collection
 * [new branch] branch-1-4 -> vinanna/branch-1-4
 * [new branch] branch-1-5-2 -> vinanna/branch-1-5-2
 * [new branch] branch-1-6 -> vinanna/branch-1-6
 * [new branch] branch-ensembl-m1 -> vinanna/branch-ensembl-m1
 * [new branch] branch-experimental -> vinanna/branch-experimental
 * [new branch] featann_rollback -> vinanna/featann_rollback
 * [new branch] internal-branch-pre-delete-06-tag -> vinanna/internal-branch-pre-delete-06-tag
 * [new branch] lightweight_feature_branch -> vinanna/lightweight_feature_branch
 * [new branch] master -> vinanna/master
 * [new branch] ontology-cache -> vinanna/ontology-cache
 * [new branch] release-0-04-bug -> vinanna/release-0-04-bug
 * [new branch] restriction-refactor -> vinanna/restriction-refactor
 * [new branch] stable-0-05 -> vinanna/stable-0-05
 * [new branch] stable-0-05-new -> vinanna/stable-0-05-new
 * [new branch] steve_chervitz -> vinanna/steve_chervitz
 * [new branch] topic/bug_2515 -> vinanna/topic/bug_2515
jhannah at cplreynoldslpt:~/src/bioperl-live$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/TRY_featureio_refactor
  remotes/origin/TRY_gff_refactor
  remotes/origin/TRY_locatableseq_refactor
  remotes/origin/anydbm-branch
  remotes/origin/bioperl
  remotes/origin/bioperl-branch-1-5-1
  remotes/origin/bioperl-live
  remotes/origin/branch-06
  remotes/origin/branch-07
  remotes/origin/branch-07-ensembl-120
  remotes/origin/branch-1-0-0
  remotes/origin/branch-1-2
  remotes/origin/branch-1-2-collection
  remotes/origin/branch-1-4
  remotes/origin/branch-1-5-2
  remotes/origin/branch-1-6
  remotes/origin/branch-ensembl-m1
  remotes/origin/branch-experimental
  remotes/origin/featann_rollback
  remotes/origin/internal-branch-pre-delete-06-tag
  remotes/origin/jhannah
  remotes/origin/lightweight_feature_branch
  remotes/origin/master
  remotes/origin/ontology-cache
  remotes/origin/release-0-04-bug
  remotes/origin/restriction-refactor
  remotes/origin/stable-0-05
  remotes/origin/stable-0-05-new
  remotes/origin/steve_chervitz
  remotes/origin/topic/bug_2515
  remotes/origin/yapc10hackathon
  remotes/vinanna/TRY_featureio_refactor
  remotes/vinanna/TRY_gff_refactor
  remotes/vinanna/TRY_locatableseq_refactor
  remotes/vinanna/anydbm-branch
  remotes/vinanna/bioperl
  remotes/vinanna/bioperl-branch-1-5-1
  remotes/vinanna/bioperl-live
  remotes/vinanna/branch-06
  remotes/vinanna/branch-07
  remotes/vinanna/branch-07-ensembl-120
  remotes/vinanna/branch-1-0-0
  remotes/vinanna/branch-1-2
  remotes/vinanna/branch-1-2-collection
  remotes/vinanna/branch-1-4
  remotes/vinanna/branch-1-5-2
  remotes/vinanna/branch-1-6
  remotes/vinanna/branch-ensembl-m1
  remotes/vinanna/branch-experimental
  remotes/vinanna/featann_rollback
  remotes/vinanna/internal-branch-pre-delete-06-tag
  remotes/vinanna/lightweight_feature_branch
  remotes/vinanna/master
  remotes/vinanna/ontology-cache
  remotes/vinanna/release-0-04-bug
  remotes/vinanna/restriction-refactor
  remotes/vinanna/stable-0-05
  remotes/vinanna/stable-0-05-new
  remotes/vinanna/steve_chervitz
  remotes/vinanna/topic/bug_2515

From cjfields at illinois.edu Fri May 14 09:47:05 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 14 May 2010 08:47:05 -0500
Subject: [Bioperl-l] move ancient branches to attic
In-Reply-To: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> Message-ID: <2309AD4D-9FEA-4463-A4FD-519F0FCA2639@illinois.edu> To me, this is more a problem with the way forks currently work in github, via automatically dup-ing all branches vs allowing a single branch ('master', for instance). In fairness, that makes sense if they're implementing this the way I think, in order to conserve space. There are other small issues on github that should be worked out, for instance the automatic addition of all collabs with pull requests, since these go to bioperl-guts now. At least, I got a dup email from the last pull request. Some fixes are supposedly being planned for group-like accounts, just don't know when they'll appear. But I think the overall benefits of github outweigh some of the bumps in the road we're seeing. chris On May 14, 2010, at 8:32 AM, Jay Hannah wrote: > On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: >> Why? What is the gain from deleting branches that you don't know whether they are dead or not? > > If our branch list was clean they wouldn't dupe up when I go to merge in other people's contributions. > > You don't find large lists of probably dead things annoying? > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > > > jhannah at cplreynoldslpt:~/src/bioperl-live$ git remote add vinanna git://github.com/vinanna/bioperl-live.gitjhannah at cplreynoldslpt:~/src/bioperl-live$ git fetch vinanna > remote: Counting objects: 18, done. > remote: Compressing objects: 100% (9/9), done. > remote: Total 10 (delta 8), reused 0 (delta 0) > Unpacking objects: 100% (10/10), done. 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Fri May 14 09:56:48 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 14 May 2010 09:56:48 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> Message-ID: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> On May 14, 2010, at 9:32 AM, Jay Hannah wrote: > You don't find large lists of probably dead things annoying? Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. As an analogy, Google Mail keeps all your dead email (email you delete). Forever. Not because they think most of what you delete you shouldn't have deleted, but because it costs so little, and can be so efficiently managed for the few things that you do decide to recover a year later that it's not worth for you as a user to spend any brain cycles on which emails you should physically delete and which you should only "archive". Likewise, I don't see the gain that outweighs the brain cycles and careful consideration that would have to go into deciding which branches to delete, which ones to move into an "attic", and which ones to keep around. If you don't want to see them, simply clone and wipe them away. 
Life can be so easy :-) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Fri May 14 10:20:22 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 14 May 2010 09:20:22 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> Message-ID: <0C1AE8D4-70F5-427E-9429-B59156587E19@jays.net> On May 14, 2010, at 8:56 AM, Hilmar Lapp wrote: > On May 14, 2010, at 9:32 AM, Jay Hannah wrote: >> You don't find large lists of probably dead things annoying? > > Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. > > As an analogy, Google Mail keeps all your dead email (email you delete). Forever. OK. So our policy is that our branch list is an ever-growing pile of probably-dead things that we all ignore. A couple of them might be alive and useful at any given moment in time, but only if whoever created them is still around and cares and happens to remember what the point was. Understood. 
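
The duplicated listing earlier in this thread comes from `git fetch vinanna` pulling every branch of the fork, which is what a plain `git remote add` sets up by default. A minimal sketch of one way around that, assuming only the fork's master branch is of interest: `git remote add -t <branch>` installs a fetch refspec for a single branch. The helper name `single_branch_remote` is hypothetical, and it only prints the commands rather than running them.

```shell
# Hypothetical helper: print (not run) the commands for adding a
# contributor's fork while tracking only one of its branches.
# Arguments: remote name, URL, branch (defaults to master).
single_branch_remote() {
    name=$1
    url=$2
    branch=${3:-master}
    # -t limits the remote's fetch refspec to the named branch
    printf 'git remote add -t %s %s %s\n' "$branch" "$name" "$url"
    printf 'git fetch %s\n' "$name"
}

# Dry run for the fork used in this thread.
single_branch_remote vinanna git://github.com/vinanna/bioperl-live.git
```

With a remote added this way, `git branch -a` would be expected to list only `remotes/vinanna/master` for that fork, leaving the branch list on the main repository itself unchanged.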
Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Fri May 14 11:34:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 10:34:41 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> Message-ID: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> On May 14, 2010, at 8:56 AM, Hilmar Lapp wrote: > > On May 14, 2010, at 9:32 AM, Jay Hannah wrote: > >> You don't find large lists of probably dead things annoying? > > > Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. > > As an analogy, Google Mail keeps all your dead email (email you delete). Forever. Not because they think most of what you delete you shouldn't have deleted, but because it costs so little, and can be so efficiently managed for the few things that you do decide to recover a year later that it's not worth for you as a user to spend any brain cycles on which emails you should physically delete and which you should only "archive". > > Likewise, I don't see the gain that outweighs the brain cycles and careful consideration that would have to go into deciding which branches to delete, which ones to move into an "attic", and which ones to keep around. If you don't want to see them, simply clone and wipe them away. Life can be so easy :-) > > -hilmar I tend to fall in the middle here, in that it would be nice to clean out feature branches that have been merged back in and relegate all older branches to an attic. Moving branches is as easy as 'git branch -m foo attic/foo'. 
I'm not in favor of removing branches that haven't been merged back, unless they're deemed unnecessary by the core devs. re: removing feature branches, this is something we have talked about doing in the past on svn, but is a bit trickier at the moment as the git repo doesn't currently indicate if/when specific svn branches were merged to HEAD. We still have read-only access to our svn repo to determine that if needed. So far, though, I haven't seen much in the way of indicating what some regard as 'feature' (removable) vs 'attic' (old but retained). That discussion needs to happen on list. chris From hlapp at drycafe.net Fri May 14 12:56:54 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 14 May 2010 12:56:54 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> Message-ID: <69D4619C-F21E-4FAE-B56F-C2F3B323EFD6@drycafe.net> On May 14, 2010, at 11:34 AM, Chris Fields wrote: > it would be nice to clean out feature branches that have been merged > back in Agreed, if the case is clear. > and relegate all older branches to an attic. Moving branches is as > easy as 'git branch -m foo attic/foo'. That's easy enough too and doesn't lose anything, hence no need to spend time on making sure it might not be a mistake. > I'm not in favor of removing branches that haven't been merged > back, unless they're deemed unnecessary by the core devs. Agreed, except I would remove the conditional. I'd rather spend that time on coding ... 
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From subodhs at iastate.edu Fri May 14 12:24:21 2010
From: subodhs at iastate.edu (Srivastava, Subodh K [AGRON])
Date: Fri, 14 May 2010 11:24:21 -0500
Subject: [Bioperl-l] running perl script
Message-ID: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu>

hi,
I am running a Perl script and getting an error like:

Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12.

How do I set the path for this?
The other related scripts are working in the same directory.

I am running perl, v5.8.8 built for x86_64-linux-thread-multi

thank you
subodh
*************************************
G-302
Agronomy Hall
Iowa State University
Ames, IA -50010

From rmb32 at cornell.edu Fri May 14 14:38:10 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 14 May 2010 11:38:10 -0700
Subject: [Bioperl-l] move ancient branches to attic
In-Reply-To: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu>
References: <4BEC79A0.5000505@cornell.edu>
	<47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net>
	<3B012988-D239-478D-8080-7721633A4AA5@jays.net>
	<8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net>
	<19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu>
Message-ID: <4BED9892.5070408@cornell.edu>

At the PDX hackathon last night, I was talking about this problem with a git expert, and he gave me a little tutorial on how git thinks about and keeps branches and tags.

Each of these things is just a special case of a 'ref', which is just a reference to the end of some piece of the commit graph. If you run

   git ls-remote http://github.com/bioperl/bioperl-live.git

you can see all the refs we currently have in our bioperl-live repo, which are all in either /refs/heads (which are our branches), or /refs/tags (our tags).

Now, it turns out you can have arbitrary things in here in addition to heads and tags. I copied one of the old branches to /refs/archives/branch-ensembl-m1 to demonstrate this. Now, it doesn't show up in normal workflow listings, but it's not deleted. If somebody wanted to resurrect it, they could move or copy it into /refs/heads (where it would show up as an active branch again).

To copy a branch into archives/,

   git push origin origin/:refs/archives/

To *move* a branch into archives/

   git push origin origin/:refs/archives/ \
                   :refs/heads/

The second part of that push has nothing on the left side of the colon, which pushes a 'null' to refs/heads/, which deletes it. You can have an arbitrary number of these source:destination pairs (refspecs) in each push invocation.

So, there's a good mechanism for archiving our old branches.

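
The two pushes described above can be sketched as a small helper. The branch name seems to have dropped out of the commands as archived, so this assumes it belongs after `origin/`, `refs/archives/` and `refs/heads/`; the function name `archive_refspecs` is hypothetical, the remote is assumed to be called `origin`, and the example only echoes the push instead of running it.

```shell
# Hypothetical helper: print the two refspecs that move each named branch
# from refs/heads/ to refs/archives/ on the remote.
archive_refspecs() {
    for branch in "$@"; do
        # copy the remote-tracking ref into the archive namespace
        printf '%s\n' "origin/$branch:refs/archives/$branch"
        # an empty source side deletes the branch head on the remote
        printf '%s\n' ":refs/heads/$branch"
    done
}

# Dry run: show the single push that would archive two old branches.
echo git push origin $(archive_refspecs branch-06 branch-07)
```

Dropping the `echo` would perform the push. Leaving out the `:refs/heads/...` refspec gives the copy-only variant, which keeps the branch visible in normal listings.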
Rob

From pat.boutet at gmail.com Fri May 14 15:14:36 2010
From: pat.boutet at gmail.com (Patrick Boutet)
Date: Fri, 14 May 2010 13:14:36 -0600
Subject: [Bioperl-l] running perl script
In-Reply-To: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu>
References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu>
Message-ID:

On Fri, May 14, 2010 at 10:24 AM, Srivastava, Subodh K [AGRON] <subodhs at iastate.edu> wrote:

> hi,
> I am running a perl script and getting error like:
>
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /home/subodhs/SHORE_map/SHOREmap_release_1.1
> /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl
> /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl
> /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at
> /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12.
>
> How to set the path for this?
> the other related scripts are working in same directory.
>
> I am running; perl, v5.8.8 built for x86_64-linux-thread-multi
>
> thank you
> subodh
> *************************************
> G-302
> Agronomy Hall
> Iowa State University
> Ames, IA -50010

I'm still new at this, but I'll try to be helpful. First, where is BioPerl installed: system-wide or local to your home directory? Do you have root access? What type of shell are you using? It seems like you might have to set your shell's PERL5LIB variable to include the directory where BioPerl is installed.

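
For a Bourne-style shell, the PERL5LIB suggestion might look like the sketch below. The helper name `prepend_perl5lib` and the path `$HOME/bioperl-live` are assumptions for illustration; the real value is whichever directory contains the `Bio/` folder of the BioPerl install.

```shell
# Hypothetical helper: put a directory at the front of PERL5LIB so perl
# searches it in addition to the default @INC entries.
prepend_perl5lib() {
    dir=$1
    if [ -n "${PERL5LIB:-}" ]; then
        PERL5LIB="$dir:$PERL5LIB"
    else
        PERL5LIB="$dir"
    fi
    export PERL5LIB
}

# Assumed location of BioPerl; adjust to the directory that holds Bio/.
prepend_perl5lib "$HOME/bioperl-live"

# To check that perl can now find the module (needs BioPerl at that path):
#   perl -MBio::Perl -e 'print "Bio::Perl found\n"'
```

Adding the same call (or an equivalent `export PERL5LIB=...` line) to the shell's startup file would make the setting persist across sessions; csh-style shells use `setenv` instead.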
Patrick Boutet From cjfields at illinois.edu Fri May 14 15:23:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 14:23:31 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BED9892.5070408@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> Message-ID: On May 14, 2010, at 1:38 PM, Robert Buels wrote: > At the PDX hackathon last night, I was talking about this problem with a git expert, and he gave me a little tutorial on how git thinks about and keep branches and tags. > > Each of these things is just a special case of a 'ref', which is just a reference to the end of some piece of the commit graph. If you run > > git ls-remote http://github.com/bioperl/bioperl-live.git > > you can see all the refs we currently have in our bioperl-live repo, which are all in either /refs/heads (which are our branches), or /refs/tags (our tags). > > Now, it turns out you can have arbitrary things in here in addition to heads and tags. I copied one of the old branches to /refs/archives/branch-ensembl-m1 to demonstrate this. Now, it doesn't show up in normal workflow listings, but it's not deleted. If somebody wanted to resurrect it, they could move or copy it into /refs/heads (where it would show up as as an active branch again). > > To copy a branch into archives/, > > git push origin origin/:refs/archives/ > > To *move* a branch into archives/ > > git push origin origin/:refs/archives/ \ > :refs/heads/ > > The first part of that second part of that push has nothing on the left side of the colon, which pushes a 'null' to refs/heads/, which deletes it. You can have an arbitrary number of these kinds of commands in each push invocation. > > So, there's a good mechanism for archiving our old branches. 
> > Rob That's a nice alternative to an attic, and less visible. On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. chris From rmb32 at cornell.edu Fri May 14 18:56:49 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 15:56:49 -0700 Subject: [Bioperl-l] BioPerl for indexing quality score files In-Reply-To: References: Message-ID: <4BEDD531.8050502@cornell.edu> Gregory Jordan wrote: > Ok, I need to shame myself with a huge "RTFM" for this one -- We still like you, Greg. Come hang out in #bioperl, where we can make fun of you properly. ;-) Rob From rmb32 at cornell.edu Fri May 14 19:01:50 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 16:01:50 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> Message-ID: <4BEDD65E.9070702@cornell.edu> Chris Fields wrote: > On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. OK, here are all our current branches, I will go through them in order of last-modified date. 
1998-12-11 bioperl 1999-02-19 release-0-04-bug 1999-04-13 bioperl-live 1999-04-13 stable-0-05 2000-01-27 branch-ensembl-m1 2000-02-07 internal-branch-pre-delete-06-tag 2000-03-22 stable-0-05-new 2001-02-19 branch-06 2001-11-14 branch-07-ensembl-120 2001-12-28 steve_chervitz 2002-01-16 branch-07 2002-10-22 branch-1-0-0 2003-07-07 branch-1-2-collection 2003-10-13 branch-1-2 2004-10-20 ontology-cache 2005-04-14 branch-1-4 2006-01-11 bioperl-branch-1-5-1 2006-08-14 branch-experimental 2007-02-14 branch-1-5-2 2007-08-28 featann_rollback 2007-11-07 lightweight_feature_branch Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. 2009-06-17 restriction-refactor Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f 2009-07-16 topic/bug_2515 proposal: keep, jhannah "working" ;-) 2009-08-13 TRY_gff_refactor proposal: delete, git claims it is merged 2009-08-13 TRY_locatableseq_refactor proposal: delete, git claims it is merged 2009-09-29 branch-1-6 keep, 1.6 maint branch i think. 2009-10-14 anydbm-branch keep, MAJ working. MAJ, maybe you should move this to topic/ ? 2010-01-31 TRY_featureio_refactor keep, but looks dead. cjfields, maybe you want to delete it? 2010-05-12 topic/bug_3077 delete, git claims it is merged. Please review, and I'll do the work if people agree. 
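A rough sketch of how the archive step proposed above might be scripted, reusing the push syntax from the earlier message. It only prints the commands so they can be reviewed before anything is run. The helper name archive_cmd and the ARCHIVE_NS variable are made up for illustration, and the namespace (refs/archives here) is an assumption until we settle on one.

```shell
# Namespace for archived branches (assumption: the one already used
# for the branch-ensembl-m1 demonstration copy).
ARCHIVE_NS="refs/archives"

# Hypothetical helper: print the push that copies a branch into the
# archive namespace and deletes it from refs/heads in one invocation.
archive_cmd() {
    printf 'git push origin origin/%s:%s/%s :refs/heads/%s\n' \
        "$1" "$ARCHIVE_NS" "$1" "$1"
}

# A few names from the list above; extend with the rest as needed.
for branch in release-0-04-bug stable-0-05 branch-06; do
    archive_cmd "$branch"
done
```

Once the printed commands look right, the loop output could be piped to sh, or the commands pasted by hand.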
Rob From jason at bioperl.org Fri May 14 19:54:30 2010 From: jason at bioperl.org (Jason Stajich) Date: Fri, 14 May 2010 16:54:30 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDD65E.9070702@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> Message-ID: <4BEDE2B6.3010307@bioperl.org> lightweight_feature_branch was my test built with a feature type that is based on arrays instead of hashes got 25+% speedup I believe - have to go back to the archives to see what I claimed was speedup... =) I think that Bio::SeqFeature::Slim might be at least one speedup by Lincoln for Gbrowse that addresses some of the speed problem, though I think it still isn't array-based for data storage. -j Robert Buels wrote, On 5/14/10 4:01 PM: > Chris Fields wrote: >> On a related note, going through, it appears the git conversion >> didn't track merges back to trunk. For instance, I know the >> featann_rollback was merged to trunk but it's not showing up. I know >> svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came >> into play), so it may be hard to actually find true merges w/o that. > > OK, here are all our current branches, I will go through them in order > of last-modified date. 
> > 1998-12-11 bioperl > 1999-02-19 release-0-04-bug > 1999-04-13 bioperl-live > 1999-04-13 stable-0-05 > 2000-01-27 branch-ensembl-m1 > 2000-02-07 internal-branch-pre-delete-06-tag > 2000-03-22 stable-0-05-new > 2001-02-19 branch-06 > 2001-11-14 branch-07-ensembl-120 > 2001-12-28 steve_chervitz > 2002-01-16 branch-07 > 2002-10-22 branch-1-0-0 > 2003-07-07 branch-1-2-collection > 2003-10-13 branch-1-2 > 2004-10-20 ontology-cache > 2005-04-14 branch-1-4 > 2006-01-11 bioperl-branch-1-5-1 > 2006-08-14 branch-experimental > 2007-02-14 branch-1-5-2 > 2007-08-28 featann_rollback > 2007-11-07 lightweight_feature_branch > > Proposal: move the above to refs/archive and not worry any further > about them. Maybe we can throw them out in 2020. > > 2009-06-17 restriction-refactor > > Proposal: delete, looks like it was merged in > a2cb40e6c9c7da4f776dbb72a0266f54320fa37f > > 2009-07-16 topic/bug_2515 > proposal: keep, jhannah "working" ;-) > > 2009-08-13 TRY_gff_refactor > proposal: delete, git claims it is merged > > 2009-08-13 TRY_locatableseq_refactor > proposal: delete, git claims it is merged > > 2009-09-29 branch-1-6 > keep, 1.6 maint branch i think. > > 2009-10-14 anydbm-branch > keep, MAJ working. MAJ, maybe you should move this to topic/ ? > > 2010-01-31 TRY_featureio_refactor > keep, but looks dead. cjfields, maybe you want to delete it? > > 2010-05-12 topic/bug_3077 > delete, git claims it is merged. > > Please review, and I'll do the work if people agree. 
> > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri May 14 23:41:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 22:41:18 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDD65E.9070702@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> Message-ID: <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> On May 14, 2010, at 6:01 PM, Robert Buels wrote: > Chris Fields wrote: >> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. > > OK, here are all our current branches, I will go through them in order of last-modified date. > > 1998-12-11 bioperl > 1999-02-19 release-0-04-bug > 1999-04-13 bioperl-live > 1999-04-13 stable-0-05 > 2000-01-27 branch-ensembl-m1 > 2000-02-07 internal-branch-pre-delete-06-tag > 2000-03-22 stable-0-05-new > 2001-02-19 branch-06 > 2001-11-14 branch-07-ensembl-120 > 2001-12-28 steve_chervitz > 2002-01-16 branch-07 > 2002-10-22 branch-1-0-0 > 2003-07-07 branch-1-2-collection > 2003-10-13 branch-1-2 > 2004-10-20 ontology-cache > 2005-04-14 branch-1-4 > 2006-01-11 bioperl-branch-1-5-1 > 2006-08-14 branch-experimental > 2007-02-14 branch-1-5-2 > 2007-08-28 featann_rollback > 2007-11-07 lightweight_feature_branch > > Proposal: move the above to refs/archive and not worry any further about them. 
Maybe we can throw them out in 2020. Just as long as we know they are there. Rob, can you document the archive set up on the wiki so we don't forget it? I deleted the featann_rollback branch. That was a feature branch (no pun intended) to rollback overloading and a host of other changes introduced to bioperl just before the 1.5 release. It was merged a few years ago in svn. > 2009-06-17 restriction-refactor > > Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f This may have been Mark's refactoring, so yes, delete. > 2009-08-13 TRY_gff_refactor > proposal: delete, git claims it is merged > > 2009-08-13 TRY_locatableseq_refactor > proposal: delete, git claims it is merged I deleted these. The primary goal of TRY_gff_refactor was to work in GFF3 work, but that may rely on FeatureIO so will have to be done in stages. At some point, if we do a larger scale refactoring of GFF for GFF3 compat we can make another branch. TRY_locatableseq_refactor will be obsoleted once GSoC starts. > 2009-09-29 branch-1-6 > keep, 1.6 maint branch i think. Yes. I will probably work on another set of merges from to 1.6 soon to bring it up to speed, maybe for one last 1.6 release. > 2009-10-14 anydbm-branch > keep, MAJ working. MAJ, maybe you should move this to topic/ ? > > 2010-01-31 TRY_featureio_refactor > keep, but looks dead. cjfields, maybe you want to delete it? Yes. I've deleted this, as FeatureIO is on it's own. > 2010-05-12 topic/bug_3077 > delete, git claims it is merged. That's already deleted. Maybe needs to be pruned locally? > Please review, and I'll do the work if people agree. > > Rob Good start! 
chris From cjfields at illinois.edu Fri May 14 23:45:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 22:45:07 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDE2B6.3010307@bioperl.org> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <4BEDE2B6.3010307@bioperl.org> Message-ID: <34DFCB4E-2048-4A62-AE9C-06CBF900D38A@illinois.edu> This was moved into bioperl-dev at some point: http://github.com/bioperl/bioperl-dev/tree/master/Bio/SeqFeature/ Might be obsolete as well. chris On May 14, 2010, at 6:54 PM, Jason Stajich wrote: > lightweight_feature_branch was my test built with a feature type that is based on arrays instead of hashes got 25+% speedup I believe - have to go back to the archives to see what I claimed was speedup... =) > > I think that Bio::SeqFeature::Slim might be at least one speedup by Lincoln for Gbrowse that addresses some of the speed problem, though I think it still isn't array-based for data storage. > > -j > > Robert Buels wrote, On 5/14/10 4:01 PM: >> Chris Fields wrote: >>> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. >> >> OK, here are all our current branches, I will go through them in order of last-modified date. 
>> >> 1998-12-11 bioperl >> 1999-02-19 release-0-04-bug >> 1999-04-13 bioperl-live >> 1999-04-13 stable-0-05 >> 2000-01-27 branch-ensembl-m1 >> 2000-02-07 internal-branch-pre-delete-06-tag >> 2000-03-22 stable-0-05-new >> 2001-02-19 branch-06 >> 2001-11-14 branch-07-ensembl-120 >> 2001-12-28 steve_chervitz >> 2002-01-16 branch-07 >> 2002-10-22 branch-1-0-0 >> 2003-07-07 branch-1-2-collection >> 2003-10-13 branch-1-2 >> 2004-10-20 ontology-cache >> 2005-04-14 branch-1-4 >> 2006-01-11 bioperl-branch-1-5-1 >> 2006-08-14 branch-experimental >> 2007-02-14 branch-1-5-2 >> 2007-08-28 featann_rollback >> 2007-11-07 lightweight_feature_branch >> >> Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. >> >> 2009-06-17 restriction-refactor >> >> Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f >> >> 2009-07-16 topic/bug_2515 >> proposal: keep, jhannah "working" ;-) >> >> 2009-08-13 TRY_gff_refactor >> proposal: delete, git claims it is merged >> >> 2009-08-13 TRY_locatableseq_refactor >> proposal: delete, git claims it is merged >> >> 2009-09-29 branch-1-6 >> keep, 1.6 maint branch i think. >> >> 2009-10-14 anydbm-branch >> keep, MAJ working. MAJ, maybe you should move this to topic/ ? >> >> 2010-01-31 TRY_featureio_refactor >> keep, but looks dead. cjfields, maybe you want to delete it? >> >> 2010-05-12 topic/bug_3077 >> delete, git claims it is merged. >> >> Please review, and I'll do the work if people agree. 
>> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Sat May 15 10:27:48 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 09:27:48 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) Message-ID: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> I wrote some tests and merged and deleted branch topic/bug_2515. Bio::SeqIO::gbxml is now in master. Thanks to Ryan Golhar for the contribution! Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah bioperl-live$ perl -I. t/SeqIO/gbxml.t 1..14 ok 1 - use Bio::SeqIO::gbxml; ok 2 - The object isa Bio::SeqIO ok 3 - molecule ok 4 - alphabet ok 5 - primary_id ok 6 - display_id ok 7 - version ok 8 - is_circular ok 9 - description ok 10 - sequence ok 11 - classification ok 12 - feat - clone_lib ok 13 - feat - db_xref ok 14 - feat - lab_host From jay at jays.net Sat May 15 10:57:54 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 09:57:54 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> Message-ID: On May 15, 2010, at 9:34 AM, Chris Fields wrote: > Can you add something to the Changes file for this? You can make a new section for bug fixes or new features at the top, and we can worry about versions later. > > I'll add in the recent bug fix I made as well. Pushed. Feel free to discard any of that you don't like. 
HTH, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Sat May 15 11:46:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 May 2010 10:46:16 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> Message-ID: <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> Thanks Jay. I'll add a bit in myself for bug 3077. Not sure if we'll pursue another point release yet, but it would be nice to get changes out prior to any major structural reorganization. chris On May 15, 2010, at 9:57 AM, Jay Hannah wrote: > On May 15, 2010, at 9:34 AM, Chris Fields wrote: >> Can you add something to the Changes file for this? You can make a new section for bug fixes or new features at the top, and we can worry about versions later. >> >> I'll add in the recent bug fix I made as well. > > Pushed. Feel free to discard any of that you don't like. HTH, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > From jay at jays.net Sat May 15 14:08:35 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 13:08:35 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> Message-ID: On May 15, 2010, at 10:46 AM, Chris Fields wrote: > Thanks Jay. I'll add a bit in myself for bug 3077. Not sure if we'll pursue another point release yet, but it would be nice to get changes out prior to any major structural reorganization. Is there a list whose completion will mark the push of 1.6.2 to CPAN? 
The Changes file says this now: Bugs to be addressed: http://bugzilla.open-bio.org specific bugs intended for the next CPAN release series highlighted in BUGS But I don't understand what 'highlighted in BUGS' means. I also don't know what a 'point release' is. :) Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From David.Messina at sbc.su.se Sat May 15 15:34:58 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 May 2010 21:34:58 +0200 Subject: [Bioperl-l] parsing blast report with long description In-Reply-To: References: Message-ID: Shalabh, Could you please file a bug report on this at bugzilla.open-bio.org? Please include a description (pasting this email will do) and most importantly a test script and sample blast output file which reproduces the problem. We will need those in order to be able to diagnose and fix the problem. Thanks! Dave On May 13, 2010, at 5:07 PM, shalabh sharma wrote: > Hi All, > I need some help in parsing blast output. > I have a inhouse database that contain sequences with really long > description. > >> SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open > Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - > 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV > > So my blast report looks like this: > > ..... > ..... >> SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821 > 6887/Open Ocean/Galapagos Islands/134 miles NE of > Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 > m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > Length = 213 > > Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix > adjust. > Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%) > ..... > ..... > > (note that the tag "TI_1000008216887" is splitting in two lines). 
> > I am using SeqIO to parse this report. What i am doing is parsing the > description field again to get all the tags. like > .... > .... > my $desc = $hit->description; > my @f = split('/',$desc); > for(my $i = 0;$i < scalar > @f;$i++){ print OUT "$f[$i]\t";} > ..... > ..... > > > *I am getting the perfect parsed report but the field with TI_1000008216887 > has a space **TI_100000821 6887 *. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun May 16 11:14:25 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 10:14:25 -0500 Subject: [Bioperl-l] GenomeeTools Message-ID: Anyone used GenomeTools? I'm thinking of setting up some C bindings to it. It has a C-based GFF3 parser, among other goodies. http://genometools.org/index.html chris From cjfields at illinois.edu Sun May 16 12:16:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 11:16:11 -0500 Subject: [Bioperl-l] Bio-FeatureIO Message-ID: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> All, Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. 
chris From jay at jays.net Sun May 16 13:32:57 2010 From: jay at jays.net (Jay Hannah) Date: Sun, 16 May 2010 12:32:57 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 11:16 AM, Chris Fields wrote: > Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. I'm curious about how this works in terms of git storage. Does this mean that the separate Bio-FeatureIO repo will have the entire history of BioPerl inside it? (Making git clones of Bio-FeatureIO 189MB?) In the recent past I have attempted pulling certain files across git repos before, and ended up with the full history of repo1 inside repo2. I'm unclear if this is just how life is, or if I did it wrong. You could, of course, always just cp text files in, but then you lose the history of those files. Is there some way to get all the history of a handful of files from massive repo1 into tiny repo2 without making repo1 massive? I don't know if any of these considerations are important for the eventual de-monolithification of BioPerl, I was just generally curious. git does that to me. 
:) Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Sun May 16 14:18:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 13:18:24 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 12:32 PM, Jay Hannah wrote: > On May 16, 2010, at 11:16 AM, Chris Fields wrote: >> Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. > > I'm curious about how this works in terms of git storage. > > Does this mean that the separate Bio-FeatureIO repo will have the entire history of BioPerl inside it? (Making git clones of Bio-FeatureIO 189MB?) > > In the recent past I have attempted pulling certain files across git repos before, and ended up with the full history of repo1 inside repo2. I'm unclear if this is just how life is, or if I did it wrong. > > You could, of course, always just cp text files in, but then you lose the history of those files. > > Is there some way to get all the history of a handful of files from massive repo1 into tiny repo2 without making repo1 massive? > > I don't know if any of these considerations are important for the eventual de-monolithification of BioPerl, I was just generally curious. git does that to me. 
:) > > Thanks, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah I'm just planning on having something to the effect of 'Bio-FeatureIO is a set of modules developed by author X that once was part of bioperl-live, but was removed at point XYZ to significantly refactor the code,' then point back to bioperl-live if anyone is interested in software archaeology. Not sure we would need to go beyond that. chris From jay at jays.net Sun May 16 14:47:42 2010 From: jay at jays.net (Jay Hannah) Date: Sun, 16 May 2010 13:47:42 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 1:18 PM, Chris Fields wrote: > I'm just planning on having something to the effect of 'Bio-FeatureIO is a set of modules developed by author X that once was part of bioperl-live, but was removed at point XYZ to significantly refactor the code,' then point back to bioperl-live if anyone is interested in software archaeology. Not sure we would need to go beyond that. Gotcha. That certainly solves the problem. :) So maybe in 2020 we'll be pushing 30 independent github repos to PAUSE all citing the bioperl-live repo for historical digging prior to their emancipation. To jhannah in the year 2020: You are NOT too old for dirt bikes. Keep riding! :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From fs5 at sanger.ac.uk Mon May 17 04:38:18 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 17 May 2010 09:38:18 +0100 Subject: [Bioperl-l] parsing blast report with long description In-Reply-To: References: Message-ID: <1274085498.5288.30.camel@deskpro15336.dynamic.sanger.ac.uk> I think you should try to avoid those long IDs anyway, especially because you have spaces in there too and this may cause problems further down the line as many programs will use a pattern like />(\S+)/ as the identifier. 
I would build a small database for your files and use unique database identifiers in your FASTA files. That will make it easier in the future to collect, for example, all sequences from a certain region etc. If you want to avoid that you could have two files: one FASTA file using numbers as IDs and a file where you map those numbers to sample descriptions, i.e. a simple flat-file database. Frank On Thu, 2010-05-13 at 11:07 -0400, shalabh sharma wrote: > Hi All, > I need some help in parsing blast output. > I have a inhouse database that contain sequences with really long > description. > > >SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open > Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - > 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV > > So my blast report looks like this: > > ..... > ..... > >SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821 > 6887/Open Ocean/Galapagos Islands/134 miles NE of > Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 > m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > Length = 213 > > Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix > adjust. > Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%) > ..... > ..... > > (note that the tag "TI_1000008216887" is splitting in two lines). > > I am using SeqIO to parse this report. What i am doing is parsing the > description field again to get all the tags. like > .... > .... > my $desc = $hit->description; > my @f = split('/',$desc); > for(my $i = 0;$i < scalar > @f;$i++){ print OUT "$f[$i]\t";} > ..... > ..... > > > *I am getting the perfect parsed report but the field with TI_1000008216887 > has a space **TI_100000821 6887 *. > > I would really appreciate if anyone can help me out.
> > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From fs5 at sanger.ac.uk Mon May 17 04:41:51 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 17 May 2010 09:41:51 +0100 Subject: [Bioperl-l] running perl script In-Reply-To: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> Message-ID: <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> why are you requiring "Bio::Perl"? You would normally use somethink specific in the BioPerl bundle, like Bio::Seq or whatever. Can you show some of your script? Frank On Fri, 2010-05-14 at 11:24 -0500, Srivastava, Subodh K [AGRON] wrote: > hi, > I am running a perl script and getting error like: > > Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. > > How to set the path for this? > the other related scripts are working in same directory. 
> > I am running; perl, v5.8.8 built for x86_64-linux-thread-multi > > thank you > subodh > ************************************* > G-302 > Agronomy Hall > Iowa State University > Ames, IA -50010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Mon May 17 08:26:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 07:26:20 -0500 Subject: [Bioperl-l] running perl script In-Reply-To: <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <63D0BEDA-27F7-48AB-ABE8-1F39B09B349A@illinois.edu> Frank, Bio::Perl is the generic user module for very simple tasks. See here: http://github.com/bioperl/bioperl-live/blob/master/Bio/Perl.pm Subodh, you need to make sure the modules are in your perl library path. See the following link, under 'INSTALLING BIOPERL IN A PERSONAL MODULE AREA': http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix chris On May 17, 2010, at 3:41 AM, Frank Schwach wrote: > why are you requiring "Bio::Perl"? You would normally use somethink > specific in the BioPerl bundle, like Bio::Seq or whatever. Can you show > some of your script? 
> Frank > > > On Fri, 2010-05-14 at 11:24 -0500, Srivastava, Subodh K [AGRON] wrote: >> hi, >> I am running a perl script and getting error like: >> >> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. >> >> How to set the path for this? >> the other related scripts are working in same directory. >> >> I am running; perl, v5.8.8 built for x86_64-linux-thread-multi >> >> thank you >> subodh >> ************************************* >> G-302 >> Agronomy Hall >> Iowa State University >> Ames, IA -50010 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ross at cuhk.edu.hk Mon May 17 08:42:35 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 17 May 2010 20:42:35 +0800 Subject: [Bioperl-l] extracting genbank content Message-ID: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Dear all, When there are more than one genbank records in a file, except by splitting the file into separate records, what can I do to transverse the records? 
$obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank"); $seqobj=$obj->next_seq(); Do I just use another $obj->next_seq() so it will point to another record? Thanks for your advice. From amackey at virginia.edu Mon May 17 09:51:31 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Mon, 17 May 2010 09:51:31 -0400 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: On Thu, May 13, 2010 at 2:20 AM, Heikki Lehvaslaiho < heikki.lehvaslaiho at gmail.com> wrote: > > As for getting values outside the defined region, that is a feature rather > than a bug. The idea was to be able to ask what would the new coordinate be > if the feature extended beyond the known limits. This is the capability of > Bio::Coordinate::ExtrapolatingPair that is used here. That class also has a > method strict that can be used to prevent extrapolating, but the code to > access that has not been written into GeneMapper. I'll see if I can get it > to work. > > I had this same thought/expectation, but that in fact is not what's going on. There is no place in the GeneMapper code where the CDS end coordinate is being used, only the begin coordinate. The implicit assumption is that the CDS ends at the last exon. From the perspective of the translate/revtranslate methods, an extrapolating pair does not make sense (at least to me) -- just as a CDS coordinate is undefined within an intron, so too would I expect a CDS coordinate to be undefined in a UTR or intragenic region. Alternatively, it would be nice (in general) to be able to check whether the provided mapping is an extrapolation or not.
-Aaron From David.Messina at sbc.su.se Mon May 17 09:56:35 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 17 May 2010 15:56:35 +0200 Subject: [Bioperl-l] extracting genbank content In-Reply-To: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: Hi Ross, > Do I just use another $obj->next_seq() so it will point to another record? Yes. The common approach is to use a while loop:

my $obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank");
while(my $seqobj = $obj->next_seq) {
    # do stuff with $seqobj
}

For more details, see the SeqIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SeqIO Dave From cjfields at illinois.edu Mon May 17 12:36:37 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 11:36:37 -0500 Subject: [Bioperl-l] extracting genbank content In-Reply-To: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: <9952EA98-248E-41B8-9816-A3A01EC6ADFE@illinois.edu> Depends on what you need to do. If you are just interested in pulling out certain bits of data from each record, using SeqIO is a good option. But if you want to access the records as a flat database (not iteration, but indexed for fast access), use Bio::Index::GenBank or Bio::DB::Flat to make a simple flat file database and access them by ID. chris On May 17, 2010, at 7:42 AM, Ross KK Leung wrote: > Dear all, > > > > When there are more than one genbank records in a file, except by splitting > the file into separate records, what can I do to transverse the records? > > > > $obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank"); > > > $seqobj=$obj->next_seq(); > > > > Do I just use another $obj->next_seq() so it will point to another record? > > > > Thanks for your advice.
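[Editorial sketch, not part of the original thread.] The indexed-access route suggested above with Bio::Index::GenBank might look roughly like the following. The file names and the ID are hypothetical placeholders, and the statement that records are keyed by LOCUS/ACCESSION/VERSION is my understanding of the module's default rather than something stated in the thread; check the module's POD before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Index::GenBank;

# Hypothetical names, chosen only for illustration.
my $gbfile     = 'records.gb';        # multi-record GenBank flat file
my $index_file = 'records.gb.idx';    # index file to create or update
my $wanted_id  = 'AB000001';          # an ID assumed to be present in $gbfile

# Open the index for writing and index the flat file.
# (Assumption: by default the keys are the LOCUS, ACCESSION and VERSION IDs.)
my $inx = Bio::Index::GenBank->new(
    -filename   => $index_file,
    -write_flag => 'WRITE',
);
$inx->make_index($gbfile);

# Fetch a single record by ID instead of iterating from the top of the file.
my $seq = $inx->fetch($wanted_id);
if ($seq) {
    printf "%s\t%d bp\n", $seq->display_id, $seq->length;
}
else {
    warn "No record with ID $wanted_id found via $index_file\n";
}
```

The index only needs to be built once; later scripts can open it without the write flag and call fetch() directly. For a single pass over every record, the SeqIO while loop remains the simpler choice.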
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Mon May 17 12:50:21 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 May 2010 09:50:21 -0700 Subject: [Bioperl-l] GenomeeTools In-Reply-To: References: Message-ID: <4BF173CD.8020600@cornell.edu> I haven't used GenomeTools but I've used GenomeThreader, one of Gordon's other tools. Rob Chris Fields wrote: > Anyone used GenomeTools? I'm thinking of setting up some C bindings to it. It has a C-based GFF3 parser, among other goodies. > > http://genometools.org/index.html > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Mon May 17 20:15:13 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 May 2010 17:15:13 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> Message-ID: <4BF1DC11.6030402@cornell.edu> OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches Rob Chris Fields wrote: > On May 14, 2010, at 6:01 PM, Robert Buels wrote: > >> Chris Fields wrote: >>> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. 
I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. >> OK, here are all our current branches, I will go through them in order of last-modified date. >> >> 1998-12-11 bioperl >> 1999-02-19 release-0-04-bug >> 1999-04-13 bioperl-live >> 1999-04-13 stable-0-05 >> 2000-01-27 branch-ensembl-m1 >> 2000-02-07 internal-branch-pre-delete-06-tag >> 2000-03-22 stable-0-05-new >> 2001-02-19 branch-06 >> 2001-11-14 branch-07-ensembl-120 >> 2001-12-28 steve_chervitz >> 2002-01-16 branch-07 >> 2002-10-22 branch-1-0-0 >> 2003-07-07 branch-1-2-collection >> 2003-10-13 branch-1-2 >> 2004-10-20 ontology-cache >> 2005-04-14 branch-1-4 >> 2006-01-11 bioperl-branch-1-5-1 >> 2006-08-14 branch-experimental >> 2007-02-14 branch-1-5-2 >> 2007-08-28 featann_rollback >> 2007-11-07 lightweight_feature_branch >> >> Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. > > Just as long as we know they are there. Rob, can you document the archive set up on the wiki so we don't forget it? > > I deleted the featann_rollback branch. That was a feature branch (no pun intended) to rollback overloading and a host of other changes introduced to bioperl just before the 1.5 release. It was merged a few years ago in svn. > >> 2009-06-17 restriction-refactor >> >> Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f > > This may have been Mark's refactoring, so yes, delete. > >> 2009-08-13 TRY_gff_refactor >> proposal: delete, git claims it is merged >> >> 2009-08-13 TRY_locatableseq_refactor >> proposal: delete, git claims it is merged > > I deleted these. The primary goal of TRY_gff_refactor was to work in GFF3 work, but that may rely on FeatureIO so will have to be done in stages. At some point, if we do a larger scale refactoring of GFF for GFF3 compat we can make another branch. 
TRY_locatableseq_refactor will be obsoleted once GSoC starts. > >> 2009-09-29 branch-1-6 >> keep, 1.6 maint branch i think. > > Yes. I will probably work on another set of merges from to 1.6 soon to bring it up to speed, maybe for one last 1.6 release. > >> 2009-10-14 anydbm-branch >> keep, MAJ working. MAJ, maybe you should move this to topic/ ? >> >> 2010-01-31 TRY_featureio_refactor >> keep, but looks dead. cjfields, maybe you want to delete it? > > Yes. I've deleted this, as FeatureIO is on it's own. > >> 2010-05-12 topic/bug_3077 >> delete, git claims it is merged. > > That's already deleted. Maybe needs to be pruned locally? > >> Please review, and I'll do the work if people agree. >> >> Rob > > Good start! > > chris > > From jay at jays.net Mon May 17 20:35:33 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 17 May 2010 19:35:33 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BF1DC11.6030402@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> Message-ID: <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> On May 17, 2010, at 7:15 PM, Robert Buels wrote: > OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches Thank you!! git pull --prune and suddenly I feel clean again! :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From amackey at virginia.edu Mon May 17 20:42:17 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Mon, 17 May 2010 20:42:17 -0400 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. 
In-Reply-To: <20100518001029.CD8644229D@smtp1.rs.github.com> References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: I probably missed some prior discussion of this, but any chance that the new commit messages can actually include the (unified, possibly truncated-for-length) diff of the changes? My own 2 cents is that community-wide visual skims of the diffs provide a valuable spot-check for typo's and other think-o's. Plus it gives me an indication of how major the change was. A corollary -- might there be an RSS feed by which I could subscribe to such diffs, rather than get emails about them? Since the emails are sent from "noreply", I already have to step out of the normal email flow to respond to a diff, might as well go whole hog and remove them from my email consciousness entirely, and place them with the other various information streams in my RSS reader. Thanks, -Aaron On Mon, May 17, 2010 at 8:10 PM, wrote: > Branch: refs/archives/heads/branch-1-0-0 > Home: http://github.com/bioperl/bioperl-live > > Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 > > http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 > Author: sac > Date: 2002-10-22 (Tue, 22 Oct 2002) > > Changed paths: > M Bio/SearchIO/Writer/HitTableWriter.pm > > Log Message: > ----------- > Added frame to the column map. > > svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > From jay at jays.net Mon May 17 21:10:56 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 17 May 2010 20:10:56 -0500 Subject: [Bioperl-l] 319a6e: Added frame to the column map. 
In-Reply-To: References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > I probably missed some prior discussion of this, but any chance that the new > commit messages can actually include the (unified, possibly > truncated-for-length) diff of the changes? I'm 5 years behind the cool-kids curve on this stuff. :) I just discovered SVN::Notify for $work[0]. By default it kicks out really pretty color HTML diffs of every change. I assume there's an equivalent for git? You could always click to github. Its color HTML diffs are very pretty. That commit for example: http://github.com/bioperl/bioperl-live/commit/319a6e Plus all the other github shiny -- comment specific lines of the commit, or the commit itself, etc. Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Mon May 17 21:35:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 20:35:21 -0500 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. In-Reply-To: References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu> Aaron, We can do either, though setting up diffs will take a bit more work (will have to set up a post-receive URL to a CGI script to process this). RSS is quite a bit easier: http://github.com/bioperl/bioperl-live/commits/master.atom Replace 'bioperl-live' with any of the other repos for repo-specific RSS commits. The links go to the commits where you can also make in-line notes/comments by clicking in the diff code, or simple comments at the bottom. Those comments are then passed on to bioperl-guts-l for everyone to see.
Example here: http://github.com/bioperl/bioperl-live/commit/c86c048c96786f8517ae1ad1fc5e5823eecf52c3 and the relevant bioperl-guts-l posts: http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031259.html http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031260.html chris On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > I probably missed some prior discussion of this, but any chance that the new > commit messages can actually include the (unified, possibly > truncated-for-length) diff of the changes? > > My own 2 cents is that community-wide visual skims of the diffs provide a > valuable spot-check for typo's and other think-o's. Plus it gives me an > indication of how major the change was. > > A corollary -- might there be an RSS feed by which I could subscribe to such > diffs, rather than get emails about them? Since the emails are sent from > "noreply", I already have to step out of the normal email flow to respond to > a diff, might as well go whole hog and remove them from my email > consciousness entirely, and place them with the other various information > streams in my RSS reader. > > Thanks, > > -Aaron > > On Mon, May 17, 2010 at 8:10 PM, wrote: > >> Branch: refs/archives/heads/branch-1-0-0 >> Home: http://github.com/bioperl/bioperl-live >> >> Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 >> >> http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 >> Author: sac >> Date: 2002-10-22 (Tue, 22 Oct 2002) >> >> Changed paths: >> M Bio/SearchIO/Writer/HitTableWriter.pm >> >> Log Message: >> ----------- >> Added frame to the column map. 
>> >> svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 >> >> >> _______________________________________________ >> Bioperl-guts-l mailing list >> Bioperl-guts-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Tue May 18 03:16:52 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 00:16:52 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> Message-ID: <4BF23EE4.6020704@cornell.edu> We may want to do the same for our tags as well. Our github download page is fairly disastrous. See: http://github.com/bioperl/bioperl-live/downloads It's not clear that a similar date-cutoff policy would work for tags. Pretty much all of these things were before my time, I don't know what most of them are. Does someone with more history than me have some thoughts as to what should stay on that download page? The rest of the tags could be archived. Rob Jay Hannah wrote: > On May 17, 2010, at 7:15 PM, Robert Buels wrote: >> OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches > > Thank you!! git pull --prune and suddenly I feel clean again! 
:) > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > From bpcwhite at gmail.com Tue May 18 05:49:29 2010 From: bpcwhite at gmail.com (Bryan White) Date: Tue, 18 May 2010 02:49:29 -0700 (PDT) Subject: [Bioperl-l] distance Message-ID: Hello, I am trying to create a simple program to show me the distance between taxa on a given tree. However, I am having trouble getting the bioperl code to work. Here is the code that I am using:

--------
#! /usr/bin/perl
use strict;
use warnings;
use Bio::Tree::Draw::Cladogram;
use Bio::TreeIO;
#use Bio::TreeFunctionsI;

my $node1 = 'homo_sapiens';
my $node2 = 'murinae';
my $input = new Bio::TreeIO('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree = $input->next_tree;
my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');
my $distance = $tree->distance(-nodes => \@nodes);
#print $distance;
--------

And here is the error message I receive:

------------- EXCEPTION -------------
MSG: Must provide 2 nodes
STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/Bio/Tree/TreeFunctionsI.pm:811
STACK toplevel ./phylo.pl:19
-------------------------------------

It seems that the nodes are not being read into the @nodes variable. Any help in figuring this out would be appreciated.
Thanks, Bryan From biopython at maubp.freeserve.co.uk Tue May 18 06:07:15 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 May 2010 11:07:15 +0100 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BF23EE4.6020704@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> Message-ID: On Tue, May 18, 2010 at 8:16 AM, Robert Buels wrote: > We may want to do the same for our tags as well. Our github download page > is fairly disastrous. See: > > http://github.com/bioperl/bioperl-live/downloads > > It's not clear that a similar date-cutoff policy would work for tags. Pretty > much all of these things were before my time, I don't know what most of them > are. > > Does someone with more history than me have some thoughts as to what should > stay on that download page? The rest of the tags could be archived. > > Rob Or just turn off the download feature in github. When you prepare a BioPerl release does it contain anything else not found in the repository (e.g. compiled documentation)? We have this for Biopython (compiled PDF and HTML docs) so we prefer to direct casual release downloads via the website not via the tag on github to ensure they get these extra files in the archive. Peter From adsj at novozymes.com Tue May 18 06:21:25 2010 From: adsj at novozymes.com (Adam Sjøgren) Date: Tue, 18 May 2010 12:21:25 +0200 Subject: [Bioperl-l] distance References: Message-ID: <87k4r11pei.fsf@topper.koldfront.dk> On Tue, 18 May 2010 02:49:29 -0700 (PDT), Bryan wrote: > my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); I think you may have misunderstood the documentation of find_node().
You are supposed to give the fieldname after the dash, so what you want is: my @nodes = $tree->find_node(-id => 'Homo_sapiens','Murinae'); - if the field you want to match on is 'id'. Also, I don't think you can get find_node() to do 'OR'-searches, so you'll need to do something like this:

= = =
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;

my $input=Bio::TreeIO->new('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree=$input->next_tree;
my ($node1)=$tree->find_node(-id=>'Homo_sapiens'); # this (arbitrarily) picks the first match
my ($node2)=$tree->find_node(-id=>'Murinae'); # -"-
my $distance=$tree->distance(-nodes=>[$node1, $node2]);
print "$distance\n";
= = =

It is much easier to help if you give an example of the input as well as the script. I constructed this stand-in for your newick file to test on: (Homo_sapiens:1.1,B:2.2,(C:3.3,Murinae:4.4):5.5); Best regards, Adam -- Adam Sjøgren adsj at novozymes.com From David.Messina at sbc.su.se Tue May 18 06:50:52 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 12:50:52 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> Message-ID: <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> On May 18, 2010, at 12:07, Peter wrote: > Or just turn off the download feature in github. That might be the best solution, at least for now. The download page is somewhat unfriendly anyway: the tag names are truncated, there's no way to sort, and the descriptions are, well, not so descriptive (they appear to be just the last commit message).
Probably better to keep http://www.bioperl.org/wiki/Getting_BioPerl as our main distribution point for downloads. Dave From jun.yin at ucd.ie Tue May 18 07:15:14 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Tue, 18 May 2010 12:15:14 +0100 Subject: [Bioperl-l] distance In-Reply-To: <87k4r11pei.fsf@topper.koldfront.dk> References: <87k4r11pei.fsf@topper.koldfront.dk> Message-ID: <002d01caf67b$637c20d0$2a746270$%yin@ucd.ie> Hi, Bryan, Use Adam's code. The last statement of my code was wrong. I made a wrong reference... Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Adam Sjøgren Sent: Tuesday, May 18, 2010 11:21 AM To: bioperl-l at bioperl.org Subject: Re: [Bioperl-l] distance On Tue, 18 May 2010 02:49:29 -0700 (PDT), Bryan wrote: > my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); I think you may have misunderstood the documentation of find_node(). You are supposed to give the fieldname after the dash, so what you want is: my @nodes = $tree->find_node(-id => 'Homo_sapiens','Murinae'); - if the field you want to match on is 'id'. Also, I don't think you can get find_node() to do 'OR'-searches, so you'll need to do something like this:

= = =
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;

my $input=Bio::TreeIO->new('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree=$input->next_tree;
my ($node1)=$tree->find_node(-id=>'Homo_sapiens'); # this (arbitrarily) picks the first match
my ($node2)=$tree->find_node(-id=>'Murinae'); # -"-
my $distance=$tree->distance(-nodes=>[$node1, $node2]);
print "$distance\n";
= = =

It is much easier to help if you give an example of the input as well as the script.
I constructed this stand-in for your newick file to test on: (Homo_sapiens:1.1,B:2.2,(C:3.3,Murinae:4.4):5.5); Best regards, Adam -- Adam Sjøgren adsj at novozymes.com _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From amackey at virginia.edu Tue May 18 07:26:17 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Tue, 18 May 2010 07:26:17 -0400 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. In-Reply-To: <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu> References: <20100518001029.CD8644229D@smtp1.rs.github.com> <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu> Message-ID: Thanks for the info, and the thoroughness of your explanation! -Aaron On Mon, May 17, 2010 at 9:35 PM, Chris Fields wrote: > Aaron, > > We can do either, though setting up diffs will take a bit more work (will > have to set up a post-receive URL to a CGI script to process this). > > RSS is quite a bit easier: > > http://github.com/bioperl/bioperl-live/commits/master.atom > > Replace 'bioperl-live' with any of the other repos for repo-specific RSS > commits. The links go to the commits where you can also make in-line > notes/comments by clicking in the diff code, or simple comments at the > bottom. Those comments are then passed on to bioperl-guts-l for everyone to > see.
Example here: > > > http://github.com/bioperl/bioperl-live/commit/c86c048c96786f8517ae1ad1fc5e5823eecf52c3 > > and the relevant bioperl-guts-l posts: > > http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031259.html > http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031260.html > > chris > > On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > > > I probably missed some prior discussion of this, but any chance that the > new > > commit messages can actually include the (unified, possibly > > truncated-for-length) diff of the changes? > > > > My own 2 cents is that community-wide visual skims of the diffs provide a > > valuable spot-check for typo's and other think-o's. Plus it gives me an > > indication of how major the change was. > > > > A corollary -- might there be an RSS feed by which I could subscribe to > such > > diffs, rather than get emails about them? Since the emails are sent from > > "noreply", I already have to step out of the normal email flow to respond > to > > a diff, might as well go whole hog and remove them from my email > > consciousness entirely, and place them with the other various information > > streams in my RSS reader. > > > > Thanks, > > > > -Aaron > > > > On Mon, May 17, 2010 at 8:10 PM, wrote: > > > >> Branch: refs/archives/heads/branch-1-0-0 > >> Home: http://github.com/bioperl/bioperl-live > >> > >> Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 > >> > >> > http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 > >> Author: sac > >> Date: 2002-10-22 (Tue, 22 Oct 2002) > >> > >> Changed paths: > >> M Bio/SearchIO/Writer/HitTableWriter.pm > >> > >> Log Message: > >> ----------- > >> Added frame to the column map. 
> >> > >> svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 > >> > >> > >> _______________________________________________ > >> Bioperl-guts-l mailing list > >> Bioperl-guts-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jun.yin at ucd.ie Tue May 18 07:07:43 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Tue, 18 May 2010 12:07:43 +0100 Subject: [Bioperl-l] distance In-Reply-To: References: Message-ID: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie> Hi, Bryan, In your code: my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); First, You should specify the fieldname. The "fieldname" itself doesnot seem like a valid key. The default field name is "id". Second, the find_node method can only search for one specific term at one time. Third, distance method can only work on two nodes. So try this: my @nodes_human = $tree->find_node(-id => 'Homo_sapiens'); my @nodes_murinae=$tree->find_node(-id=>'Murinae'); my $distance = $tree->distance(-nodes => \($nodes_human[0],$nodes_murinae[0])); #Providing you only have one match for "Homo_sapiens" and " Murinae". Cheers, Jun Yin Ph.D.?student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bryan White Sent: Tuesday, May 18, 2010 10:49 AM To: bioperl-l at bioperl.org Subject: [Bioperl-l] distance Hello, I am trying to create a simple program to show me the distance between taxa on a given tree. However, I am having trouble getting the bioperl code to work. Here is the code that I am using: -------- #! 
/usr/bin/perl
use strict;
use warnings;
use Bio::Tree::Draw::Cladogram;
use Bio::TreeIO;
#use Bio::TreeFunctionsI;

my $node1 = 'homo_sapiens';
my $node2 = 'murinae';
my $input = new Bio::TreeIO('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree = $input->next_tree;
my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');
my $distance = $tree->distance(-nodes => \@nodes);
#print $distance;
--------

And here is the error message I receive:

------------- EXCEPTION -------------
MSG: Must provide 2 nodes
STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/Bio/Tree/TreeFunctionsI.pm:811
STACK toplevel ./phylo.pl:19
-------------------------------------

It seems that the nodes are not being read into the @nodes variable. Any help in figuring this out would be appreciated. Thanks, Bryan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l
From cjfields at illinois.edu Tue May 18 08:47:10 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 07:47:10 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> Message-ID: <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> On May 18, 2010, at 5:50 AM, Dave Messina wrote: > > On May 18, 2010, at 12:07, Peter wrote: > >> Or just turn off the download feature in github. > > That might be the best solution, at least for now. > > The download page is somewhat unfriendly anyway: the tag names are truncated, there's no way to sort, and the descriptions are, well, not so descriptive (they appear to be just the last commit message). > > Probably better to keep > > http://www.bioperl.org/wiki/Getting_BioPerl > > as our main distribution point for downloads. > > > Dave We can turn that off for now, though it is a nice feature.
If we need a replacement link for downloads we can use the repo.or.cz mirror link, for example: http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip chris From David.Messina at sbc.su.se Tue May 18 08:53:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 14:53:29 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: On May 18, 2010, at 14:47, Chris Fields wrote: > http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz > http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. I'll go ahead and update the nightly build links on http://www.bioperl.org/wiki/Getting_BioPerl to point to those, then, unless there are objections. 
Dave From cjfields at illinois.edu Tue May 18 09:56:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 08:56:45 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> On May 18, 2010, at 7:53 AM, Dave Messina wrote: > > On May 18, 2010, at 14:47, Chris Fields wrote: > >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip > > > Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. > > > I'll go ahead and update the nightly build links on > > http://www.bioperl.org/wiki/Getting_BioPerl > > to point to those, then, unless there are objections. > > > Dave This link also still works, even with the 'Downloads' tab off: http://github.com/bioperl/bioperl-live/archives/master Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. 'build' really never applied either, but oh well... 
chris From biopython at maubp.freeserve.co.uk Tue May 18 09:57:50 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 May 2010 14:57:50 +0100 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: On Tue, May 18, 2010 at 1:53 PM, Dave Messina wrote: > > > On May 18, 2010, at 14:47, Chris Fields wrote: > >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip > > > Oh right, I forgot about the mirror. Silly me. :) So probably > unnecessary to make our own nightly snapshots then. > Just like what you'd get from the big "Download Source" button on github? Equivalent to visiting this page: http://github.com/bioperl/bioperl-live/archives/master Peter From cjfields at illinois.edu Tue May 18 10:03:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 09:03:46 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> Message-ID: On May 18, 2010, at 8:56 AM, Chris Fields wrote: > On May 18, 2010, at 7:53 AM, Dave Messina wrote: > >> >> On May 18, 2010, at 14:47, Chris 
Fields wrote: >> >>> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >>> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip >> >> >> Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. >> >> >> I'll go ahead and update the nightly build links on >> >> http://www.bioperl.org/wiki/Getting_BioPerl >> >> to point to those, then, unless there are objections. >> >> >> Dave > > This link also still works, even with the 'Downloads' tab off: > > http://github.com/bioperl/bioperl-live/archives/master > > Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. > > 'build' really never applied either, but oh well... > > chris Oh, and on the topic of annotated tags for downloads: http://github.com/blog/651-annotated-downloads chris From David.Messina at sbc.su.se Tue May 18 10:23:34 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 16:23:34 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> Message-ID: <075CC735-0573-4E79-975F-23AD61C41C72@sbc.su.se> On May 18, 2010, at 16:03, Chris Fields wrote: > > This link also still works, even with the 'Downloads' tab off: > > http://github.com/bioperl/bioperl-live/archives/master Ah, great, thanks Chris and Peter. > Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. 
>
> 'build' really never applied either, but oh well...

Righto - done. 'Snapshots' it is.

> Oh, and on the topic of annotated tags for downloads:
>
> http://github.com/blog/651-annotated-downloads

Heh, how timely. :) Good, that will solve the description part of it nicely.

Dave

From jay at jays.net Tue May 18 10:32:47 2010
From: jay at jays.net (Jay Hannah)
Date: Tue, 18 May 2010 09:32:47 -0500
Subject: [Bioperl-l] Please update /Changes as you commit things
In-Reply-To: <20100518030511.59C314202D@smtp1.rs.github.com>
References: <20100518030511.59C314202D@smtp1.rs.github.com>
Message-ID:

Hi Florent,

Can you add a line to the /Changes please? New features are especially great to add to that file. :)

If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent.

You also might want to set your git config so your email is valid in your commits. e.g.:

    $ git config user.name "Jay Hannah"
    $ git config user.email jay at jays.net
    (these end up in ~/.gitconfig)

Thanks!
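
For anyone wanting to check where those two settings actually land, here is a minimal sketch, assuming a reasonably recent git. As far as I know, a plain `git config` writes to the current repository's `.git/config`, and `~/.gitconfig` is only touched when `--global` is given. The throwaway directory and the "Example User" identity are placeholders, not anything from the BioPerl repository.

```shell
# Work in a throwaway repository so no real configuration is touched.
demo_dir=$(mktemp -d)
git init -q "$demo_dir"
cd "$demo_dir"

# Without --global, git writes the identity to this repository's .git/config.
git config user.name "Example User"
git config user.email "user@example.org"

# The [user] section now lives in the repository-local file.
cat .git/config

# --local restricts the lookup to .git/config and still finds the value.
git config --local --get user.email

# To write to ~/.gitconfig instead (affecting every repository), one would run:
#   git config --global user.email "user@example.org"
```

Setting the identity once with `--global` is usually the more convenient choice; the per-repository form is handy when one machine is used with several addresses.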
Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah On May 17, 2010, at 10:05 PM, noreply at github.com wrote: > Branch: refs/heads/master > Home: http://github.com/bioperl/bioperl-live > > Commit: 87c530525da35a981e9f7b06134184f0adfae156 > http://github.com/bioperl/bioperl-live/commit/87c530525da35a981e9f7b06134184f0adfae156 > Author: Florent Angly > Date: 2010-05-17 (Mon, 17 May 2010) > > Changed paths: > M Bio/Assembly/IO.pm > M Bio/Assembly/IO/ace.pm > M t/Assembly/Assembly.t > > Log Message: > ----------- > Implemented the 454 Newbler ACE assembly variant > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l From florent.angly at gmail.com Tue May 18 11:11:40 2010 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 18 May 2010 08:11:40 -0700 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: <4BF2AE2C.209@gmail.com> Good idea Jay! I did as you suggested. Florent On 18/05/10 07:32, Jay Hannah wrote: > Can you add a line to the /Changes please? New features are especially great to add to that file.:) > > If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent. > > You also might want to set your git config so your email is valid in your commits. e.g.: > From bimber at wisc.edu Tue May 18 11:28:06 2010 From: bimber at wisc.edu (Ben Bimber) Date: Tue, 18 May 2010 10:28:06 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? Message-ID: this question is more of a general perl one than bioperl specific, so I hope it is appropriate for this list: I am writing code that has two steps. the first generates a large, complex hash describing mutations. 
it takes a fair amount of time to run this step. the second step uses this data to perform downstream calculations. for the purposes of writing/debugging this downstream code, it would save me a lot of time if i could run the first step once, then store this hash in something like the file system. this way I could quickly load it, when debugging the downstream code without waiting for the hash to be recreated. is there a 'best practice' way to do something like this? I could save a tab-delimited file, which is human readable, but does not represent the structure of the hash, so I would need code to re-parse it. I assume I could probably do something along the lines of dumping a JSON string, then read/decode it. this is easy, but not so human-readable. is there another option i'm not thinking of? what do others do in this sort of situation? thanks in advance. -Ben From cjfields at illinois.edu Tue May 18 11:31:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 10:31:14 -0500 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: On May 18, 2010, at 9:32 AM, Jay Hannah wrote: > Hi Florent, > > Can you add a line to the /Changes please? New features are especially great to add to that file. :) > > If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent. Agreed (or, +1, depending on your taste). Also, I would really like to break the habit of committing everything straight to trunk and promote using branches more. Branches are cheap. 
Something like:

    # on master
    git checkout -b 'topic/feature_foo'
    # switches over to branch 'topic/feature_foo'
    # hack hack hack
    # make commits
    # add tests
    # add to Changes
    # make more commits
    # push to remote branch
    # merge to master
    git checkout master
    git merge 'topic/feature_foo'
    # test test test, etc, push to origin

or similar. Of course, there would be more to it (handling merge conflicts, etc), just need to get a decent workflow document started up. Ah tuits, where are you?

> You also might want to set your git config so your email is valid in your commits. e.g.:
>
> $ git config user.name "Jay Hannah"
> $ git config user.email jay at jays.net
> (these end up in ~/.gitconfig)
>
> Thanks!
>
> Jay Hannah
> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

I think these are only set there if you use --global, correct? Otherwise it's repo-specific, would be in .git/ somewhere.

chris

From s.denaxas at gmail.com Tue May 18 11:41:01 2010
From: s.denaxas at gmail.com (Spiros Denaxas)
Date: Tue, 18 May 2010 16:41:01 +0100
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To:
References:
Message-ID:

Hello,

it all really depends on your definition of readable. YAML is readable but requires a parser; XML is readable but is bloated and requires code and a parser.

You can directly dump the output from Data::Dumper and then eval() it back in a hash. I would think this is the cleanest way if you specifically want to dump a hash and re-generate it with no additional code. You can set the $Data::Dumper::Indent flag to control how readable the hash is.

hope this helps,

Spiros

On Tue, May 18, 2010 at 4:28 PM, Ben Bimber wrote:
> this question is more of a general perl one than bioperl specific, so
> I hope it is appropriate for this list:
>
> I am writing code that has two steps. the first generates a large,
> complex hash describing mutations. it takes a fair amount of time to
> run this step.
> the second step uses this data to perform downstream
> calculations. for the purposes of writing/debugging this downstream
> code, it would save me a lot of time if i could run the first step
> once, then store this hash in something like the file system. this
> way I could quickly load it, when debugging the downstream code
> without waiting for the hash to be recreated.
>
> is there a 'best practice' way to do something like this? I could
> save a tab-delimited file, which is human readable, but does not
> represent the structure of the hash, so I would need code to re-parse
> it. I assume I could probably do something along the lines of dumping
> a JSON string, then read/decode it. this is easy, but not so
> human-readable. is there another option i'm not thinking of? what do
> others do in this sort of situation?
>
> thanks in advance.
>
> -Ben
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From adsj at novozymes.com Tue May 18 11:57:12 2010
From: adsj at novozymes.com (Adam Sjøgren)
Date: Tue, 18 May 2010 17:57:12 +0200
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
References:
Message-ID: <87zkzxmcdj.fsf@topper.koldfront.dk>

On Tue, 18 May 2010 10:28:06 -0500, Ben wrote:

> is there a 'best practice' way to do something like this?

The only one I can think of is "Don't make up your own format unless you really, really have to".

> I could save a tab-delimited file, which is human readable, but does
> not represent the structure of the hash, so I would need code to
> re-parse it. I assume I could probably do something along the lines of
> dumping a JSON string, then read/decode it. this is easy, but not so
> human-readable. is there another option i'm not thinking of? what do
> others do in this sort of situation?
I would use YAML or JSON if I had to look at it "by hand" or if it had to be somehow portable. I would prefer those over CSV, which hasn't necessarily got well-defined handling of special chars, whitespace etc.

If speed is more important, I think the Storable module is quite a bit quicker, but the format is "binary".

Best regards,

Adam

--
Adam Sjøgren
adsj at novozymes.com

From sdavis2 at mail.nih.gov Tue May 18 12:09:38 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 18 May 2010 12:09:38 -0400
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To:
References:
Message-ID:

On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote:

> this question is more of a general perl one than bioperl specific, so
> I hope it is appropriate for this list:
>
> I am writing code that has two steps. the first generates a large,
> complex hash describing mutations. it takes a fair amount of time to
> run this step. the second step uses this data to perform downstream
> calculations. for the purposes of writing/debugging this downstream
> code, it would save me a lot of time if i could run the first step
> once, then store this hash in something like the file system. this
> way I could quickly load it, when debugging the downstream code
> without waiting for the hash to be recreated.
>
> is there a 'best practice' way to do something like this? I could
> save a tab-delimited file, which is human readable, but does not
> represent the structure of the hash, so I would need code to re-parse
> it. I assume I could probably do something along the lines of dumping
> a JSON string, then read/decode it. this is easy, but not so
> human-readable. is there another option i'm not thinking of? what do
> others do in this sort of situation?
>
> thanks in advance.
>
>
There are a number of solutions on CPAN, probably.
This is one maybe off the beaten path, but it is getting a lot of press in the NoSQL database realm: http://1978th.net/tokyocabinet/ Sean From David.Messina at sbc.su.se Tue May 18 12:19:18 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 18:19:18 +0200 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: Hi Ben, Storable should do the trick. http://search.cpan.org/~ams/Storable-2.21/ It allows you to save arbitrary perl data structures to disk and load them back in without needing to dump into another format and then parse it later. Dave From cjfields at illinois.edu Tue May 18 12:22:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 11:22:09 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: On May 18, 2010, at 10:28 AM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step. the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? 
> what do others do in this sort of situation?
>
> thanks in advance.
>
> -Ben

Would a simple DB_File tied hash work?

chris

From cjfields at illinois.edu Tue May 18 12:25:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 18 May 2010 11:25:11 -0500
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To: <87zkzxmcdj.fsf@topper.koldfront.dk>
References: <87zkzxmcdj.fsf@topper.koldfront.dk>
Message-ID:

On May 18, 2010, at 10:57 AM, Adam Sjøgren wrote:

> On Tue, 18 May 2010 10:28:06 -0500, Ben wrote:
>
>> is there a 'best practice' way to do something like this?
>
> The only one I can think of is "Don't make up your own format unless you
> really, really have to".
>
>> I could save a tab-delimited file, which is human readable, but does
>> not represent the structure of the hash, so I would need code to
>> re-parse it. I assume I could probably do something along the lines of
>> dumping a JSON string, then read/decode it. this is easy, but not so
>> human-readable. is there another option i'm not thinking of? what do
>> others do in this sort of situation?
>
> I would use YAML or JSON if I had to look at it "by hand" or if it had
> to be somehow portable. I would prefer those over CSV, which hasn't
> necessarily got well-defined handling of special chars, whitespace etc.
>
> If speed is more important, I think the Storable module is quite a bit
> quicker, but the format is "binary".
>
>
> Best regards,
>
> Adam
>
> --
> Adam Sjøgren
> adsj at novozymes.com

Yes, that in combination with an AnyDBM tied hash would work (essentially what Bio::SeqFeature::Collection is under the hood).

chris

From sdavis2 at mail.nih.gov Tue May 18 12:39:44 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 18 May 2010 12:39:44 -0400
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To: References: Message-ID: On Tue, May 18, 2010 at 12:09 PM, Sean Davis wrote: > > > On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote: > >> this question is more of a general perl one than bioperl specific, so >> I hope it is appropriate for this list: >> >> I am writing code that has two steps. the first generates a large, >> complex hash describing mutations. it takes a fair amount of time to >> run this step. the second step uses this data to perform downstream >> calculations. for the purposes of writing/debugging this downstream >> code, it would save me a lot of time if i could run the first step >> once, then store this hash in something like the file system. this >> way I could quickly load it, when debugging the downstream code >> without waiting for the hash to be recreated. >> >> is there a 'best practice' way to do something like this? I could >> save a tab-delimited file, which is human readable, but does not >> represent the structure of the hash, so I would need code to re-parse >> it. I assume I could probably do something along the lines of dumping >> a JSON string, then read/decode it. this is easy, but not so >> human-readable. is there another option i'm not thinking of? what do >> others do in this sort of situation? >> >> thanks in advance. >> >> > There are a number of solutions on CPAN, probably. This is one maybe off > the beaten path, but it is getting a lot of press in the NoSQL database > realm: > > http://1978th.net/tokyocabinet/ > > Just to be clear, I am assuming that the problem at hand is storing a key/value pair and then retrieving it later. If what you are talking about is a multi-level hash data structure, then Data::Dumper might be the easiest way to go. Sorry for the confusion.... Sean From bimber at wisc.edu Tue May 18 12:47:33 2010 From: bimber at wisc.edu (Ben Bimber) Date: Tue, 18 May 2010 11:47:33 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? 
In-Reply-To:
References:
Message-ID:

Thanks for all the suggestions. Storable seems like the simplest route. This will save me hours of staring at my computer.

-Ben

On Tue, May 18, 2010 at 11:39 AM, Sean Davis wrote:
>
>
> On Tue, May 18, 2010 at 12:09 PM, Sean Davis wrote:
>>
>>
>> On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote:
>>>
>>> this question is more of a general perl one than bioperl specific, so
>>> I hope it is appropriate for this list:
>>>
>>> I am writing code that has two steps. the first generates a large,
>>> complex hash describing mutations. it takes a fair amount of time to
>>> run this step. the second step uses this data to perform downstream
>>> calculations. for the purposes of writing/debugging this downstream
>>> code, it would save me a lot of time if i could run the first step
>>> once, then store this hash in something like the file system. this
>>> way I could quickly load it, when debugging the downstream code
>>> without waiting for the hash to be recreated.
>>>
>>> is there a 'best practice' way to do something like this? I could
>>> save a tab-delimited file, which is human readable, but does not
>>> represent the structure of the hash, so I would need code to re-parse
>>> it. I assume I could probably do something along the lines of dumping
>>> a JSON string, then read/decode it. this is easy, but not so
>>> human-readable. is there another option i'm not thinking of? what do
>>> others do in this sort of situation?
>>>
>>> thanks in advance.
>>>
>>
>> There are a number of solutions on CPAN, probably. This is one maybe off
>> the beaten path, but it is getting a lot of press in the NoSQL database
>> realm:
>>
>> http://1978th.net/tokyocabinet/
>>
>
> Just to be clear, I am assuming that the problem at hand is storing a
> key/value pair and then retrieving it later. If what you are talking about
> is a multi-level hash data structure, then Data::Dumper might be the easiest
> way to go.
>
> Sorry for the confusion....
>
> Sean
>

From bosborne11 at verizon.net Tue May 18 12:00:06 2010
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 18 May 2010 12:00:06 -0400
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To:
References:
Message-ID: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net>

Ben,

I've used Storable to do things like this, for example:

    use Storable;
    use Getopt::Long;
    use Bio::DB::Fasta;

    my %species = (
        "Sc" => 4932,     # Saccharomyces cerevisiae
        "Ec" => 83333,    # Escherichia coli K12
        "Hs" => 9606      # H. sapiens
    );

    my ($help,$id,$name);

    GetOptions( "s=s" => \$name,
                "i=i" => \$id,
                "h"   => \$help );

    # usage() and make_my_id() are not shown in this excerpt
    usage() if ($help || !$id || !$name);

    my $storedHash = $name . ".dump";

    # create index for a directory of fasta files
    my $db = Bio::DB::Fasta->new($name, -makeid => \&make_my_id);

    # extract species-specific data from gene2accession
    unless (-e $storedHash) {
        my $ref;

        # extract species-specific information from gene2accession
        open MYIN,"gene2accession" or die "No gene2accession file\n";

        # read the tab-delimited file one line at a time
        while (<MYIN>) {
            my @arr = split "\t",$_;
            if ($arr[0] == $species{$name} && $arr[9] =~ /\d+/ && $arr[10] =~ /\d+/) {
                ($ref->{$arr[1]}->{"start"}, $ref->{$arr[1]}->{"end"},
                 $ref->{$arr[1]}->{"strand"}, $ref->{$arr[1]}->{"id"}) =
                    ($arr[9], $arr[10], $arr[11], $arr[7]);
            }
        }

        # save species-specific information using Storable
        store $ref, $storedHash;
    }

    # retrieve the species-specific data from a stored hash
    my $ref = retrieve($storedHash);

Take away all the parsing details and you can see that it's simple, and that Storable exports store() and retrieve(). Make up a file name, "store" the hash reference.

Brian O.

On May 18, 2010, at 11:28 AM, Ben Bimber wrote:

> this question is more of a general perl one than bioperl specific, so
> I hope it is appropriate for this list:
>
> I am writing code that has two steps. the first generates a large,
> complex hash describing mutations. it takes a fair amount of time to
> run this step.
the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation? > > thanks in advance. > > -Ben > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Tue May 18 12:06:54 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 18 May 2010 12:06:54 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? Message-ID: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> bioperl-l, Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. We want these to point to github, yes? I'll fix it if the answer is 'yes'. Brian O. From cjfields at illinois.edu Tue May 18 14:04:55 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 13:04:55 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> Message-ID: <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> Yes. 
chris On May 18, 2010, at 11:06 AM, Brian Osborne wrote: > bioperl-l, > > Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. > > We want these to point to github, yes? I'll fix it if the answer is 'yes'. > > Brian O. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Tue May 18 15:39:48 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 12:39:48 -0700 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: <4BF2ED04.2050106@cornell.edu> Chris Fields wrote: > Agreed (or, +1, depending on your taste). Also, I would really like to break the habit of committing everything straight to trunk and promote using branches more. Branches are cheap. I did some work on our git workflow at http://www.bioperl.org/wiki/Using_Git#Developing_BioPerl, but it still needs some more work. So, there's the start of the workflow document I think. Rob From rmb32 at cornell.edu Tue May 18 15:42:44 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 12:42:44 -0700 Subject: [Bioperl-l] storing/retrieving a large hash on file system? 
In-Reply-To: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net>
References: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net>
Message-ID: <4BF2EDB4.4060907@cornell.edu>

Based on your description, you want to use either:

Storable - if you want to load the whole hash into memory

or

AnyDBM - if you want to be able to look things up from the hash without loading the whole thing in memory

Rob

From David.Messina at sbc.su.se Tue May 18 16:16:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 May 2010 22:16:14 +0200
Subject: [Bioperl-l] Please update /Changes as you commit things
In-Reply-To: <4BF2ED04.2050106@cornell.edu>
References: <20100518030511.59C314202D@smtp1.rs.github.com> <4BF2ED04.2050106@cornell.edu>
Message-ID: <2D6396F7-E478-4544-B26A-F8A5799F2039@sbc.su.se>

Nice, Rob!

> I did some work on our git workflow at http://www.bioperl.org/wiki/Using_Git#Developing_BioPerl, but it still needs some more work.
>
> So, there's the start of the workflow document I think.

From bpcwhite at gmail.com Tue May 18 17:34:06 2010
From: bpcwhite at gmail.com (Bryan White)
Date: Tue, 18 May 2010 14:34:06 -0700 (PDT)
Subject: [Bioperl-l] distance
In-Reply-To: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie>
References: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie>
Message-ID: <1a2c786f-07e6-4499-8dc9-19a8d4169653@u3g2000prl.googlegroups.com>

Thanks guys, I got it working!

Bryan

On May 18, 4:07 am, Jun Yin wrote:
> Hi, Bryan,
>
> In your code:
>     my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');
>
> First, you should specify the fieldname. The "fieldname" itself does not seem
> like a valid key. The default field name is "id".
> Second, the find_node method can only search for one specific term at one
> time.
> Third, distance method can only work on two nodes.
> > So try this: > > my @nodes_human = $tree->find_node(-id => 'Homo_sapiens'); > my @nodes_murinae=$tree->find_node(-id=>'Murinae'); > > my $distance = $tree->distance(-nodes => > \($nodes_human[0],$nodes_murinae[0])); #Providing you only have one match > for "Homo_sapiens" and " Murinae". > > Cheers, > Jun Yin > Ph.D.?student in U.C.D. > > Bioinformatics Laboratory > Conway Institute > University College Dublin > > -----Original Message----- > From: bioperl-l-boun... at lists.open-bio.org > > [mailto:bioperl-l-boun... at lists.open-bio.org] On Behalf Of Bryan White > Sent: Tuesday, May 18, 2010 10:49 AM > To: bioper... at bioperl.org > Subject: [Bioperl-l] distance > > Hello, > > I am trying to create a simple program to show me the distance between > taxa on a given tree. However, I am having trouble getting the bioperl > code to work. Here is the code that I am using: > -------- > #! /usr/bin/perl > use strict; > use warnings; > use Bio::Tree::Draw::Cladogram; > use Bio::TreeIO; > #use Bio::TreeFunctionsI; > > my $node1 = 'homo_sapiens'; > my $node2 = 'murinae'; > my $input = new Bio::TreeIO('-format' => 'newick', > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '-file' => 'tree_mammalia_newick.txt'); > > my $tree = $input->next_tree; > > my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); > > my $distance = $tree->distance(-nodes => \@nodes); > > #print $distance; > > -------- > > And here is the error message I receive: > > ------------- EXCEPTION ------------- > MSG: Must provide 2 nodes > STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/ > Bio/Tree/TreeFunctionsI.pm:811 > STACK toplevel ./phylo.pl:19 > ------------------------------------- > > It seems that the nodes are not being read into the @nodes variable. > Any help in figuring this out would be appreciated. > > Thanks, > Bryan > _______________________________________________ > Bioperl-l mailing list > Bioper... 
at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hartzell at alerce.com Wed May 19 00:17:24 2010
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 18 May 2010 21:17:24 -0700
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To:
References:
Message-ID: <19443.26196.893455.52821@gargle.gargle.HOWL>

Ben Bimber writes:
> this question is more of a general perl one than bioperl specific, so I hope it is appropriate for this list:
>
> I am writing code that has two steps. the first generates a large, complex hash describing mutations. it takes a fair amount of time to run this step. the second step uses this data to perform downstream calculations. for the purposes of writing/debugging this downstream code, it would save me a lot of time if i could run the first step once, then store this hash in something like the file system. this way I could quickly load it, when debugging the downstream code without waiting for the hash to be recreated.
>
> is there a 'best practice' way to do something like this? I could save a tab-delimited file, which is human readable, but does not represent the structure of the hash, so I would need code to re-parse it. I assume I could probably do something along the lines of dumping a JSON string, then read/decode it. this is easy, but not so human-readable. is there another option i'm not thinking of?
> what do others do in this sort of situation?

Someone early on in the thread said not to invent another format, and I concur with that wholeheartedly.

Your choice of words, "large complex hash", makes me worry that you have something more than a large single-level hash with sensible keys. Hashes of references to hashes of references to lists of etc... give me hives.

If you'd like to add a nice general-purpose tool to your kit, think about putting it into a simple SQLite database. Put it into an SQLite db and talk to it via DBI and you get some really cool tricks:

 - you can store complex stuff,
 - get back just the part you need: a column, several columns, or the result of a join among multiple tables,
 - add indexes to make it Go Fast.

and in the cool tricks category:

 - you can use SQLite's backup interface to build the database in memory (nice and fast) then quickly stream it out to a disk-based file for persistence.
 - same trick in reverse: if you know you're going to do a reasonably large number of complex queries you can stream a database into memory and then run your queries quickly.
 - rtree indexes are cool.

Going forward you can scale things up to big databases (Pg, Oracle), you can provide safe multiuser access, transactions, etc. (NFS notwithstanding).

g.

From avilella at gmail.com Wed May 19 04:36:25 2010
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 19 May 2010 09:36:25 +0100
Subject: [Bioperl-l] from SimpleAlign to SAM/BAM
Message-ID:

Hi,

I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects.

Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates?

As far as I can see, there is a Bio::Assembly::IO::sam.pm which loads assemblies.
Should I be using some other tool existing not in bioperl?

Cheers,

Albert.

From jun.yin at ucd.ie Wed May 19 06:40:51 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Wed, 19 May 2010 11:40:51 +0100
Subject: [Bioperl-l] from SimpleAlign to SAM/BAM
In-Reply-To:
References:
Message-ID: <008101caf73f$c04973c0$40dc5b40$%yin@ucd.ie>

Hi, Albert,

Check this page for the BioPerl wrapper on next-gen sequencing results:
http://bioperl.org/wiki/HOWTO:Short-read_assemblies_with_BWA

And, I don't think Bio::SimpleAlign works on assembly files. It is targeted at global alignment, e.g. clustalw output file.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

__________ Information from ESET Smart Security, version of virus signature database 5099 (20100509) __________

The message was checked by ESET Smart Security.
http://www.eset.com From maj at fortinbras.us Wed May 19 09:34:01 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 09:34:01 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > Hi, > > I would like to know what would be the best way to generate a SAM/BAM file > with cDNA alignments against the human reference from a bunch of > Bio::SimpleAlign > cDNA multiple sequence alignment objects. > > Considering I've got a way to map the cDNAs to chromosome coordinates, > how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 > human > coordinates? > > As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads > assemblies. > Should I be using some other tool existing not in bioperl? > > Cheers, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed May 19 09:59:03 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 09:59:03 -0400 Subject: [Bioperl-l] out of memory issue In-Reply-To: References: Message-ID: Hi Shalabh and all, Sorry to comment on an old thread, but Dan Kortschak just pointed me to Tie::File. This may be the right solution to this issue. It turns out that DB_File will read in the entire file to memory anyway, while Tie::File (by MJD of course) works on pieces as it should. 
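
A minimal sketch of the Tie::File idea, for readers who have not used it. It assumes the IDs live one per line in a plain text file; the temporary file and the `id_N` values are made up for illustration. Note that Tie::File exposes the file as an array indexed by line number, so it helps with walking or sampling a huge list without loading it, but it is not a drop-in replacement for a hash lookup by key.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Build a small stand-in for the real ID file (illustrative data only).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "id_$_\n" for 1 .. 5;
close $fh;

# Tie the file to an array; records are read on demand rather than all at once.
tie my @ids, 'Tie::File', $path or die "Cannot tie $path: $!";

my $count = scalar @ids;   # number of lines in the file
my $third = $ids[2];       # third record, with its newline already removed

untie @ids;

print "$count records; third is $third\n";
```

For keyed lookups over tens of millions of IDs, an on-disk DBM file or database is probably still the better fit; the sketch only shows the access pattern Tie::File offers.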
See Tie::File in CPAN and also this informative post: http://perl.plover.com/TieFile/why-not-DB_File cheers all- (someday, maybe next month, I'll return in force) MAJ ----- Original Message ----- From: "shalabh sharma" To: "bioperl-l" Sent: Wednesday, April 28, 2010 10:13 AM Subject: [Bioperl-l] out of memory issue > Hi All, > I am trying to make a hash of 38 Million ids but every time i get the > following message : > > perl(191) malloc: *** mmap(size=16777216) failed (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > Out of memory! > > I am working on MacOX 10.5.8 with 4GB of memory. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From avilella at gmail.com Wed May 19 11:00:27 2010 From: avilella at gmail.com (Albert Vilella) Date: Wed, 19 May 2010 16:00:27 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Awesome, thanks. I'll give it a try :-) On Wed, May 19, 2010 at 2:34 PM, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use > of Bio::Assembly::IO::sam (I think). I know there is only read capability > for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing > writes (some assembly (so to speak) required...)-- cheers MAJ > ----- Original Message ----- From: "Albert Vilella" > To: > Sent: Wednesday, May 19, 2010 4:36 AM > Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > > >> Hi, >> >> I would like to know what would be the best way to generate a SAM/BAM file >> with cDNA alignments against the human reference from a bunch of >> Bio::SimpleAlign >> cDNA multiple sequence alignment objects. 
>> >> Considering I've got a way to map the cDNAs to chromosome coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >> human >> coordinates? >> >> As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads >> assemblies. >> Should I be using some other tool existing not in bioperl? >> >> Cheers, >> >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > From lincoln.stein at gmail.com Wed May 19 12:40:31 2010 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Wed, 19 May 2010 12:40:31 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the > use of Bio::Assembly::IO::sam (I think). I know there is only read > capability for B:A:I:sam, but Samtools may give you the appropriate wrapper > for doing writes (some assembly (so to speak) required...)-- cheers MAJ > ----- Original Message ----- From: "Albert Vilella" > > To: > Sent: Wednesday, May 19, 2010 4:36 AM > > Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > > > Hi, >> >> I would like to know what would be the best way to generate a SAM/BAM file >> with cDNA alignments against the human reference from a bunch of >> Bio::SimpleAlign >> cDNA multiple sequence alignment objects. >> >> Considering I've got a way to map the cDNAs to chromosome coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >> human >> coordinates? >> >> As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads >> assemblies. >> Should I be using some other tool existing not in bioperl? 
>> >> Cheers, >> >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From john.marshall at sanger.ac.uk Wed May 19 12:22:19 2010 From: john.marshall at sanger.ac.uk (John Marshall) Date: Wed, 19 May 2010 17:22:19 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: On 19 May 2010, at 14:34, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates > the use of Bio::Assembly::IO::sam (I think). I've only briefly skimmed the B:T:R:Samtools documentation, but it would appear that this mostly encapsulates running the various samtools subcommands. These provide various manipulations on SAM and BAM files, but don't give you anything in terms of converting from not- SAM/BAM to SAM/BAM. > ----- Original Message ----- From: "Albert Vilella" > >> Considering I've got a way to map the cDNAs to chromosome >> coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against >> ~23.000 human >> coordinates? Perhaps I misunderstand, but if you already have a bunch of snippets of sequence and their mapped coordinates, then the easy way to generate a SAM file containing them is just to print it out by hand. A SAM file is just a tab-separated text file. For each sequence in your Bio::SimpleAlign objects, print out a line containing appropriate values for each of the 11 main SAM fields. 
(If the snippets are effectively unpaired, then MRNM,MPOS,ISIZE can just be *,0,0, and the only FLAG values you'll be choosing between are 0, 4, 16, and 20.) You should also start the file with an @SQ header for each of the chromosomes you've mapped against. (I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf -- it's a little vague, but should be more than enough to explain how to e.g. print out a basic SAM file with only the main fields.) Once you've printed out a simple SAM file, you can use B:T:R:Samtools or samtools directly or other tools to convert it to the binary BAM format and/or otherwise work with it. Cheers, John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From maj at fortinbras.us Wed May 19 13:26:16 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:26:16 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: <42F365BE46A545CE9DF897BA0B18B8EF@NewLife> CORRECTION: B:T:R:Samtools wraps samtools directly, as John said. Sorry, it's been a while... MAJ ----- Original Message ----- From: Lincoln Stein To: Mark A. Jensen Cc: Albert Vilella ; bioperl-l at bioperl.org Sent: Wednesday, May 19, 2010 12:40 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). 
I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. Should I be using some other tool existing not in bioperl? Cheers, Albert. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From maj at fortinbras.us Wed May 19 13:30:25 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:30:25 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Yes that's right John; B:T:R:Samtools is used within the B:A:.I:sam to do the write out with samtools command line pgms. Interested parties might look at Bio::Asssembly::IO::sam to see how Lincoln's Bio::DB::Sam (which uses the libbam library directly via XS, also not BioPerl proper but we love it anyway) might be employed. 
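
John's suggestion in this thread, printing the SAM file by hand, might look roughly like the sketch below. Everything about the input is an assumption: the `@hits` records, their field names, and the chromosome lengths are hypothetical stand-ins for whatever the cDNA-to-chromosome mapping actually produces. The sketch treats every snippet as unpaired (MRNM, MPOS, ISIZE set to `*`, 0, 0), uses FLAG 0 or 16 for strand, and writes 255 for MAPQ and `*` for QUAL to mean "not available".

```perl
use strict;
use warnings;

# Hypothetical mapped snippets; in practice these would be derived from the
# Bio::SimpleAlign sequences plus the coordinate mapping.
my @hits = (
    { name => 'cdna_1', chr => 'chr1', pos => 1001, strand =>  1, cigar => '8M', seq => 'ACGTACGT' },
    { name => 'cdna_2', chr => 'chr2', pos => 5000, strand => -1, cigar => '6M', seq => 'TTGACA'   },
);

# Made-up reference lengths, for illustration only.
my %chr_length = ( chr1 => 1_000_000, chr2 => 800_000 );

# One @SQ header line per reference sequence.
sub sam_header {
    my ($lengths) = @_;
    return map { join "\t", '@SQ', "SN:$_", "LN:$lengths->{$_}" } sort keys %$lengths;
}

# One alignment line with the 11 main fields:
# QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL
sub sam_record {
    my ($hit) = @_;
    my $flag = $hit->{strand} < 0 ? 16 : 0;   # 16 = mapped to the reverse strand
    # SEQ is assumed to be given already in forward-strand orientation.
    return join "\t",
        $hit->{name}, $flag, $hit->{chr}, $hit->{pos},
        255, $hit->{cigar}, '*', 0, 0, $hit->{seq}, '*';
}

print "$_\n" for sam_header(\%chr_length);
print sam_record($_), "\n" for @hits;
```

The resulting text file can then be handed to samtools (directly or through a wrapper) for conversion to BAM, as described elsewhere in the thread.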
----- Original Message ----- From: "John Marshall" To: Cc: "Albert Vilella" Sent: Wednesday, May 19, 2010 12:22 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM > On 19 May 2010, at 14:34, Mark A. Jensen wrote: >> Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use >> of Bio::Assembly::IO::sam (I think). > > I've only briefly skimmed the B:T:R:Samtools documentation, but it would > appear that this mostly encapsulates running the various samtools > subcommands. These provide various manipulations on SAM and BAM files, but > don't give you anything in terms of converting from not- SAM/BAM to SAM/BAM. > >> ----- Original Message ----- From: "Albert Vilella" > > >>> Considering I've got a way to map the cDNAs to chromosome coordinates, >>> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >>> human >>> coordinates? > > Perhaps I misunderstand, but if you already have a bunch of snippets of > sequence and their mapped coordinates, then the easy way to generate a SAM > file containing them is just to print it out by hand. > > A SAM file is just a tab-separated text file. For each sequence in your > Bio::SimpleAlign objects, print out a line containing appropriate values for > each of the 11 main SAM fields. (If the snippets are effectively unpaired, > then MRNM,MPOS,ISIZE can just be *,0,0, and the only FLAG values you'll be > choosing between are 0, 4, 16, and 20.) > > You should also start the file with an @SQ header for each of the chromosomes > you've mapped against. > > (I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf -- it's a > little vague, but should be more than enough to explain how to e.g. print out > a basic SAM file with only the main fields.) > > Once you've printed out a simple SAM file, you can use B:T:R:Samtools or > samtools directly or other tools to convert it to the binary BAM format > and/or otherwise work with it. 
> > Cheers, > > John > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a > charity registered in England with number 1021457 and a company registered in > England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE. _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed May 19 13:21:56 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:21:56 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: B:T:R:Samtools wraps Bio::Samtools ----- Original Message ----- From: Lincoln Stein To: Mark A. Jensen Cc: Albert Vilella ; bioperl-l at bioperl.org Sent: Wednesday, May 19, 2010 12:40 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. 
Should I be using some other tool existing not in bioperl? Cheers, Albert. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Thu May 20 11:37:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 10:37:16 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> Message-ID: <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Yes, if you have time. I have started along that path already, but I'm sure there are lingering spots where links point to the wrong place, or subversion/svn is mentioned. chris On May 20, 2010, at 10:34 AM, Brian Osborne wrote: > Chris, > > Done, easy. Should I remove all references to SVN from the Wiki? > > Brian O. > > On May 18, 2010, at 2:04 PM, Chris Fields wrote: > >> Yes. >> >> chris >> >> On May 18, 2010, at 11:06 AM, Brian Osborne wrote: >> >>> bioperl-l, >>> >>> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >>> >>> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >>> >>> Brian O. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu May 20 12:05:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 11:05:56 -0500 Subject: [Bioperl-l] Regarding git commits... Message-ID: All, Please make sure to update your local git repos prior to doing commits and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. chris From florent.angly at gmail.com Thu May 20 12:22:50 2010 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 20 May 2010 09:22:50 -0700 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: References: Message-ID: <4BF561DA.1070700@gmail.com> On 20/05/10 09:05, Chris Fields wrote: > All, > > Please make sure to update your local git repos prior to doing commits That's done with "git pull", as mentioned on the wiki (http://www.bioperl.org/wiki/Using_Git), right? > and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bosborne11 at verizon.net Thu May 20 11:34:39 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 20 May 2010 11:34:39 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> Message-ID: <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> Chris, Done, easy. Should I remove all references to SVN from the Wiki? Brian O. On May 18, 2010, at 2:04 PM, Chris Fields wrote: > Yes. > > chris > > On May 18, 2010, at 11:06 AM, Brian Osborne wrote: > >> bioperl-l, >> >> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >> >> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >> >> Brian O. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu May 20 12:58:22 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 20 May 2010 09:58:22 -0700 Subject: [Bioperl-l] Regarding git commits... 
In-Reply-To: <4BF561DA.1070700@gmail.com>
References: <4BF561DA.1070700@gmail.com>
Message-ID: <4BF56A2E.8060309@bioperl.org>

I think you want

$ git pull upstream master

http://help.github.com/forking/

Florent Angly wrote, On 5/20/10 9:22 AM:
> On 20/05/10 09:05, Chris Fields wrote:
>> All,
>>
>> Please make sure to update your local git repos prior to doing commits
> That's done with "git pull", as mentioned on the wiki (http://www.bioperl.org/wiki/Using_Git), right?
>
>> and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely.
>>
>> chris

From cjfields at illinois.edu Thu May 20 13:35:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 12:35:09 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To: <4BF56A2E.8060309@bioperl.org>
References: <4BF561DA.1070700@gmail.com> <4BF56A2E.8060309@bioperl.org>
Message-ID: <86401472-ECAB-4C21-8BD1-61AB37003F64@illinois.edu>

Yes. The general syntax is:

git pull <remote> <branch>

If you have a read-write checkout directly from bioperl/bioperl-live.git, 'origin' should be set to that, and if you are on a specific branch a simple 'git pull' will work (it implies 'git pull origin <branch>'). All collabs can do this.
In the case of a forked repo (which anyone can do), it's a little trickier as it's essentially a branch from the repository at a specific point; it isn't automatically synced. You can see that here: http://github.com/bioperl/bioperl-live/network In order to sync with the original repo, you need to specify exactly which remote to pull from, likely not 'origin' (which is your forked repo), but 'upstream' or whatever you set the original bioperl read-only repo to via: git remote add upstream git://github.com/bioperl/bioperl-live.git Then, to sync, do: git pull upstream master git push # goes to your forked repo chris PS - Note on the graph linked to I just synced my branch using the above. On May 20, 2010, at 11:58 AM, Jason Stajich wrote: > I think you want > $ git pull upstream master > > http://help.github.com/forking/ > > Florent Angly wrote, On 5/20/10 9:22 AM: >> On 20/05/10 09:05, Chris Fields wrote: >>> All, >>> >>> Please make sure to update your local git repos prior to doing commits >> That's done with "git pull", as mentioned on the wiki (http://www.bioperl.org/wiki/Using_Git), right? >> >>> and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. 
>>>
>>> chris

From jay at jays.net Thu May 20 14:06:13 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 20 May 2010 13:06:13 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To:
References:
Message-ID:

On May 20, 2010, at 11:05 AM, Chris Fields wrote:
> Please make sure to update your local git repos prior to doing commits and pushing to master

I thought git refused to push if your local was out of date? (I thought this was one of the general selling points of git?) It seems to be doing that to me, below.

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah


jhannah at jaysnet-MacBook:~/src/sandbox$ git push
To git at github.com:jhannah/sandbox.git
 ! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to 'git at github.com:jhannah/sandbox.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again. See the 'Note about
fast-forwards' section of 'git push --help' for details.

From cjfields at illinois.edu Thu May 20 14:43:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 13:43:12 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To:
References:
Message-ID: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu>

It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. That's something that can happen in any VCS.

chris

From jay at jays.net Thu May 20 15:09:00 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 20 May 2010 14:09:00 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu>
References: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu>
Message-ID: <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>

On May 20, 2010, at 1:43 PM, Chris Fields wrote:
> It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. That's something that can happen in any VCS.

So... you're saying don't commit if you don't have any idea what you're committing? :)

git pull
git diff
git status
   if local is clean then
-edit-
git diff     if it looks good then git commit
git status   if it looks good then git push

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah
enjoys preaching to the choir ;)

From cjfields at illinois.edu Thu May 20 15:24:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 14:24:17 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To: <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>
References: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>
Message-ID: <95305268-0D84-478C-A380-68E81742F18F@illinois.edu>

On May 20, 2010, at 2:09 PM, Jay Hannah wrote:
> On May 20, 2010, at 1:43 PM, Chris Fields wrote:
>> It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. That's something that can happen in any VCS.
>
> So... you're saying don't commit if you don't have any idea what you're committing? :)
>
> git pull
> git diff
> git status
>    if local is clean then
> -edit-
> git diff     if it looks good then git commit
> git status   if it looks good then git push
>
> Jay Hannah
> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah
> enjoys preaching to the choir ;)

Maybe the point is, if someone is having a problem with git either pulling from or pushing to the remote repo, it's very likely b/c of a merge conflict (git is trying to tell you something). There are lots of ways to resolve those (most easily by hand if the change is small). But saving over the top of someone else's commit in a re-cloned repo is definitely not one of them.

Possibly a section of 'Using git' that needs some work?

chris

From charles.tilford at bms.com Thu May 20 16:27:27 2010
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 20 May 2010 16:27:27 -0400
Subject: [Bioperl-l] Bio::Species irritated with "unclassified sequences"
Message-ID: <4BF59B2F.9000300@bms.com>

Bio::Species::classification() is irritated with me when I provide it with a @class_array that is composed of one node, particularly:

    $obj->classification("unclassified sequences")

AFAICT this is a valid, single node taxa "tree":

    http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=12908

Subroutine classification is expecting at least two class members, the problem with the above call crops up as:

    Use of uninitialized value $vals[1] in quotemeta at /stf/biocgi/tilfordc/patch_lib/Bio/Species.pm line 179
    ( $Id: Species.pm 16700 2010-01-15 19:50:11Z dave_messina $)

... and the relevant code is:

    sub classification {
        my ($self, @vals) = @_;

        if (@vals) {
            if (ref($vals[0]) eq 'ARRAY') {
                @vals = @{$vals[0]};
            }

            # make sure the lineage contains us as first or second element
            # (lineage may have subspecies, species, genus ...)
            my $name = $self->node_name;
            my ($genus, $species) = (quotemeta($vals[1]), quotemeta($vals[0]));

That is, it's expecting at least (species, genus) in the array. Am I misusing classification(), or Bio::Species in general? I know it's named "Species", but I've been using it as a generic tree object for arbitrary taxonomy nodes, not just species and subspecies. This block a little lower down:

            unless ($self->rank) {
                # and that we are rank species
                $self->rank('species');
            }

... implies that the module can be used for taxa ranks other than species. However, doing so would not prevent the module being aggravated over a null $vals[1].

The use case here is building Bio::Seq::RichSeq objects pulled from a (very large) sequence database, and then dumped / displayed with SeqIO. Most are well behaved, but there's a non-trivial number of 'artificial' constructs that don't root to an organism.
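
For illustration only, here is a minimal sketch in plain Perl of the kind of guard that would avoid the "uninitialized value" warning when the lineage has a single node. It is not the code in Bio/Species.pm and not a proposed patch: lineage_patterns() is a hypothetical helper, and treating a missing genus as an empty string is an assumption made just for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::Species): build the quotemeta'd
# (genus, species) pair from a lineage list, tolerating a one-node lineage.
sub lineage_patterns {
    my @vals = @_;

    # Accept either a flat list or a single array reference,
    # mirroring the check shown in classification() above.
    @vals = @{ $vals[0] } if @vals && ref($vals[0]) eq 'ARRAY';

    # Only call quotemeta() on elements that are actually present.
    my $species = defined $vals[0] ? quotemeta($vals[0]) : '';
    my $genus   = defined $vals[1] ? quotemeta($vals[1]) : '';

    return ($genus, $species);
}

# A single-node lineage no longer touches an undefined $vals[1].
my ($genus, $species) = lineage_patterns('unclassified sequences');
print "genus='$genus' species='$species'\n";
```

Whether an empty genus is the right behaviour for non-species nodes is exactly the open question in the message above; the sketch only shows where the check would go.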
-CAT

From dimitark at bii.a-star.edu.sg Thu May 20 22:18:21 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Fri, 21 May 2010 10:18:21 +0800
Subject: [Bioperl-l] a problem with HspI module?
Message-ID: <4BF5ED6D.6030506@bii.a-star.edu.sg>

Hello guys,
I think I found a problem with 'Bio::Search::HSP::HSPI'. Consider the following HSP:
-------------
 Score = 48.9 bits (115), Expect = 8e-04, Method: Compositional matrix adjust.
 Identities = 27/77 (35%), Positives = 40/77 (51%), Gaps = 14/77 (18%)
 Frame = +1

Query  371      PSGMLLA-----SCSDDMTLKIWSMKQEVCIHDLQAHNKEIYTIKWSPTGPATSNPNSNI  425
                P   LLA     S S D T+++W ++Q VC H L  H + +Y++ +SP G
Sbjct  6955270  PGLQLLAFSHPPSASFDSTVRLWDVEQGVCTHTLMKHQEPVYSVAFSPDGK---------  6955422

Query  426      MLASASFDSTVRLWDIE  442
                 LAS SFD  V +W+ +
Sbjct  6955423  YLASGSFDKYVHIWNTQ  6955473
---------------

The method 'frac_identical' is not functioning right.
-------------
 Title   : frac_identical
 Usage   : my $frac_id = $hsp->frac_identical( ['query'|'hit'|'total'] );
 Function: Returns the fraction of identical positions for this HSP
 Returns : Float in range 0.0 -> 1.0
 Args    : 'query' = num identical / length of query seq (without gaps)
           'hit'   = num identical / length of hit seq (without gaps)
           'total' = num identical / length of alignment (with gaps)
           default = 'total'
---------------
According to the method description, for the HSP above, 'frac_identical' should return '0.42' with 'hit'. But it doesn't: with 'hit' it currently gives '0.13'. With 'total' it gives the expected result, '0.35'.

That's all.
Cheers

Dimitar

--
Dimitar Kenanov
Postdoctoral research fellow
Protein Sequence Analysis Group
Bioinformatics Institute
A*STAR, Singapore
tel: +65 6478 8514

From cjfields at illinois.edu Thu May 20 22:24:46 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 21:24:46 -0500
Subject: [Bioperl-l] a problem with HspI module?
In-Reply-To: <4BF5ED6D.6030506@bii.a-star.edu.sg>
References: <4BF5ED6D.6030506@bii.a-star.edu.sg>
Message-ID:

It would be best to file this in a bug report, along with example data.

chris

From rmb32 at cornell.edu Fri May 21 13:44:26 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 21 May 2010 10:44:26 -0700
Subject: [Bioperl-l] codon tables, finding ORFs
Message-ID: <4BF6C67A.4040202@cornell.edu>

Hi all,

Right now, Bio::Tools::CodonTable uses as its 'standard' table the NCBI one, described at http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG1.

This table recognizes three different start codons: the usual ATG, plus TTG and CTG (which I'd never heard of before looking there, seems they are rare).

The issue is, if you use this codon scheme to find open reading frames in nucleotide sequences, you get some ORFs that I think a lot of biologists would be surprised at, from these two (rare?) start codons.

Seems to me, this might be a problem. I mean, a naive user (which just about everyone is!) would expect the default codon table to only recognize the canonical ATG as a start, right? And would be rather displeased if BioPerl said (by default) that something starting with one of these rare codons was an open reading frame?

So I guess my question is, do we think BioPerl (Bio::Tools::CodonTable) should really recognize these rare start codons by default?

Rob

From scott at scottcain.net Fri May 21 14:15:20 2010
From: scott at scottcain.net (Scott Cain)
Date: Fri, 21 May 2010 14:15:20 -0400
Subject: [Bioperl-l] [Gmod-schema] Trying to load my first database
In-Reply-To:
References:
Message-ID:

Hi Daniel,

I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists.

Of course, the file you sent me would be the same file you sent me yesterday; sorry for my poor memory :-)

This file uncovered a bug in BioPerl in the FeatureIO module. While fixing the bug may be difficult, working around it might not be too bad. Additionally, I'm not sure we should fix it right now, as there is an effort underway to rework this section of BioPerl anyway. The good news is that the workaround is fairly simple.

In the GFF that MAKER created, when parsing prodigal output, it generates GFF lines like this:

Contig125 pred_gff:prodigal_v2.00 match 104 1723 157.5 + . ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59;

The tricky part is this tag/value in the ninth column: type=ATG. The tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is in the third column, so when it is parsing this line of GFF, it tries to reassign the feature type to something that isn't valid. The workaround is pretty easy: since "type" is a problematic tag, and it appears that the type tag here is defining the start type, I would suggest doing a global search and replace on the file to replace "type=" with "start_type=". I did that and the file loaded fine. I don't know if it is MAKER that creates this tag or the BioPerl parser for prodigal, but changing this at the source might be nice (of course, it might also break somebody else's code :-/ ). I'll enter a bug for this in the BioPerl bug tracker.

Scott


On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote:
> Hi Scott,
>
> I used Maker to generate the attached file.
>
> -Daniel
>
> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote:
>> Hi Daniel,
>>
>> Please keep the schema mailing list cc'ed in so the responses can be
>> archived and more eyes than just mine can try to solve the problem.
>>
>> Can you send a sample of the GFF that is causing the problem? Any
>> ontology term that is in Chado should be "legal." If there's
>> something causing a problem, we need to figure out what it is.
>>
>> Scott
>>
>>
>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote:
>>> Hi Scott,
>>>
>>> I am using the same image as we used in class. I was able to load
>>> each of the examples in the GMOD course (Pythium) and on the Chado
>>> website (yeast).
>>>
>>> On another note, is there an easy way to navigate the ontology terms
>>> that are legal and standard in both GFF3 and in Chado. I am having
>>> trouble understanding how to convert from an arbitrary analysis (e.g.
>>> Blasting KEGG) into a format that works.
>>>
>>> Thanks so much!
>>> -Daniel
>>>
>>> On Fri, May 21, 2010 at 9:41 AM, Scott Cain wrote:
>>>> Hi Daniel,
>>>>
>>>> That error message looks like one that would come from an older
>>>> version of BioPerl. What version do you have?
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, May 21, 2010 at 11:51 AM, Daniel Quest wrote:
>>>>> Hi Scott,
>>>>>
>>>>> Thanks for the reply. Sorry, I should have been able to track down
>>>>> that error. Could you tell me what the following error means?
>>>>>
>>>>> gmod at ubuntu:~/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125$
>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g
>>>>> /home/gmod/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125/Contig125.gff
>>>>> --noexon --recreate_cache
>>>>> (Re)creating the uniquename cache in the database...
>>>>> Creating table...
>>>>> Populating table...
>>>>> Creating indexes...
>>>>> Adjusting the primary key sequences (if necessary)...Done.
>>>>> Preparing data for inserting into the chado database
>>>>> (This may take a while ...)
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: Object Bio::Annotation::SimpleValue=HASH(0xa858ac8) was not valid
>>>>> with key type. If you were adding new keys in, perhaps you want to
>>>>> make use
>>>>> of the archetype method to allow registration to a more basic type
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368
>>>>> STACK: Bio::Annotation::Collection::add_Annotation
>>>>> /usr/local/share/perl/5.10.0/Bio/Annotation/Collection.pm:361
>>>>> STACK: Bio::SeqFeature::Annotated::add_Annotation
>>>>> /usr/local/share/perl/5.10.0/Bio/SeqFeature/Annotated.pm:609
>>>>> STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag
>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:797
>>>>> STACK: Bio::FeatureIO::gff::_handle_feature
>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:752
>>>>> STACK: Bio::FeatureIO::gff::next_feature
>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:172
>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:775
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Abnormal termination, trying to clean up...
>>>>>
>>>>> Attempting to clean up the loader temp table (so that --recreate_cache
>>>>> won't be needed)...
>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)...
>>>>> Exiting...
>>>>>
>>>>>
>>>>> Thanks so much!
>>>>> -Daniel
>>>>>
>>>>> On Thu, May 20, 2010 at 6:20 PM, Scott Cain wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> The error message you got said that the GFF file that you are trying
>>>>>> to load couldn't be found; are you sure the path was correct? The
>>>>>> file itself looks OK.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Thu, May 20, 2010 at 5:06 PM, Daniel Quest wrote:
>>>>>>> Hello All,
>>>>>>>
>>>>>>> I am trying to load my first genome from maker. Not sure what the
>>>>>>> problem is... any help is awesome! I am attaching at least part of
>>>>>>> the dataset.
>>>>>>>
>>>>>>> -Daniel
>>>>>>>
>>>>>>>
>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$
>>>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g
>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3
>>>>>>> --noexon
>>>>>>>
>>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>>> MSG: Could not open
>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3: No
>>>>>>> such file or directory
>>>>>>> STACK: Error::throw
>>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368
>>>>>>> STACK: Bio::Root::IO::_initialize_io
>>>>>>> /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:341
>>>>>>> STACK: Bio::FeatureIO::_initialize
>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:353
>>>>>>> STACK: Bio::FeatureIO::gff::_initialize
>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:102
>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:276
>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:296
>>>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:720
>>>>>>> -----------------------------------------------------------
>>>>>>>
>>>>>>> Abnormal termination, trying to clean up...
>>>>>>>
>>>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)...
>>>>>>> Exiting...
>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From bosborne11 at verizon.net Fri May 21 14:45:01 2010
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 21 May 2010 14:45:01 -0400
Subject: [Bioperl-l] codon tables, finding ORFs
In-Reply-To: <4BF6C67A.4040202@cornell.edu>
References: <4BF6C67A.4040202@cornell.edu>
Message-ID: <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net>

Rob,

The user will use translate(), which can do something like this:

    $prot_obj = $my_seq_object->translate(-orf   => 1,
                                          -start => "atg" );

CodonTable does little more than hold the codon/aa data. All the useful work is done by translate(), and there are lots of options.
Here is part of the documentation:

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator         default is *
           -unknown       - character for unknown            default is X
           -frame         - frame                            default is 0
           -codontable_id - codon table id                   default is 1
           -complete      - complete CDS expected            default is 0
           -throw         - throw exception if not complete  default is 0
           -orf           - find 1st ORF                     default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations       default is 0

 Notes   : The -start argument only applies when -orf is set to 1. By
           default all initiation codons found in the given codon table
           are used but when "start" is set to some codon this codon
           will be used exclusively as the initiation codon. Note that
           the default codon table (NCBI "Standard") has 3 initiation
           codons!

Brian O.

From briano at bioteam.net Fri May 21 14:52:19 2010
From: briano at bioteam.net (Brian Osborne)
Date: Fri, 21 May 2010 14:52:19 -0400
Subject: [Bioperl-l] What is CPAN doing?
Message-ID:

bioperl-l,

Here's the POD for the translate() method:

=head2 translate

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate

           Or if you expect a complete coding sequence (CDS) translation,
           with initiator at the beginning and terminator at the end:

           $protein_seq_obj = $cds_seq_obj->translate(-complete => 1);

           Or if you want translate() to find the first initiation
           codon and return the corresponding protein:

           $protein_seq_obj = $cds_seq_obj->translate(-orf => 1);

 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The complete CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translated protein
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator         default is *
           -unknown       - character for unknown            default is X
           -frame         - frame                            default is 0
           -codontable_id - codon table id                   default is 1
           -complete      - complete CDS expected            default is 0
           -throw         - throw exception if not complete  default is 0
           -orf           - find 1st ORF                     default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations       default is 0

 Notes   : The -start argument only applies when -orf is set to 1. By
           default all initiation codons found in the given codon table
           are used but when "start" is set to some codon this codon
           will be used exclusively as the initiation codon. Note that
           the default codon table (NCBI "Standard") has 3 initiation
           codons!

           By default translate() translates termination codons to
           the same character (default is *), both internal and trailing
           codons. Setting "-complete" to 1 tells translate() to remove
           the trailing character.

           -offset is used for seqfeatures which contain the
           \codon_start tag and can be set to 1, 2, or 3. This is the
           offset by which the sequence translation starts relative to
           the first base of the feature

For details on codon tables used by translate() see L.

Deprecated argument set (v. 1.5.1 and prior versions) where each argument is an element in an array:

  1: character for terminator (optional), defaults to '*'.
  2: character for unknown amino acid (optional), defaults to 'X'.
  3: frame (optional), valid values are 0, 1, 2, defaults to 0.
  4: codon table id (optional), defaults to 1.
  5: complete coding sequence expected, defaults to 0 (false).
  6: boolean, throw exception if not complete coding sequence
     (true), defaults to warning (false)
  7: codontable, a custom Bio::Tools::CodonTable object (optional).

=cut

And here's what appears on CPAN:

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate
 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The full CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translation
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : character for terminator (optional) defaults to '*'
           character for unknown amino acid (optional) defaults to 'X'
           frame (optional) valid values 0, 1, 2, defaults to 0
           codon table id (optional) defaults to 1
           complete coding sequence expected, defaults to 0 (false)
           boolean, throw exception if not complete CDS (true) or
           defaults to warning (false)

Most of the POD is missing - does anyone know why?

Brian O.

From barani at avesthagen.com Thu May 20 07:27:04 2010
From: barani at avesthagen.com (barani at avesthagen.com)
Date: Thu, 20 May 2010 16:57:04 +0530 (IST)
Subject: [Bioperl-l] Bio::Biblio find method proxy problem
Message-ID: <49660.192.168.1.5.1274354824.squirrel@mail.avesthagen.com>

Hi,

Our lab is behind a firewall. I am using FC10 Linux. I have set the httpproxy in /etc/bash_profile. I am searching for research articles using the Bio::Biblio "find" method, as shown in the following Perl code. This program executes well when I run it on the command line. But when I use the same code in a Perl CGI script, it does not work (it says "couldn't retrieve results from http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"). Is there any way that I can set the proxy within the code as an argument and make it executable? It will be very useful if you guys can help me.

#####################################################
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Biblio;
use Bio::Biblio::IO;

my $search = "ABySS[title] AND (Simpson[Author]) AND 2009[dp]";
my $biblio = Bio::Biblio->new(-access => 'eutils');
$biblio->find($search)->has_next;

while (my $xml = $biblio->get_next) {
    my $io = Bio::Biblio::IO->new(
        -data   => $xml,
        -format => 'medlinexml',
    );
    my $article = $io->next_bibref();
    # >>>>>>>>>>>>>>> XML Parser >>>>>>>>>>>>
    # <<<<<<<<<<<<<<< XML Parser <<<<<<<<<<<<
}
###############################################################

Best Regards
barani

-----------------------------------
Baranidharan P
Project Head
Bioinformatics - Genomics Group
Avesthagen Ltd
Ground floor, Innovator Building
International Tech Park Bangalore
Whitefield
Bangalore - 560066
Ph. 09900727597
Mail
Off. barani at avesthagen.com
Per. baranidharanp at gmail.com
-------------------------------------

From bbimber at gmail.com Fri May 21 09:58:03 2010
From: bbimber at gmail.com (Ben Bimber)
Date: Fri, 21 May 2010 08:58:03 -0500
Subject: [Bioperl-l] CommandExts and arrays
Message-ID:

I am getting an error when trying to pass an array as a param with CommandExts. I hope there is something obvious I'm missing, but I can't seem to figure this out.

I am trying to merge two BAM files using Bio::Tools::Run::Samtools with something like this:

    my $new_bam = Bio::Tools::Run::Samtools->new(
        -command     => 'merge',
        -program_dir => '/usr/bin/samtools/',
    )->run(
        -obm => 'output_file.bam',
        -ibm => ['file1.bam', 'file2.bam'],
    );

When I use an array for the -ibm param, I get an error saying 'cannot use string 'file1' as an arrayref while strict refs in place'.

The error comes from this code in CommandExts.pm, around line 989.
adding 'no strict' right before the final line stops the error: # expand arrayrefs my $l = $#files; for (0..$l) { if (ref($files[$_]) eq 'ARRAY') { splice(@files, $_, 1, @{$files[$_]}); #error thrown from this line splice(@switches, $_, 1, ($switches[$_]) x @{$files[$_]}); } Thanks for the help. From daniel.quest at gmail.com Fri May 21 15:34:35 2010 From: daniel.quest at gmail.com (Daniel Quest) Date: Fri, 21 May 2010 12:34:35 -0700 Subject: [Bioperl-l] [Gmod-schema] Trying to load my first database In-Reply-To: References: Message-ID: Hey Scott, Thanks so much for the work on this. I have CC'ed Doug Hyatt, the developer of Prodigal so that he is aware of this problem. I am thinking that Maker just passed the Prodigal tags through and then the conflict happened on the Chado load. From my POV it is probably easiest to make small changes to the Prodigal GFF3 output to sync up with the Chado schema. Thanks so much -Daniel On Fri, May 21, 2010 at 11:15 AM, Scott Cain wrote: > Hi Daniel, > > I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists. > > Of course, the file you sent me would be the same file you sent me > yesterday; sorry for my poor memory :-) > > This file uncovered a bug in BioPerl in the FeatureIO module. ?While > fixing the bug may be difficult, working around it might not be too > bad. ?Additionally, I'm not sure we should fix it right now, as this > is an effort underway to rework this section of BioPerl anyway. ?The > good news is that the work around is fairly simple. > > In the GFF that MAKER created, when parsing prodigal output, it > generates GFF lines like this: > > Contig125 ? ? ? pred_gff:prodigal_v2.00 match ? 104 ? ? 1723 ? ?157.5 > ?+ ? ? ? . 
> ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59; > > The tricky part is this tag/value in the ninth column: type=ATG. ?The > tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is > in the third column, so when it is parsing this line of GFF, it tries > to reassign the feature type to something that isn't valid. ?The work > around is pretty easy: since "type" is a problematic tag, and it > appears that the type tag here is defining the start type, I would > suggest doing a global search and replace on the file to replace > "type=" with "start_type=". ?I did that and the file loaded fine. ?I > don't know if it is MAKER that creates this tag or the BioPerl parser > for prodigal, but changing this at the source might be nice (of > course, it might also break somebody else's code :-/ ?I'll enter a bug > for this in the BioPerl bug tracker. > > Scott > > > On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote: >> Hi Scott, >> >> I used Maker to generate the attached file. >> >> -Daniel >> >> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote: >>> Hi Daniel, >>> >>> Please keep the schema mailing list cc'ed in so the responses can be >>> archived and more eyes than just mine can try to solve the problem. >>> >>> Can you send a sample of the GFF that is causing the problem? ?Any >>> ontology term that is in Chado should be "legal." ?If there's >>> something causing a problem, we need to figure out what it is. >>> >>> Scott >>> >>> >>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote: >>>> Hi Scott, >>>> >>>> I am using the same image as we used in class. ?I was able to load >>>> each of the examples in the GMOD course (Pythium) and on the Chado >>>> website (yeast). 
From rmb32 at cornell.edu Fri May 21 16:11:24 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 13:11:24 -0700 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> Message-ID: <4BF6E8EC.6050001@cornell.edu> Brian Osborne wrote: > The user will use translate(), which can do something like this: > > $prot_obj = $my_seq_object->translate(-orf => 1, > -start => "atg" ); Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default.
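To make the behavior under discussion concrete, here is a minimal sketch (an illustration only, not code from BioPerl itself) that contrasts the default -orf search with one restricted to ATG. The sequence is made up for the example, and the sketch assumes a BioPerl install that provides Bio::Seq:

```perl
use strict;
use warnings;
use Bio::Seq;

# Hypothetical sequence: a TTG codon precedes the first ATG in the same frame.
my $seq = Bio::Seq->new(
    -seq      => 'TTGAAACCCATGGGGTTTTAA',
    -alphabet => 'dna',
);

# Default: any initiation codon listed in the codon table may open the ORF.
my $default_orf = $seq->translate( -orf => 1 );

# Restricted: only ATG is accepted as the initiation codon.
my $atg_orf = $seq->translate( -orf => 1, -start => 'atg' );

print "default:  ", $default_orf->seq, "\n";
print "atg only: ", $atg_orf->seq,     "\n";
```

Because the standard table lists TTG as an initiation codon, the two calls should report different proteins for this sequence, which is the surprise a user who has not read the docs would run into.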
Rob From carson.holt at genetics.utah.edu Fri May 21 15:53:35 2010 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 21 May 2010 13:53:35 -0600 Subject: [Bioperl-l] [maker-devel] [Gmod-schema] Trying to load my first database In-Reply-To: Message-ID: That is correct. MAKER will just pass user defined GFF3 tags through rather than trying to make sense of them or trimming them off. Carson On 5/21/10 1:34 PM, "Daniel Quest" wrote: Hey Scott, Thanks so much for the work on this. I have CC'ed Doug Hyatt, the developer of Prodigal so that he is aware of this problem. I am thinking that Maker just passed the Prodigal tags through and then the conflict happened on the Chado load. From my POV it is probably easiest to make small changes to the Prodigal GFF3 output to sync up with the Chado schema. Thanks so much -Daniel On Fri, May 21, 2010 at 11:15 AM, Scott Cain wrote: > Hi Daniel, > > I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists. > > Of course, the file you sent me would be the same file you sent me > yesterday; sorry for my poor memory :-) > > This file uncovered a bug in BioPerl in the FeatureIO module. While > fixing the bug may be difficult, working around it might not be too > bad. Additionally, I'm not sure we should fix it right now, as this > is an effort underway to rework this section of BioPerl anyway. The > good news is that the work around is fairly simple. > > In the GFF that MAKER created, when parsing prodigal output, it > generates GFF lines like this: > > Contig125 pred_gff:prodigal_v2.00 match 104 1723 157.5 > + . > ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59; > > The tricky part is this tag/value in the ninth column: type=ATG. 
The > tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is > in the third column, so when it is parsing this line of GFF, it tries > to reassign the feature type to something that isn't valid. The work > around is pretty easy: since "type" is a problematic tag, and it > appears that the type tag here is defining the start type, I would > suggest doing a global search and replace on the file to replace > "type=" with "start_type=". I did that and the file loaded fine. I > don't know if it is MAKER that creates this tag or the BioPerl parser > for prodigal, but changing this at the source might be nice (of > course, it might also break somebody else's code :-/ I'll enter a bug > for this in the BioPerl bug tracker. > > Scott > > > On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote: >> Hi Scott, >> >> I used Maker to generate the attached file. >> >> -Daniel >> >> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote: >>> Hi Daniel, >>> >>> Please keep the schema mailing list cc'ed in so the responses can be >>> archived and more eyes than just mine can try to solve the problem. >>> >>> Can you send a sample of the GFF that is causing the problem? Any >>> ontology term that is in Chado should be "legal." If there's >>> something causing a problem, we need to figure out what it is. >>> >>> Scott >>> >>> >>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote: >>>> Hi Scott, >>>> >>>> I am using the same image as we used in class. I was able to load >>>> each of the examples in the GMOD course (Pythium) and on the Chado >>>> website (yeast). >>>> >>>> On another note, is there an easy way to navigate the ontology terms >>>> that are legal and standard in both GFF3 and in Chado. I am having >>>> trouble understanding how to convert from an arbitrary analysis (e.g. >>>> Blasting KEGG) into a format that works. >>>> >>>> Thanks so much! 
>>>> -Daniel >>>> >>>> On Fri, May 21, 2010 at 9:41 AM, Scott Cain wrote: >>>>> Hi Daniel, >>>>> >>>>> That error message looks like one that would come from an older >>>>> version of BioPerl. What version do you have? >>>>> >>>>> Scott >>>>> >>>>> >>>>> On Fri, May 21, 2010 at 11:51 AM, Daniel Quest wrote: >>>>>> Hi Scott, >>>>>> >>>>>> Thanks for the reply. Sorry, I should have been able to track down >>>>>> that error. Could you tell me what the following error means? >>>>>> >>>>>> gmod at ubuntu:~/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125$ >>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>> /home/gmod/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125/Contig125.gff >>>>>> --noexon --recreate_cache >>>>>> (Re)creating the uniquename cache in the database... >>>>>> Creating table... >>>>>> Populating table... >>>>>> Creating indexes... >>>>>> Adjusting the primary key sequences (if necessary)...Done. >>>>>> Preparing data for inserting into the chado database >>>>>> (This may take a while ...) >>>>>> >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: Object Bio::Annotation::SimpleValue=HASH(0xa858ac8) was not valid >>>>>> with key type. 
If you were adding new keys in, perhaps you want to >>>>>> make use >>>>>> of the archetype method to allow registration to a more basic type >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>> STACK: Bio::Annotation::Collection::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/Annotation/Collection.pm:361 >>>>>> STACK: Bio::SeqFeature::Annotated::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/SeqFeature/Annotated.pm:609 >>>>>> STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:797 >>>>>> STACK: Bio::FeatureIO::gff::_handle_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:752 >>>>>> STACK: Bio::FeatureIO::gff::next_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:172 >>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:775 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> Abnormal termination, trying to clean up... >>>>>> >>>>>> Attempting to clean up the loader temp table (so that --recreate_cache >>>>>> won't be needed)... >>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>> Exiting... >>>>>> >>>>>> >>>>>> Thanks so much! >>>>>> -Daniel >>>>>> >>>>>> On Thu, May 20, 2010 at 6:20 PM, Scott Cain wrote: >>>>>>> Hi Daniel, >>>>>>> >>>>>>> The error message you got said that the GFF file that you are trying >>>>>>> to load couldn't be found; are you sure the path was correct? The >>>>>>> file itself looks OK. >>>>>>> >>>>>>> Scott >>>>>>> >>>>>>> >>>>>>> On Thu, May 20, 2010 at 5:06 PM, Daniel Quest wrote: >>>>>>>> Hello All, >>>>>>>> >>>>>>>> I am trying to load my first genome from maker. Not sure what the >>>>>>>> problem is... any help is awesome! I am attaching at least part of >>>>>>>> the dataset. 
>>>>>>>> >>>>>>>> -Daniel >>>>>>>> >>>>>>>> >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3 >>>>>>>> --noexon >>>>>>>> >>>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>>> MSG: Could not open >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3: No >>>>>>>> such file or directory >>>>>>>> STACK: Error::throw >>>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>>>> STACK: Bio::Root::IO::_initialize_io >>>>>>>> /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:341 >>>>>>>> STACK: Bio::FeatureIO::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:353 >>>>>>>> STACK: Bio::FeatureIO::gff::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:102 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:276 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:296 >>>>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:720 >>>>>>>> ----------------------------------------------------------- >>>>>>>> >>>>>>>> Abnormal termination, trying to clean up... >>>>>>>> >>>>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>>>> Exiting... >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gmod-schema mailing list >>>>>>>> Gmod-schema at lists.sourceforge.net >>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ------------------------------------------------------------------------ >>>>>>> Scott Cain, Ph. D. 
scott at scottcain dot net >>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>>> Ontario Institute for Cancer Research >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ------------------------------------------------------------------------ >>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>> Ontario Institute for Cancer Research >>>>> >>>> >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From cjfields at illinois.edu Fri May 21 16:44:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 15:44:18 -0500 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <4BF6E8EC.6050001@cornell.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> Message-ID: <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> On May 21, 2010, at 3:11 PM, Robert Buels wrote: > Brian Osborne wrote: >> The user will use translate(), which can do something like this: >> $prot_obj = $my_seq_object->translate(-orf => 1, >> -start => "atg" ); > > Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default. 
> > Rob Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. chris From rmb32 at cornell.edu Fri May 21 16:48:20 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 13:48:20 -0700 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> Message-ID: <4BF6F194.3080209@cornell.edu> Chris Fields wrote: > Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. > > chris Oh they're available, CodonTable has a number of tables in it that you make translate() use optionally, and there are bacterial tables in there (but they are not well documented). The default behavior is the 'NCBI standard' (eukaryotic) table that I linked to in the original post on this thread. What I am looking for is a discussion of what the best default behavior of $seq->translate( -orf => 1 ) with no arguments should be. But also, there should be better documentation about the codon tables that are available, I can add that in my topic/longest_orf branch. 
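As a starting point for that documentation, the sketch below prints which of a few candidate codons each table treats as initiators. It assumes the new(-id => ...), name() and is_start_codon() methods of Bio::Tools::CodonTable; the list of candidate codons is arbitrary and chosen only for illustration:

```perl
use strict;
use warnings;
use Bio::Tools::CodonTable;

# Table ids follow the NCBI numbering: 1 is the "Standard" table,
# 11 is the bacterial table.
for my $id ( 1, 11 ) {
    my $table = Bio::Tools::CodonTable->new( -id => $id );

    # Keep only the candidates this table accepts as initiation codons.
    my @starts = grep { $table->is_start_codon($_) } qw(ATG TTG CTG GTG ATT);

    printf "%2d %-30s starts: %s\n", $id, $table->name, join( ' ', @starts );
}
```

A non-default table can then be handed to translate() through the documented -codontable_id argument, for example $seq->translate( -orf => 1, -codontable_id => 11 ).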
Rob From cjfields at illinois.edu Fri May 21 16:52:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 15:52:15 -0500 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <4BF6F194.3080209@cornell.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> <4BF6F194.3080209@cornell.edu> Message-ID: <06B1B1F1-979F-461C-BC9B-57A79C26CCE7@illinois.edu> On May 21, 2010, at 3:48 PM, Robert Buels wrote: > Chris Fields wrote: > > Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. > > > > chris > > Oh they're available, CodonTable has a number of tables in it that you make translate() use optionally, and there are bacterial tables in there (but they are not well documented). The default behavior is the 'NCBI standard' (eukaryotic) table that I linked to in the original post on this thread. > > What I am looking for is a discussion of what the best default behavior of $seq->translate( -orf => 1 ) with no arguments should be. Probably the simplest, with documentation on how to change it when needed. > But also, there should be better documentation about the codon tables that are available, I can add that in my topic/longest_orf branch. > > Rob Agreed. More docs never hurt. chris From bosborne11 at verizon.net Fri May 21 16:32:30 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 21 May 2010 16:32:30 -0400 Subject: [Bioperl-l] codon tables, finding ORFs Message-ID: Rob, translate() is one of these methods where reading the documentation is required. 
Or to put it another way, if you tried to use it without reading the docs most of the time you'd get a result that differs from what you wanted, given the variety of ways to use it, quite apart from the issue of the 3 initiation codons. So really, you have to read the docs, and they say: By default all initiation codons found in the given codon table are used but when "start" is set to some codon this codon will be used exclusively as the initiation codon. Note that the default codon table (NCBI "Standard") has 3 initiation codons! My concern right now is that CPAN has removed this text and more! If you wanted to add an additional codon table and make it a default I have no problem with that. But, the "naive user" who doesn't read the documentation is probably still going to get "surprising" results. I don't think there's any way around RTFM for this method, changing the default table does not change this. Brian O. On May 21, 2010, at 4:11 PM, Robert Buels wrote: > Brian Osborne wrote: >> The user will use translate(), which can do something like this: >> $prot_obj = $my_seq_object->translate(-orf => 1, >> -start => "atg" ); > > Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default. > > Rob From rmb32 at cornell.edu Fri May 21 17:53:34 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 14:53:34 -0700 Subject: [Bioperl-l] POD rendering question/problem (was [Fwd: What is CPAN doing?]) Message-ID: <4BF700DE.8040804@cornell.edu> Hi search.cpan.org maintainers, For one of the methods in BioPerl, a good portion of the POD that's in the source [1] isn't being rendered into HTML on its search.cpan.org page [2]. We'd like to get this POD displaying properly, either by us (BioPerl) tweaking the POD on our end, or by you guys tweaking whatever process is making the HTML. So: do we need to tweak our POD to get it displaying properly? 
If so, what needs to change in that POD?

Rob

[1] The source and POD in question:
http://search.cpan.org/src/CJFIELDS/BioPerl-1.6.1/Bio/PrimarySeqI.pm

[2] The HTML in question:
http://search.cpan.org/~cjfields/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm#translate

-------- Original Message --------
Subject: [Bioperl-l] What is CPAN doing?
Date: Fri, 21 May 2010 14:52:19 -0400
From: Brian Osborne
To: BioPerl List

bioperl-l,

Here's the POD for the translate() method:

=head2 translate

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate

           Or if you expect a complete coding sequence (CDS) translation,
           with initiator at the beginning and terminator at the end:

           $protein_seq_obj = $cds_seq_obj->translate(-complete => 1);

           Or if you want translate() to find the first initiation
           codon and return the corresponding protein:

           $protein_seq_obj = $cds_seq_obj->translate(-orf => 1);

 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The complete CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translated protein
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0

 Notes   : The -start argument only applies when -orf is set to 1. By
           default all initiation codons found in the given codon table
           are used but when "start" is set to some codon this codon will
           be used exclusively as the initiation codon. Note that the
           default codon table (NCBI "Standard") has 3 initiation codons!

           By default translate() translates termination codons to
           some character (default is *), both internal and trailing
           codons. Setting "-complete" to 1 tells translate() to remove
           the trailing character.

           -offset is used for seqfeatures which contain the \codon_start
           tag and can be set to 1, 2, or 3. This is the offset by which
           the sequence translation starts relative to the first base
           of the feature

           For details on codon tables used by translate() see
           L<Bio::Tools::CodonTable>.

           Deprecated argument set (v. 1.5.1 and prior versions)
           where each argument is an element in an array:

           1: character for terminator (optional), defaults to '*'.
           2: character for unknown amino acid (optional), defaults to 'X'.
           3: frame (optional), valid values are 0, 1, 2, defaults to 0.
           4: codon table id (optional), defaults to 1.
           5: complete coding sequence expected, defaults to 0 (false).
           6: boolean, throw exception if not complete coding sequence
              (true), defaults to warning (false)
           7: codontable, a custom Bio::Tools::CodonTable object (optional).

=cut

And here's what appears on CPAN:

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate
 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The full CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translation
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : character for terminator (optional) defaults to '*'
           character for unknown amino acid (optional) defaults to 'X'
           frame (optional) valid values 0, 1, 2, defaults to 0
           codon table id (optional) defaults to 1
           complete coding sequence expected, defaults to 0 (false)
           boolean, throw exception if not complete CDS (true) or
           defaults to warning (false)

Most of the POD is missing - does anyone know why?

Brian O.

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From kai.blin at biotech.uni-tuebingen.de Fri May 21 17:56:37 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 21 May 2010 23:56:37 +0200
Subject: [Bioperl-l] hmmer3/hmmscan parser
Message-ID: <1274478997.1997.4.camel@gonzo.home.kblin.org>

Hi list, hi Thomas,

I've just bumped into the fact that bioperl-live still doesn't seem to
support the hmmer3 hmmscan output format (thanks for the help at
#bioperl). The nice folks on IRC pointed me at an email from Thomas
Sharpton, noting that he was already working on a parser for this. So I
thought I'd ask about the status of that before I run off writing my
own. Is there anything I can help with?

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin                 kai.blin at biotech.uni-tuebingen.de
Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
Abteilung Mikrobiologie/Biotechnologie
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From rmb32 at cornell.edu Fri May 21 18:32:20 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 21 May 2010 15:32:20 -0700
Subject: [Bioperl-l] [perl #75252] AutoReply: POD rendering question/problem (was [Fwd: What is CPAN doing?])
In-Reply-To:
References: <4BF700DE.8040804@cornell.edu>
Message-ID: <4BF709F4.4030705@cornell.edu>

Doing a little more investigation, the culprit seems to actually be a
stray old (non-installed) version of the module in our uploaded dist.

No action required on your part, unless there is a tweak to the indexing
that would have not made this module be the top hit.

Status: resolved

Rob

From cjfields at illinois.edu Fri May 21 19:22:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 21 May 2010 18:22:41 -0500
Subject: [Bioperl-l] hmmer3/hmmscan parser
In-Reply-To: <1274478997.1997.4.camel@gonzo.home.kblin.org>
References: <1274478997.1997.4.camel@gonzo.home.kblin.org>
Message-ID: <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu>

To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over.

Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready.

Relevant commit msg here:

http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html

perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl
===========================================
dev.open-bio.org - Authorized Access Only
===========================================
...
bioperl-hmmer3/
...
perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3
===========================================
dev.open-bio.org - Authorized Access Only
===========================================
perllib cjfields$

chris

On May 21, 2010, at 4:56 PM, Kai Blin wrote:

> Hi list, hi Thomas,
>
> I've just bumped into the fact that bioperl-live still doesn't seem to
> support the hmmer3 hmmscan output format (thanks for the help at
> #bioperl). The nice folks on IRC pointed me at an email from Thomas
> Sharpton, noting that he was already working on a parser for this. So I
> thought I'd ask about the status of that before I run off writing my
> own. Is there anything I can help with?
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
> Abteilung Mikrobiologie/Biotechnologie
> Eberhard-Karls-Universität Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From David.Messina at sbc.su.se Mon May 24 06:19:55 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 24 May 2010 12:19:55 +0200
Subject: [Bioperl-l] CommandExts and arrays
In-Reply-To:
References:
Message-ID:

Hi Ben,

This looks like it might be a bug. When I ask for the filespec for the 'merge' command:

my @filespec = $new_bam->filespec;
print join "\n", @filespec, "\n";

I get:

obm
*ibm

(note the leading '*').

Could you please submit this as a bug?
http://www.bioperl.org/wiki/Bugs

Thanks,
Dave

From David.Messina at sbc.su.se Mon May 24 09:00:56 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 24 May 2010 15:00:56 +0200
Subject: [Bioperl-l] CommandExts and arrays
In-Reply-To:
References: <8565_1274696770_ZZg0Z3D5iEeCi.00_C34B77C6-2A3E-4B97-83C2-9BE8679CA331@sbc.su.se>
Message-ID:

> ok, i put in that bug.

Thanks.

> why exactly does having the asterisk indicate
> this is a bug? i thought the asterisk indicated that multiple values
> were allowed for that argument?

Ah okay, my ignorance of this module is showing. :)

> on a related note, are we supposed to be able to pass file names that
> have spaces to command exts? on the few cases where this came up, i
> have never seemed to get this to work right, so i just got rid of the
> spaces.

Sorry, I don't know. Paging Mark Jensen: have you got a moment to look into this?

Dave

From diment at gmail.com Sat May 22 04:25:55 2010
From: diment at gmail.com (Kieren Diment)
Date: Sat, 22 May 2010 18:25:55 +1000
Subject: [Bioperl-l] OT: The Perl Survey
Message-ID: <63B7289C-E218-4BBB-A5A4-33AFECA4C867@gmail.com>

Hi,

Sorry about the off-topic posting, but I'm trying to get as large a sample of programmers who use Perl as possible.

The Perl Foundation has funded The Perl Survey 2010, which is ready for people to complete at http://survey.perlfoundation.org. If you could spend a little time to complete the survey, we would be most grateful. It should take around 10-15 minutes to complete.
The official announcement is at:
http://news.perlfoundation.org/2010/05/grant-update-the-perl-survey-1.html

Thanks in advance

Kieren Diment

From parametres-personnels at hotmail.fr Sun May 23 11:57:14 2010
From: parametres-personnels at hotmail.fr (NamNAme)
Date: Sun, 23 May 2010 08:57:14 -0700 (PDT)
Subject: [Bioperl-l] Pfam database
Message-ID: <28650160.post@talk.nabble.com>

Dear all,

A few weeks ago I wrote a program that needs the Pfam database, and I tested it on the first version of Pfam, where each protein family's sequences are in one file. Now I would like to test it on the latest version of Pfam, but the organization has changed. I've found a file called Pfam-A.fasta which contains sequences and the family they belong to, but the sequences inside are not complete.

So I have two questions: Why are these sequences not complete? And how can I find a file with complete sequences and the family they belong to?

Thank you for your help. Bye.

P.S.: There is the file pfamseq. I tried to write a script to read it and then retrieve the database structure I want, but this file is enormous and uses too much memory, so it crashed.

--
View this message in context: http://old.nabble.com/Pfam-database-tp28650160p28650160.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From staffa at niehs.nih.gov Mon May 24 10:32:26 2010
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Mon, 24 May 2010 10:32:26 -0400
Subject: [Bioperl-l] Restriction Enzymes
Message-ID:

So, back in 2007 I wrote a script using

use Bio::Tools::RestrictionEnzyme;

and generated some useful restriction maps for a client.

This year he comes back to me with some very new enzymes
that RestrictionEnzyme did not recognize. I erroneously thought that I
needed an update of BioPerl, which I requested of SysAdmin.
They did this across the board, there is no going back.
(I did learn about the NEB file that needed to be installed) Now it appears that I must re-write my scripts because RestrictionEnzyme is not known to the latest version of bioperl. Is this true? How hard would it be to keep things backward compatible. Have I missed something here? Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Enterprise-Wide Information Technology Support Contract National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From David.Messina at sbc.su.se Mon May 24 11:55:45 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 24 May 2010 17:55:45 +0200 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <4046E576-2109-45BB-969C-F0B6F5749957@sbc.su.se> Hi Nick, Now it's Bio::Restriction::Enzyme (and friends). Besides the perldoc for that module, see also: http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > How hard would it be to keep things backward compatible. > Have I missed something here? I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones are intended to be at least partially backwards compatible. Dave From cjfields at illinois.edu Mon May 24 11:58:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 10:58:11 -0500 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote: > So, back in 2007 I wrote a script using > > use Bio::Tools::RestrictionEnzyme; > > and generated some useful restriction maps for a client. > > This year he comes back to me with some very new enzymes > that RestrictionEnzyme did not recognize. I erroneously thought that I > needed an update of BioPerl, which I requested of SysAdmin. 
> They did this across the board, there is no going back.
> (I did learn about the NEB file that needed to be installed)
>
> Now it appears that I must re-write my scripts because RestrictionEnzyme is
> not known to the latest version of bioperl. Is this true?
> How hard would it be to keep things backward compatible.
> Have I missed something here?

Bio::Tools::RestrictionEnzyme was deprecated quite a while ago, around v. 1.5,
with removal at 1.6 (an announcement was made to the list regarding this, with
no respondents, prior to the 1.6.0 release). The live version of the
DEPRECATED docs is here:

http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED

If I understand correctly, the main reason was most development was put into
Bio::Restriction modules, with very little change occurring in
Bio::Tools::RestrictionEnzyme. We made similar changes with some of the older
BLAST parsers (BPLite). You could just download Bio::Tools::RestrictionEnzyme
and call it via a 'use lib' directive (or local::lib) or package it with your
script, it should still work.

However, from my perspective, if the older module wasn't recognizing specific
enzyme cut sites, and the supported one did, wouldn't it be easier to modify
your script to use the newer supported one instead? If the supported
Bio::Restriction modules don't recognize the new sites I would consider that a
bug.

> Nick Staffa
> Telephone: 919-316-4569 (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Enterprise-Wide Information Technology Support Contract
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina

chris

From maj at fortinbras.us Mon May 24 12:21:03 2010
From: maj at fortinbras.us (Mark A.
Jensen) Date: Mon, 24 May 2010 12:21:03 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <13392E899AB04A0E8F66336CDBE417BE@NewLife> The rewrite this summer of Bio::Restriction made several funky enzyme (non-pal, non-symmetric) types workable. I would think it wouldn't be too onerous to convert code to the new system and have it work rather quickly- MAJ ----- Original Message ----- From: "Chris Fields" To: "Staffa, Nick (NIH/NIEHS) [C]" Cc: "Bioperl-l" Sent: Monday, May 24, 2010 11:58 AM Subject: Re: [Bioperl-l] Restriction Enzymes > On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote: > >> So, back in 2007 I wrote a script using >> >> use Bio::Tools::RestrictionEnzyme; >> >> and generated some useful restriction maps for a client. >> >> This year he comes back to me with some very new enzymes >> that RestrictionEnzyme did not recognize. I erroneously thought that I >> needed an update of BioPerl, which I requested of SysAdmin. >> They did this across the board, there is no going back. >> (I did learn about the NEB file that needed to be installed) >> >> Now it appears that I must re-write my scripts because RestrictionEnzyme is >> not known to the latest version of bioperl. Is this true? >> How hard would it be to keep things backward compatible. >> Have I missed something here? > > Bio::Tools::RestrictionEnyzme was deprecated quite a while ago,around v. 1.5, > with removal at 1.6 (an announcement was made to the list regarding this, with > no respondents, prior to the 1.6.0 release). The live version of the > DEPRECATED docs are here: > > http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED > > If I understand correctly, the main reason was most development was put into > Bio::Restriction modules, with very little change occurring in > Bio::Tools::RestrictionEnzyme. We did similar changes with some of the older > BLAST parsers (BPLite). 
You could just download Bio::Tools::RestrictionEnyzme > and call it via a 'use lib' directive (or local::lib) or package it with your > script, it should still work. > > However, from my perspective, if the older module wasn't recognizing specific > enzyme cut sites, and the supported one did, wouldn't it be easier to modify > your script to use the newer supported one instead? If the supported > Bio::Restriction modules don't recognize the new sites I would consider that a > bug. > >> Nick Staffa >> Telephone: 919-316-4569 (NIEHS: 6-4569) >> Scientific Computing Support Group >> NIEHS Enterprise-Wide Information Technology Support Contract >> National Institute of Environmental Health Sciences >> National Institutes of Health >> Research Triangle Park, North Carolina > > > > chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Mon May 24 12:54:29 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 24 May 2010 09:54:29 -0700 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] Message-ID: <4BFAAF45.4090400@cornell.edu> -------- Original Message -------- Subject: Re: [perl #75252] POD rendering question/problem (was [Fwd: [Bioperl-l] What is CPAN doing?]) Date: Mon, 24 May 2010 08:33:35 -0700 From: Graham Barr via RT Reply-To: search-rt at cpan.org To: rmb32 at cornell.edu References: <4BF700DE.8040804 at cornell.edu> <3F316B7B-DBCC-4668-94E4-45471ED5ACBB at pobox.com> On May 21, 2010, at 4:54 PM, Robert Buels via RT wrote: > > [1] The source and POD in question: > http://search.cpan.org/src/CJFIELDS/BioPerl-1.6.1/Bio/PrimarySeqI.pm > > [2] The HTML in question: > http://search.cpan.org/~cjfields/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm#translate that HTML is not for the above POD, it is located at 
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/PrimarySeqI.pm

the issue seems to be that when displaying the POD from the examples
directory the source link is linking to the real module

the html shown in [2] is representative of
http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm

IMO it is confusing to include 2 different copies of the same module.

I would suggest adding to META.yml

no_index:
  dir:
    - examples/root/lib

Graham.

From staffa at niehs.nih.gov Mon May 24 14:32:54 2010
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Mon, 24 May 2010 14:32:54 -0400
Subject: [Bioperl-l] Restriction Enzymes
In-Reply-To: <13392E899AB04A0E8F66336CDBE417BE@NewLife>
Message-ID:

Thanks, all.

On 5/24/10 12:21 PM, "Mark A. Jensen" wrote:

The rewrite this summer of Bio::Restriction made several funky enzyme
(non-pal, non-symmetric) types workable. I would think it wouldn't be too
onerous to convert code to the new system and have it work rather quickly-
MAJ
----- Original Message -----
From: "Chris Fields"
To: "Staffa, Nick (NIH/NIEHS) [C]"
Cc: "Bioperl-l"
Sent: Monday, May 24, 2010 11:58 AM
Subject: Re: [Bioperl-l] Restriction Enzymes

> On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote:
>
>> So, back in 2007 I wrote a script using
>>
>> use Bio::Tools::RestrictionEnzyme;
>>
>> and generated some useful restriction maps for a client.
>>
>> This year he comes back to me with some very new enzymes
>> that RestrictionEnzyme did not recognize. I erroneously thought that I
>> needed an update of BioPerl, which I requested of SysAdmin.
>> They did this across the board, there is no going back.
>> (I did learn about the NEB file that needed to be installed)
>>
>> Now it appears that I must re-write my scripts because RestrictionEnzyme is
>> not known to the latest version of bioperl. Is this true?
>> How hard would it be to keep things backward compatible.
>> Have I missed something here?
> > Bio::Tools::RestrictionEnyzme was deprecated quite a while ago,around v. 1.5, > with removal at 1.6 (an announcement was made to the list regarding this, with > no respondents, prior to the 1.6.0 release). The live version of the > DEPRECATED docs are here: > > http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED > > If I understand correctly, the main reason was most development was put into > Bio::Restriction modules, with very little change occurring in > Bio::Tools::RestrictionEnzyme. We did similar changes with some of the older > BLAST parsers (BPLite). You could just download Bio::Tools::RestrictionEnyzme > and call it via a 'use lib' directive (or local::lib) or package it with your > script, it should still work. > > However, from my perspective, if the older module wasn't recognizing specific > enzyme cut sites, and the supported one did, wouldn't it be easier to modify > your script to use the newer supported one instead? If the supported > Bio::Restriction modules don't recognize the new sites I would consider that a > bug. > >> Nick Staffa >> Telephone: 919-316-4569 (NIEHS: 6-4569) >> Scientific Computing Support Group >> NIEHS Enterprise-Wide Information Technology Support Contract >> National Institute of Environmental Health Sciences >> National Institutes of Health >> Research Triangle Park, North Carolina > > > > chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bbimber at gmail.com Mon May 24 15:43:07 2010 From: bbimber at gmail.com (Ben Bimber) Date: Mon, 24 May 2010 14:43:07 -0500 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: <1274729912.4373.19.camel@epistle> References: <1274729912.4373.19.camel@epistle> Message-ID: as long as the limitation is known, i dont see it as a big problem. 
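For anyone who hits the file-name limitation discussed in this thread
(spaces are stripped from file names before the wrapped command is run),
one caller-side workaround is to hand the wrapper a space-free copy of the
input file. What follows is only a minimal sketch in plain Perl and is not
taken from BioPerl: the helper name space_free_copy is hypothetical, it
assumes the external program only reads the file, and it assumes the
system temporary directory path itself contains no spaces.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of BioPerl): return a path without
# whitespace that holds the same data as $path.
sub space_free_copy {
    my ($path) = @_;

    # Paths without whitespace are passed through untouched.
    return $path unless $path =~ /\s/;

    # The temporary directory is removed when the program exits.
    my $dir = tempdir( CLEANUP => 1 );

    # "my reads.fa" becomes "my_reads.fa"
    ( my $name = basename($path) ) =~ s/\s+/_/g;
    my $safe = File::Spec->catfile( $dir, $name );

    copy( $path, $safe ) or die "Cannot copy '$path' to '$safe': $!";
    return $safe;
}
```

The returned path can then be passed wherever the wrapper expects a file
name. Output files are not covered by this sketch; they would have to be
written under a space-free name and renamed afterwards.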
On Mon, May 24, 2010 at 2:38 PM, Dan Kortschak wrote:
> Hi Dave,
>
> You are right, spaces are not allowed - they are actively stripped from
> filenames (the other option would be to escape or otherwise quote them -
> this is certainly doable, is there enough of a call to do this?).
>
> You can use last_execution() to see what was attempted to be run, this
> should show the filenames (and everything else) that were used in the
> IPC call.
>
> cheers
> Dan
>
> On Mon, 2010-05-24 at 12:00 -0400, Dave Messina wrote:
>> Message: 2
>> Date: Mon, 24 May 2010 15:00:56 +0200
>> From: Dave Messina
>> Subject: Re: [Bioperl-l] CommandExts and arrays
>> To: Ben Bimber
>> Message-ID:
>> Content-Type: text/plain; charset=windows-1252
>>
>> > ok, i put in that bug.
>>
>> Thanks.
>>
>>
>> > why exactly does having the asterisk indicate
>> > this is a bug? i thought the asterisk indicated that multiple values
>> > were allowed for that argument?
>>
>> Ah okay, my ignorance of this module is showing. :)
>>
>>
>> > on a related note, are we supposed to be able to pass file names that
>> > have spaces to command exts? on the few cases where this came up, i
>> > have never seemed to get this to work right, so i just got rid of the
>> > spaces.
>>
>> Sorry, I don't know.
>>
>>
>> Paging Mark Jensen - have you got a moment to look into this?
>>
>>
>> Dave
>
>

From David.Messina at sbc.su.se Mon May 24 18:03:19 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 25 May 2010 00:03:19 +0200
Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])]
In-Reply-To: <4BFAAF45.4090400@cornell.edu>
References: <4BFAAF45.4090400@cornell.edu>
Message-ID:

From: Graham Barr via RT
> IMO it is confusing to include 2 different copies of the same module.

I agree.

It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict).
In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject).

I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies.

So I propose we test if the example scripts perform as expected without those private copies and if so, remove them.

Dave

From dan.kortschak at adelaide.edu.au Mon May 24 15:38:32 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 25 May 2010 05:08:32 +0930
Subject: [Bioperl-l] CommandExts and arrays
In-Reply-To:
References:
Message-ID: <1274729912.4373.19.camel@epistle>

Hi Dave,

You are right, spaces are not allowed - they are actively stripped from
filenames (the other option would be to escape or otherwise quote them -
this is certainly doable, is there enough of a call to do this?).

You can use last_execution() to see what was attempted to be run, this
should show the filenames (and everything else) that were used in the
IPC call.

cheers
Dan

On Mon, 2010-05-24 at 12:00 -0400, Dave Messina wrote:
> Message: 2
> Date: Mon, 24 May 2010 15:00:56 +0200
> From: Dave Messina
> Subject: Re: [Bioperl-l] CommandExts and arrays
> To: Ben Bimber
> Message-ID:
> Content-Type: text/plain; charset=windows-1252
>
> > ok, i put in that bug.
>
> Thanks.
>
>
> > why exactly does having the asterisk indicate
> > this is a bug? i thought the asterisk indicated that multiple values
> > were allowed for that argument?
>
> Ah okay, my ignorance of this module is showing. :)
>
>
> > on a related note, are we supposed to be able to pass file names that
> > have spaces to command exts? on the few cases where this came up, i
> > have never seemed to get this to work right, so i just got rid of the
> > spaces.
>
> Sorry, I don't know.
>
>
> Paging Mark Jensen -
have you got a moment to look into this?
>
>
> Dave

From Russell.Smithies at agresearch.co.nz Mon May 24 18:01:25 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 25 May 2010 10:01:25 +1200
Subject: [Bioperl-l] taxonomy nightmare
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz>

We've upgraded BioPerl recently and now lots of stuff appears broken though I'm sure it's not as bad as it looks.
Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm deluged with errors.
AFAIK, there were no changes to Perl 5.8.8

Any help greatly appreciated!!!

Thanx,

Russell Smithies

-----------------------------------
#! /usr/local/bin/perl

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Data::Dumper;

my $idx_dir = '/data/home/smithiesr/taxonomy';
my $TAXDIR = "/data/home/smithiesr/taxdump";

my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp");

my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                               -nodesfile => $nodefile,
                               -namesfile => $namesfile,
                               -directory => $idx_dir,
                               -force => 1) or die $!;

my $human = $db->get_taxon(-name => 'Homo sapiens');
print Dumper $human;

-----------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Failed to load module Bio::DB::Taxonomy::flatfile. Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89.
Compilation failed in require at (eval 21) line 3.
        ...propagated at /usr/lib/perl5/5.8.8/base.pm line 85.
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155.
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89.
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89.
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439.

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441
STACK: Bio::DB::Taxonomy::_load_tax_module /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264
STACK: Bio::DB::Taxonomy::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115
STACK: taxonomyTest.pl:15
-----------------------------------------------------------
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at illinois.edu Mon May 24 22:17:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 24 May 2010 21:17:57 -0500
Subject: [Bioperl-l] hmmer3/hmmscan parser
In-Reply-To:
References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu>
Message-ID:

On May 24, 2010, at 7:46 PM, Thomas Sharpton wrote:

> Hi all,
>
> To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and hmmsearch output. It appears to be fully functional and I have had a handful of users test and integrate this module.
>
> We decided to push this module into a standalone svn repo (bioperl-hmmer3).
I am a bit confused about why the repo is empty, as I committed the code back in March and have made a few updates since then to correct bugs identified by test users. Perhaps I screwed something up during the last commit.

The commit doesn't show any added files. The original code apparently is on a branch of bioperl-dev, though (think this was pointed out on IRC):

http://github.com/bioperl/bioperl-dev/tree/bioperl-hmmer3

Maybe that was the mixup?

> Chris, should I just add the code to the github repo? I might need a pointer on how to do this without screwing it up.

I started up a new github repo for it. You would just need to let me know your github ID so I can add you to it. Then (after you are added) the instructions are here:

http://github.com/bioperl/bioperl-hmmer3

> Kai, I can mail an archive of the parser your way if you're in a hurry. With some assistance from Chris et. al., I expect the code to be in the github repo by the day's end.
>
> Apologies for any confusion and the delayed reply - I've been on the road.
>
> Best,
> Tom

No problem. Thanks for letting us know.

chris

>
>> On May 21, 2010 4:24 PM, "Chris Fields" wrote:
>>
>> To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over. Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready.
>>
>> Relevant commit msg here:
>>
>> http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html
>>
>> perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl
>> ===========================================
>> dev.open-bio.org - Authorized Access Only
>> ===========================================
>> ...
>> bioperl-hmmer3/
>> ...
>> perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3
>> ===========================================
>> dev.open-bio.org - Authorized Access Only
>> ===========================================
>> perllib cjfields$
>>
>> chris
>>
>> On May 21, 2010, at 4:56 PM, Kai Blin wrote:
>>
>> > Hi list, hi Thomas,
>> >
>> > I've just bumped into the ...
>>
>

From cjfields at illinois.edu Mon May 24 22:20:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 24 May 2010 21:20:38 -0500
Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])]
In-Reply-To:
References: <4BFAAF45.4090400@cornell.edu>
Message-ID:

On May 24, 2010, at 5:03 PM, Dave Messina wrote:

> From: Graham Barr via RT
>> IMO it is confusing to include 2 different copies of the same module.
>
> I agree.
>
> It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict).
>
> In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject).
>
> I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies.
>
> So I propose we test if the example scripts perform as expected without those private copies and if so, remove them.
>
> Dave

I agree. We should either prevent indexing of the private lib or remove it, unless someone can suggest its utility.
chris

From thomas.sharpton at gmail.com Mon May 24 20:46:04 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Mon, 24 May 2010 17:46:04 -0700
Subject: [Bioperl-l] hmmer3/hmmscan parser
In-Reply-To: <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu>
References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu>
Message-ID:

Hi all,

To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and
hmmsearch output. It appears to be fully functional and I have had a handful
of users test and integrate this module.

We decided to push this module into a standalone svn repo (bioperl-hmmer3).
I am a bit confused about why the repo is empty, as I committed the code
back in March and have made a few updates since then to correct bugs
identified by test users. Perhaps I screwed something up during the last
commit.

Chris, should I just add the code to the github repo? I might need a pointer
on how to do this without screwing it up.

Kai, I can mail an archive of the parser your way if you're in a hurry. With
some assistance from Chris et al., I expect the code to be in the github
repo by the day's end.

Apologies for any confusion and the delayed reply - I've been on the road.

Best,
Tom

On May 21, 2010 4:24 PM, "Chris Fields" wrote:

To add to this, it appears there was an attempt to commit this code to SVN
just prior to the github migration, but nothing is present in the svn repo
(which is still reachable, but is read-only). Empty repos were not migrated
over. Thomas, let me know if you need addition to the github repo, would be
easy enough to add this in when ready.

Relevant commit msg here:

http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html

perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl
===========================================
dev.open-bio.org - Authorized Access Only
===========================================
...
bioperl-hmmer3/
...
perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3
===========================================
dev.open-bio.org - Authorized Access Only
===========================================
perllib cjfields$

chris

On May 21, 2010, at 4:56 PM, Kai Blin wrote:

> Hi list, hi Thomas,
>
> I've just bumped into the ...

From Russell.Smithies at agresearch.co.nz Mon May 24 22:25:41 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 25 May 2010 14:25:41 +1200
Subject: [Bioperl-l] taxonomy nightmare
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32D88D065AA@exchsth.agresearch.co.nz>

Fixed I think, some file permissions got screwed somewhere ;-(

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Tuesday, 25 May 2010 10:01 a.m.
> To: 'bioperl-l'
> Subject: [Bioperl-l] taxonomy nightmare
>
> We've upgraded BioPerl recently and now lots of stuff appears broken
> though I'm sure it's not as bad as it looks.
> Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm
> deluged with errors.
> AFAIK, there were no changes to Perl 5.8.8
>
> Any help greatly appreciated!!!
>
> Thanx,
>
> Russell Smithies
>
> -----------------------------------
> #!
/usr/local/bin/perl > > use strict; > use warnings; > use Bio::DB::Taxonomy; > use Data::Dumper; > > my $idx_dir = '/data/home/smithiesr/taxonomy'; > my $TAXDIR = "/data/home/smithiesr/taxdump"; > > my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp"); > > my $db = new Bio::DB::Taxonomy(-source => 'flatfile', > -nodesfile => $nodefile, > -namesfile => $namesfile, > -directory => $idx_dir, > -force => 1) or die $!; > > my $human = $db->get_taxon(-name => 'Homo sapiens'); > print Dumper $human; > > ----------------------------------- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Failed to load module Bio::DB::Taxonomy::flatfile. Weak references > are not implemented in the version of perl at > /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89 > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89. > Compilation failed in require at (eval 21) line 3. > ...propagated at /usr/lib/perl5/5.8.8/base.pm line 85. > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439. 
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::Root::Root::_load_module
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441
> STACK: Bio::DB::Taxonomy::_load_tax_module
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264
> STACK: Bio::DB::Taxonomy::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115
> STACK: taxonomyTest.pl:15
> -----------------------------------------------------------
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dimitark at bii.a-star.edu.sg Mon May 24 22:28:19 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Tue, 25 May 2010 10:28:19 +0800
Subject: [Bioperl-l] about gene names
Message-ID: <4BFB35C3.4010808@bii.a-star.edu.sg>

Hi guys,
I have a question: how can I get only the gene names from NCBI Gene when
I have the sequence ID? For example, with the ID NP_005264.2 I can
search NCBI Gene online, but I want to get only the gene name
automatically. I was checking the Bio::DB::EntrezGene module, but it
did not become clear to me whether I can use it for my purposes.

Thank you in advance.
Greetings Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From David.Messina at sbc.su.se Mon May 24 18:23:32 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 00:23:32 +0200 Subject: [Bioperl-l] Pfam database In-Reply-To: <28650160.post@talk.nabble.com> References: <28650160.post@talk.nabble.com> Message-ID: Hi, The release notes for the latest Pfam (24.0) do mention file format changes, but I could not find documentation describing those changes. Your questions relating to that would best be answered by the people at Pfam. You can contact them here: pfam-help at sanger.ac.uk However, please do report back to us what you learn. It's quite likely our code is not compatible with Pfam 24.0, and we would need that information to fix it. Thanks, Dave On May 23, 2010, at 5:57 PM, NamNAme wrote: > > Dear all, > A few weeks ago I wrote a program that need the pfam database, and I tested > it on the first version of pfam where each protein family sequences are in > one file. > But now I would like to test it on the last version of pfam but the > organization changed. > I've found a file called Pfam-A.fasta which contains sequences and the > family they belong to. But the sequences inside are not complete. > So, I've two questions : Why these sequences are not complete ? > And, How can I find a file with complete sequences and the family they > belong to ? > Thank you for your help. > Bye. > P-S : There is the file pfamseq, I tried to make a script to read it and > then retreive the database structure i want but, this file is enourmous and > use too much memory so it crashed. > -- > View this message in context: http://old.nabble.com/Pfam-database-tp28650160p28650160.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 24 22:54:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:54:03 -0500 Subject: [Bioperl-l] taxonomy nightmare In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> Message-ID: You may have a version of perl that either doesn't include Scalar::Util or includes a broken version. Try installing Scalar::Util from CPAN to see if it fixes the problem. Here's a link on the problem: http://www.perlmonks.org/?node_id=424737 chris On May 24, 2010, at 5:01 PM, Smithies, Russell wrote: > We've upgraded BioPerl recently and now lots of stuff appears broken though I'm sure it's not as bad as it looks. > Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm deluged with errors. > AFAIK, there were no changes to Perl 5.8.8 > > Any help greatly appreciated!!! > > Thanx, > > Russell Smithies > > ----------------------------------- > #! /usr/local/bin/perl > > use strict; > use warnings; > use Bio::DB::Taxonomy; > use Data::Dumper; > > my $idx_dir = '/data/home/smithiesr/taxonomy'; > my $TAXDIR = "/data/home/smithiesr/taxdump"; > > my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp"); > > my $db = new Bio::DB::Taxonomy(-source => 'flatfile', > -nodesfile => $nodefile, > -namesfile => $namesfile, > -directory => $idx_dir, > -force => 1) or die $!; > > my $human = $db->get_taxon(-name => 'Homo sapiens'); > print Dumper $human; > > ----------------------------------- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Failed to load module Bio::DB::Taxonomy::flatfile. 
Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89 > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89. > Compilation failed in require at (eval 21) line 3. > ...propagated at /usr/lib/perl5/5.8.8/base.pm line 85. > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155. > Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439. > > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441 > STACK: Bio::DB::Taxonomy::_load_tax_module /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264 > STACK: Bio::DB::Taxonomy::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115 > STACK: taxonomyTest.pl:15 > ----------------------------------------------------------- > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From kai.blin at biotech.uni-tuebingen.de Tue May 25 01:58:27 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 07:58:27 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: <1274767107.2271.11.camel@gonzo.home.kblin.org> On Mon, 2010-05-24 at 17:46 -0700, Thomas Sharpton wrote: > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and > hmmsearch output. It appears to be fully functional and I have had a handful > of users test and integrate this module. That's pretty much what I need. Thanks to the folks on IRC, I got pointed at the correct repository yesterday evening. > Kai, I can mail an archive of the parser your way if you're in a hurry. With > some assistance from Chris et. al., I expect the code to be in the github > repo by the day's end. No worries, that's fine. I've got a checkout of the standalone repository that I can play with now. Is there any particular reason you decided to create a new parser instead of integrating the code into the existing hmmer.pm module? I haven't looked at how the hmmer2 hmmsearch output looks compared to the hmmer3 version and if there's any conflicts. Cheers, Kai PS: Tom, sorry for the repost, forgot to CC the list. Pre-coffee email sending, it never works. -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de
Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
Abteilung Mikrobiologie/Biotechnologie
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From dan.kortschak at adelaide.edu.au Tue May 25 02:12:27 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 25 May 2010 15:42:27 +0930
Subject: [Bioperl-l] Bioperl-l Digest, Vol 85, Issue 34
In-Reply-To:
References:
Message-ID: <1274767947.32025.49.camel@zoidberg.mbs.adelaide.edu.au>

Dimitar,

Try having a look through the EUtilities cookbook:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

cheers
Dan

On Tue, 2010-05-25 at 01:58 -0400, Dimitar Kenanov wrote:
> Date: Tue, 25 May 2010 10:28:19 +0800
> From: Dimitar Kenanov
> Subject: [Bioperl-l] about gene names
> To: "'bioperl-l at bioperl.org'"
> Message-ID: <4BFB35C3.4010808 at bii.a-star.edu.sg>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi guys,
> I have a question: how can I get only the gene names from NCBI Gene when
> I have the sequence ID? For example, with this ID (NP_005264.2) I can
> search NCBI Gene online, but I want to get only the gene name
> automatically. I was checking the Bio::DB::EntrezGene module, but it
> did not become clear to me whether I can use it for my purposes.
>
> Thank you in advance.
> > Greetings > Dimitar > From kai.blin at biotech.uni-tuebingen.de Tue May 25 07:41:59 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 13:41:59 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> On Mon, 2010-05-24 at 17:46 -0700, Thomas Sharpton wrote: Hi Tom, > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and > hmmsearch output. It appears to be fully functional and I have had a handful > of users test and integrate this module. I've tried using the hmmer3 parser for my script, but it seems like the hmm_name member of the result object isn't set, and I'm using that. I saw this before when trying to write a test case that integrates into the Bioperl test framework. (Error output is Can't locate object method "hmm_name" via package "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, line 152.) I'm happy to work on this a bit myself if you're not working on this anyway, so we don't duplicate efforts. I just don't get why the hmm_name isn't picked up correctly, and I haven't been able to figure out how to get at the output that $self->debug() when running the tests. Oh well, it's a learning experience in any case. Cheers, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de
Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
Abteilung Mikrobiologie/Biotechnologie
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From kai.blin at biotech.uni-tuebingen.de Tue May 25 08:37:47 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Tue, 25 May 2010 14:37:47 +0200
Subject: [Bioperl-l] hmmer3/hmmscan parser
In-Reply-To: <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de>
References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de>
Message-ID: <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de>

On Tue, 2010-05-25 at 13:41 +0200, Kai Blin wrote:

Whined a little too early.

> I've tried using the hmmer3 parser for my script, but it seems like the
> hmm_name member of the result object isn't set, and I'm using that.
>
> I saw this before when trying to write a test case that integrates into
> the Bioperl test framework.
> (Error output is Can't locate object method "hmm_name" via package
> "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23,
> line 152.)

I just found the stuff I needed to add to the hmmer3Result.pm file. I'm currently busy adding a comprehensive test case for this module that integrates into the bioperl test harness.

What's the best way to publish my additions? Do I create a fork of bioperl-live on Github or how is this handled?

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de
Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
Abteilung Mikrobiologie/Biotechnologie
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From cjfields at illinois.edu Tue May 25 08:46:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 25 May 2010 07:46:48 -0500
Subject: [Bioperl-l] hmmer3/hmmscan parser
In-Reply-To: <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de>
References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de>
Message-ID:

On May 25, 2010, at 7:37 AM, Kai Blin wrote:

> On Tue, 2010-05-25 at 13:41 +0200, Kai Blin wrote:
>
> Whined a little too early.
>
>> I've tried using the hmmer3 parser for my script, but it seems like the
>> hmm_name member of the result object isn't set, and I'm using that.
>>
>> I saw this before when trying to write a test case that integrates into
>> the Bioperl test framework.
>> (Error output is Can't locate object method "hmm_name" via package
>> "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23,
>> line 152.)
>
> I just found the stuff I needed to add to the hmmer3Result.pm file. I'm
> currently busy adding a comprehensive test case for this module that
> integrates into the bioperl test harness.
>
> What's the best way to publish my additions? Do I create a fork of
> bioperl-live on Github or how is this handled?

Create a fork of the proper repository, which will eventually be bioperl-hmmer3. However, Thomas hasn't added that code in yet; not sure how much has changed since the original deposition into bioperl-dev in March, but it's possible more has been done.
chris

> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de
> Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
> Abteilung Mikrobiologie/Biotechnologie
> Eberhard-Karls-Universität Tübingen
> Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
> D-72076 Tübingen Fax : ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
>

From dueldor at yahoo.com Tue May 25 08:30:59 2010
From: dueldor at yahoo.com (Dubi Eldor)
Date: Tue, 25 May 2010 05:30:59 -0700 (PDT)
Subject: [Bioperl-l] How to find secondary structures
Message-ID: <766825.32163.qm@web37308.mail.mud.yahoo.com>

Hi,

I am a new user of BioPerl.
I would like to find secondary structures in sequences ~10K nt long.
Are there any functions that can help me?

Thanks,
Dubi

From David.Messina at sbc.su.se Tue May 25 09:58:38 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 25 May 2010 15:58:38 +0200
Subject: [Bioperl-l] Restriction Enzymes
In-Reply-To:
References:
Message-ID: <3065CE83-3E61-4080-B475-F609E74A9FD4@sbc.su.se>

On May 25, 2010, at 15:54, Staffa, Nick (NIH/NIEHS) [C] wrote:

> The tutorial, I discovered, has an error,
> a very bad experience for a trusting newbie.
> The tutorial has these bold examples in the first box under
> Identifying restriction enzyme sites (Bio::Restriction):
>
> use Bio::Restriction::EnzymeCollection;
> my $all_collection = Bio::Restriction::EnzymeCollection;
>
> This is the form of the statement that seems to work:
> my $all_collection = Bio::Restriction::EnzymeCollection->new();

Thanks, fixed.
From bosborne11 at verizon.net Tue May 25 09:04:01 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 25 May 2010 09:04:01 -0400 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: Dave, I looked at the scripts, and like you I concluded they didn't use that local Bio/ directory. Then I ran then with and without that Bio/ directory, same results. So I removed that local Bio/ directory. Rob, does some additional action need to be taken by Chris, or some other Bioperl maintainer, at CPAN/PAUSE? Brian O. On May 24, 2010, at 6:03 PM, Dave Messina wrote: > From: Graham Barr via RT >> IMO it is confusing to include 2 different copies of the same module. > > I agree. > > It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). > > In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). > > I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. > > So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Tue May 25 09:54:17 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Tue, 25 May 2010 09:54:17 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: <4046E576-2109-45BB-969C-F0B6F5749957@sbc.su.se> Message-ID: The tutorial, I discovered, has an error. 
a very bad experience for a trusting newby. whereas the tutorial has these bold examples in the first box under Identifying restriction enzyme sites (Bio::Restriction) use Bio::Restriction::EnzymeCollection; my $all_collection = Bio::Restriction::EnzymeCollection; This is the form of the statement that seems to work: my $all_collection = Bio::Restriction::EnzymeCollection->new(); All the other stuff necessary for my purpose of getting fragment lengths is there and seems to work if the $enzyme database has the enzyme under the name you enter. Updating the database with the file from NEB seems to be up to the user or his sysadmin. On 5/24/10 11:55 AM, "Dave Messina" wrote: Hi Nick, Now it's Bio::Restriction::Enzyme (and friends). Besides the perldoc for that module, see also: http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > How hard would it be to keep things backward compatible. > Have I missed something here? I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones are intended to be at least partially backwards compatible. Dave From cjfields at illinois.edu Tue May 25 10:30:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 09:30:09 -0500 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> I have added a 'no_index' to that specific directory in Build.PL, suppose we can change that back if there is no purpose to it (though it might come in handy with spots we don't need to be indexed). chris On May 25, 2010, at 8:04 AM, Brian Osborne wrote: > Dave, > > I looked at the scripts, and like you I concluded they didn't use that local Bio/ directory. Then I ran then with and without that Bio/ directory, same results. 
So I removed that local Bio/ directory. > > Rob, does some additional action need to be taken by Chris, or some other Bioperl maintainer, at CPAN/PAUSE? > > Brian O. > > On May 24, 2010, at 6:03 PM, Dave Messina wrote: > >> From: Graham Barr via RT >>> IMO it is confusing to include 2 different copies of the same module. >> >> I agree. >> >> It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). >> >> In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). >> >> I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. >> >> So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Tue May 25 10:51:02 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Tue, 25 May 2010 10:51:02 -0400 Subject: [Bioperl-l] New Restriction Analysis Message-ID: I have tried both these methods for getting new enzyme info into the system: use Bio::Restriction::IO; my $re_io = Bio::Restriction::IO->new(-file => $file, -format=>'withrefm'); my $rebase_collection = $re_io->read; A REBASE file in the correct format can be found at ftp://ftp.neb.com/pub/rebase - it will have a name like "withrefm.308". 
If need be you can also create new enzymes, like this:

  my $re = new Bio::Restriction::Enzyme(-enzyme => 'BioRI',
                                        -seq => 'GG^AATTCC');

But BioPerl throws an error without telling me which of my statements caused it.

I first used the withrefm.005 file from REBASE and then these statements (not both at the same time):

  my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'SgrDI',
                                            -seq => 'CG^TCGACG');

Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.8/Bio/Restriction/Analysis.pm line 529.

This works:

  my $pattern = $enzyme->site;
  print "pattern = $pattern\n";

which would lead me to believe there is nothing wrong with my enzyme.
Could there be a problem if there were no cuts?
That must be it, because when I put in the info for EcoRI instead of SgrDI, the program works:

[Not the whole program, only the BioPerl stuff.]

  my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'EcoRI',
                                            -seq => 'G^AATTC');
  use Bio::Restriction::Analysis;
  my $pattern = $enzyme->site;
  print "pattern = $pattern\n";
  my $db = Bio::DB::Fasta->new("/uoldhome/estaffa/westmoreland/$filename",
                               -makeid => \&make_my_id);
  my $obj = $db->get_Seq_by_id("$sequenceID"); # Sequence object
  my $analysis = Bio::Restriction::Analysis->new(-seq => $obj);
  my @strings = $analysis->fragments($enzyme);

What to do?

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Enterprise-Wide Information Technology Support Contract
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

From maj at fortinbras.us Tue May 25 12:20:41 2010
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Tue, 25 May 2010 12:20:41 -0400
Subject: [Bioperl-l] How to find secondary structures
In-Reply-To: <766825.32163.qm@web37308.mail.mud.yahoo.com>
References: <766825.32163.qm@web37308.mail.mud.yahoo.com>
Message-ID: <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife>

Sounds like a job for infernal and its BioPerl wrapper (in Bio::Tools::Run); right Chris?
MAJ

----- Original Message -----
From: "Dubi Eldor"
To:
Sent: Tuesday, May 25, 2010 8:30 AM
Subject: [Bioperl-l] How to find secondary structures

> Hi,
>
> I am a new user of BioPerl.
> I would like to find secondary structures in sequences ~10K nt long.
> Are there any functions that can help me?
>
> Thanks,
> Dubi
>

From maj at fortinbras.us Tue May 25 12:19:42 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 25 May 2010 12:19:42 -0400
Subject: [Bioperl-l] New Restriction Analysis
In-Reply-To:
References:
Message-ID:

Hi Nick,

You're right, as far as I can tell; the offending line is

  @cut_positions=@{$self->{'_cut_positions'}->{$enz}};

so $self->{_cut_positions}->{$enz} must be undefined. I would say this is a bug; if you can put what you've reported below in a bug report at http://bugzilla.bioperl.org, that would be great.

A workaround would be to check whether you have cuts before calling the method; but that may be impossible, in which case a truly awful kludge would be to append a recognized site at the end of your sequences. Just till we can get to the fix.
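
A minimal sketch of the first workaround (checking for a site before asking for fragments), in plain Perl. The helper name has_site is hypothetical and is not a BioPerl method; it assumes the recognition site contains only A/C/G/T plus the '^' cut marker, so enzymes with IUPAC ambiguity codes would need more work.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: returns 1 if the recognition
# sequence occurs on either strand of the given sequence string, else 0.
# Assumes the site holds only A/C/G/T plus an optional '^' cut marker.
sub has_site {
    my ($seq_string, $site) = @_;
    my $seq = uc $seq_string;
    (my $pattern = uc $site) =~ s/\^//g;    # drop the cut marker
    my $revcom = reverse $pattern;          # reverse the site...
    $revcom =~ tr/ACGT/TGCA/;               # ...and complement it
    return (index($seq, $pattern) >= 0 || index($seq, $revcom) >= 0) ? 1 : 0;
}

# Toy sequence containing an EcoRI site but no SgrDI site.
print has_site('ttgaattcaa', 'G^AATTC')   ? "EcoRI: cuts\n" : "EcoRI: no cuts\n";
print has_site('ttgaattcaa', 'CG^TCGACG') ? "SgrDI: cuts\n" : "SgrDI: no cuts\n";
```

With the objects from the script quoted below, the guard might look like `my @strings = has_site($obj->seq, $enzyme->site) ? $analysis->fragments($enzyme) : ($obj->seq);`, which treats an uncut sequence as a single fragment. That line is an untested assumption about how the pieces fit together, not a confirmed fix.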
cheers Mark ----- Original Message ----- From: "Staffa, Nick (NIH/NIEHS) [C]" To: "Bioperl-l" Sent: Tuesday, May 25, 2010 10:51 AM Subject: [Bioperl-l] New Restriction Analysis >I have tried both these methods for getting new enzyme info into the system: > > use Bio::Restriction::IO; > my $re_io = Bio::Restriction::IO->new(-file => $file, > -format=>'withrefm'); > my $rebase_collection = $re_io->read; > A REBASE file in the correct format can be found at > ftp://ftp.neb.com/pub/rebase - it will have a name like "withrefm.308". If > need be you can also create new enzymes, like this: > my $re = new Bio::Restriction::Enzyme(-enzyme => 'BioRI', > -seq => 'GG^AATTCC'); > But the BioPerl sends an error without informing me which of my statements > caused it: > > Using first the withreftm.005 file from rebase and then these statements (not > both at the same time): > my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'SgrDI', > -seq => 'CG^TCGACG'); > > > Can't use an undefined value as an ARRAY reference at > /usr/lib/perl5/site_perl/5.8.8/Bio/Restriction/Analysis.pm line 529. > > This works: > my $pattern = $enzyme->site; > print "pattern = $pattern\n"; > which would lead me to believe there is nothing wrong with my enzyme. > Could there be a problem if there were no cuts? > That must be it, because putting info for EcoRI in instead of SgrDI, the > program works: > > [Not the whole program, but only the bioPerl stuff. > my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'EcoRI', > -seq => 'G^AATTC'); > use Bio::Restriction::Analysis; > my $pattern = $enzyme->site; > print "pattern = $pattern\n"; > my $db = Bio::DB::Fasta->new("/uoldhome/estaffa/westmoreland/$filename", > -makeid => \&make_my_id); > my $obj = $db->get_Seq_by_id("$sequenceID"); #Sequence Object > my $analysis = Bio::Restriction::Analysis->new(-seq => $obj); > my @strings = $analysis->fragments($enzyme); > > What to do? 
> > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Enterprise-Wide Information Technology Support Contract > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue May 25 12:38:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 11:38:12 -0500 Subject: [Bioperl-l] How to find secondary structures In-Reply-To: <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> References: <766825.32163.qm@web37308.mail.mud.yahoo.com> <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> Message-ID: <2B6207D9-7221-4949-A7EE-EE6ED54EFF7B@illinois.edu> Yes, that would look for Rfam-based conserved structures. Should work for the latest infernal release, but let me know if you run into problems. Should also look at ERPIN and RNAMotif (both have similar BioPerl wrappers). chris On May 25, 2010, at 11:20 AM, Mark A. Jensen wrote: > Sounds like a job for infernal and it's Bioperl wrapper (in Bio::Tools::Run); right Chris? > MAJ > ----- Original Message ----- From: "Dubi Eldor" > To: > Sent: Tuesday, May 25, 2010 8:30 AM > Subject: [Bioperl-l] How to find secondary structures > > >> Hi, >> >> I am a new user of BioPerl. >> I would like to find secondary sturctures in sequences of ~10K nt long. >> Are there any functions that can help me? 
>> >> Thanks, >> Dubi >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue May 25 12:43:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 12:43:41 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <8EE661A4491C4A0FAD9875CF790F8164@NewLife> Thanks for the headsup on that-- we can fix. The refm file should be downloaded relatively transparently by the class directly... MAJ ----- Original Message ----- From: "Staffa, Nick (NIH/NIEHS) [C]" To: "Dave Messina" ; "Chris Fields" ; "Mark A. Jensen" Cc: "Bioperl-l" Sent: Tuesday, May 25, 2010 9:54 AM Subject: Re: [Bioperl-l] Restriction Enzymes > The tutorial, I discovered, has an error. > a very bad experience for a trusting newby. > whereas the tutorial has these bold examples in the first box under > Identifying restriction enzyme sites (Bio::Restriction) > > use Bio::Restriction::EnzymeCollection; > my $all_collection = Bio::Restriction::EnzymeCollection; > > This is the form of the statement that seems to work: > my $all_collection = Bio::Restriction::EnzymeCollection->new(); > > All the other stuff necessary for my purpose of getting fragment lengths is > there and seems to work > if the $enzyme database has the enzyme under the name you enter. > Updating the database with the file from NEB seems to be up to the user or his > sysadmin. > > > On 5/24/10 11:55 AM, "Dave Messina" wrote: > > Hi Nick, > > Now it's Bio::Restriction::Enzyme (and friends). 
Besides the perldoc for that > module, see also: > > http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > > >> How hard would it be to keep things backward compatible. >> Have I missed something here? > > I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme > was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones > are intended to be at least partially backwards compatible. > > > Dave > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue May 25 13:14:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 13:14:24 -0400 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: Message-ID: <409221E1D1E947108DEDBB5F34E1EBB7@NewLife> Don't think you want 'no strict'; the error's saying something about syntax to you. In the snippet, I see a missing opening single quote for output_file.bam. The asterisk means "expect an array ref", so that's ok. ----- Original Message ----- From: "Ben Bimber" To: "bioperl-l" Sent: Friday, May 21, 2010 9:58 AM Subject: [Bioperl-l] CommandExts and arrays >I am getting an error when trying to pass an array as a param with > command exts. I hope there is something obvious i'm missing, but I > cant seem to figure this out. > > I am trying to run the merge two BAM files using > Bio::Tools::Run::Samtools using something like this: > > my $new_bam = Bio::Tools::Run::Samtools->new( > -command => 'merge', > -program_dir => '/usr/bin/samtools/', > )->run( > -obm => output_file.bam', > -ibm => ['file1.bam', 'file2.bam'], > ); > > When i use an array for the -ibm param, I get an error saying 'cannot > use string 'file1' as an arrayref while strict refs in place'. The > error comes from this code in CommandExts.pm, around line 989. 
adding > 'no strict' right before the final line stops the error: > > # expand arrayrefs > my $l = $#files; > for (0..$l) { > if (ref($files[$_]) eq 'ARRAY') { > splice(@files, $_, 1, @{$files[$_]}); > #error thrown from this line > splice(@switches, $_, 1, ($switches[$_]) x @{$files[$_]}); > } > > > Thanks for the help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From thomas.sharpton at gmail.com Tue May 25 14:33:06 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 25 May 2010 11:33:06 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274767107.2271.11.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> Message-ID: <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> Hi Kai, I've just pushed the code to github, which you can find here: http://github.com/bioperl/bioperl-hmmer3 Please use this updated code before making any significant changes - I think I may have already fixed the bug you brought up earlier (but maybe not?). Do let me know if you have any problems getting ahold of this data or if you find any bugs in the code I'd deposited. Still getting my head wrapped around github. > No worries, that's fine. I've got a checkout of the standalone > repository that I can play with now. Is there any particular reason > you > decided to create a new parser instead of integrating the code into > the > existing hmmer.pm module? I haven't looked at how the hmmer2 hmmsearch > output looks compared to the hmmer3 version and if there's any > conflicts. Trying to integrate hmmer3 into the old hmmer searchIO module was the original idea. 
But after talking to some of the BioPerl gurus and considering the inherent differences between hmmer3 and hmmer2 (at least during beta, though there are still some major output report differences in the live release), we decided a separate module would be ideal. I don't want to speak out of turn, but it sounds like this might be one of the ways that the bioperl project is expanded in the future without overbloating bioperl-live. In theory, we can extend Bio::Run into this module as well in the future, such that bioperl-hmmer3 has a SearchIO path in addition to a Run path. I don't know what the more experienced developers currently think about this idea.

This is an obvious statement, but I feel it's important to be clear on these matters - you should feel free to make any and all contributions to the development of this module as you see fit. BioPerl has been wonderful to me and I started this module to give a little back, but this remains community generated software.

FYI - I have a fix that I'm working on to handle the secondary structure track in the alignment report, so if you're particularly interested in that data, give me a bit and I'll have it up and running.

All the best,
Tom

From David.Messina at sbc.su.se Tue May 25 14:52:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 25 May 2010 20:52:29 +0200
Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])]
In-Reply-To: <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu>
References: <4BFAAF45.4090400@cornell.edu> <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu>
Message-ID: <704A3AD7-BF8E-4C52-A3C5-D402B59BFD66@sbc.su.se>

On May 25, 2010, at 4:30 PM, Chris Fields wrote:

> I have added a 'no_index' to that specific directory in Build.PL, suppose we can change that back if there is no purpose to it (though it might come in handy with spots we don't need to be indexed).

Good idea; it's bound to come up at some point.

On May 25, 2010, at 3:04 PM, Brian Osborne wrote:

> So I removed that local Bio/ directory.

Great, thanks Brian!

Dave

From hlapp at gmx.net Tue May 25 17:10:42 2010
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 25 May 2010 15:10:42 -0600
Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])
In-Reply-To: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail>
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail>
Message-ID: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net>

I'm a little concerned that this discussion is disconnected from the list and so misses a lot of possible input. Are we moving our development discussion to IRC or github commit comments?

Regarding $feature->seq(), the API documentation expressly states that the return type is Bio::PrimarySeqI, as it does for $feature->entire_seq().

The original rationale for that was to avoid circular references. Bio::SeqI objects contain references to attached features, which in turn contain a reference to the seq object they are attached to. A Bio::SeqI object holds the basic sequence properties (everything except annotation and feature objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a reference to, not the containing Bio::SeqI object.

It's possible that S::U::weaken() (Scalar::Util) can solve the circular reference problem, but this should be tested. I.e., attach a feature with a SeqI reference to a SeqI, dispose of the SeqI, and then test that the feature has lost the reference to the SeqI too.

This still leaves the issue, though, that you then have a SeqFeatureI object with a dangling reference to a sequence object. If you have those SeqFeatureI objects stored in a feature store, this may wreak havoc. I'd like to see convincing arguments that it doesn't.
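
The test described above can be sketched in plain Perl with Scalar::Util. This is only an illustration under stated assumptions: the two hashes are hypothetical stand-ins for a Bio::SeqI and an attached feature, not the real BioPerl classes, so it shows the reference mechanics rather than BioPerl's actual behaviour.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Plain hashes standing in for a sequence object and one attached feature
# (hypothetical stand-ins, not Bio::SeqI / Bio::SeqFeatureI).
my $seq     = { id => 'seq1', features => [] };
my $feature = { primary_tag => 'gene', seq => $seq };
push @{ $seq->{features} }, $feature;    # seq -> feature (strong reference)

# Weaken the back-reference so the cycle no longer keeps the sequence alive.
weaken( $feature->{seq} );

# Dispose of the sequence: its last strong reference goes away here.
undef $seq;

# The feature itself survives, but its sequence reference has become undef.
print defined $feature->{seq}
    ? "feature still holds its sequence\n"
    : "feature lost its sequence reference\n";
```

Under these assumptions the second message is printed: the weakened reference becomes undef as soon as the sequence is freed, which is the dangling-reference situation any code holding on to the feature would have to handle.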
Bottom line - just forking on git and committing a change isn't a substitute for bringing up an issue and possible solutions on the list, and the vetting of pull requests can fall upon only one or two core developers. Two eyeballs often spot a lot less than a hundred. -hilmar On May 25, 2010, at 2:02 PM, GitHub wrote: > Ah, but my link's old, forget it. This one is better: http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html > > From: cjfields > View this commit online: http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From kai.blin at biotech.uni-tuebingen.de Tue May 25 17:50:29 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 23:50:29 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> Message-ID: <1274824229.2271.60.camel@gonzo.home.kblin.org> On Tue, 2010-05-25 at 11:33 -0700, Thomas Sharpton wrote: Hi Thomas, > http://github.com/bioperl/bioperl-hmmer3 > > Please use this updated code before making any significant changes - I > think I may have already fixed the bug you brought up earlier (but > maybe not?). Do let me know if you have any problems getting ahold of > this data or if you find any bugs in the code I'd deposited. Still > getting my head wrapped around github. I've seen the repo, and forked from it already to push my changes. Some of the folks from IRC gave me write access and Chris Fields actually pushed my changes. 
Most notable about the changes is probably a bit hidden by the noise, but I've changed the Hit->raw_score to contain the overall score, not the "best domain" score. > Trying to integrate hmmer3 into the old hmmer searchIO module was the > original idea. But after talking to some of the BioPerl gurus and > considering the inherent differences between hmmer3 and hmmer2 (at > least during beta, though there are still some major output report > differences in the live release), we decided as separate module would > be ideal. Some of the folks on IRC suggested that we might want to integrate the hmmer.pm parser as well, modularizing this a bit and loading the correct parser depending on the requested format. > This is an obvious statement, but I feel it's important to be clear on > these matters - you should feel free to make any and all contributions > to the development of this module as you see fit. BioPerl has been > wonderful to me and I started this module to give a little back, but > this remains community generated software. I'm planning on adding even more tests, but the basic features for hmmscan parsing seem to be there. I'm currently running an extensive test run on real genome data, hopefully I can see the results of that in a couple of days. Cheers, and thanks for the help, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de
Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
Abteilung Mikrobiologie/Biotechnologie
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From cjfields at illinois.edu Tue May 25 17:55:53 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 25 May 2010 16:55:53 -0500
Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])
In-Reply-To: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net>
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net>
Message-ID:

I agree, but we spotted this from IRC, then added the comments on that merge. Dave also spotted my original code comments (which appeared in the fork queue, and which echo the very same concerns you have) after the commit as well, and managed to revert it.

So, with forked code where it appears further discussion is warranted (like this), we should bring it to the main list (and IRC, if anyone happens to be there) for discussion. Sounds good to me.

For those on list, here are Adam's and my comments on this (linked here: http://github.com/adsj/bioperl-live/commit/24ec961b217084e248f4fdbd174aadace1a27ac4#comments):

adsj: "Hi Chris, thanks for the comment. The reason is this: I have a class, MyApp::Seq, which ISA Bio::Seq::RichSeq and adds some extra methods I use in the application. When I call ->seq() on a feature from one of my MyApp::Seq objects, I want to get a MyApp::Seq object back (because of the extra methods). Am I making sense? I have been running with this patch since at least 1.5.2, so it has been a while since I dug into it. Maybe there is a cleaner solution.
I am not sure what your comment about changing the API means - I think it is quite reasonable/natural that MyApp::Seq->get_Features"->seq" returns MyApp::Seq objects?" My response: "Calling seq() on a feature should return a truncation of whatever your Bio::SeqFeatureI does (it normally calls trunc(start, end) on it's attached sequence). For Bio::Seq it's normally returning a simple Bio::PrimarySeq, not a Bio::Seq, b/c that is what is attached to the Feature. This is why we don't need GC. There are no circular refs: Bio::Seq has-a PrimarySeq and has-a Features (via FeatureHolderI), each Feature has the same PrimarySeq as the parent Bio::Seq. It's hard to know if there is a workaround w/o knowing what you are asking for (e.g. what MyApp::Seq does), but you can certainly override the default methods to DTRT for your specific case. For instance, redefine add_SeqFeature() for your class to attach self as you have above for Bio::Seq. In this case, we should patch SeqFeature::Generic to use weaken() as you show above just in case this is needed by others, but maybe in the context of (pseudocode) 'weaken if $seq to be attached is-a Bio::SeqI', and not hammered down to check the very specific 'Bio::PrimarySeq'. Anyway, this is what I mean by changing the default API, which is what the above Bio::Seq change does. This would change the context of what is currently being returned (self, instead of a simpler contained Bio::PrimarySeqI). Also, anything gained by abstracting the raw seq handling of Feature data by linking to PrimarySeq is lost when you link to the parent, thus always requiring GC and weaken() (which is notoriously flaky dep. on context)." chris On May 25, 2010, at 4:10 PM, Hilmar Lapp wrote: > I'm a little concerned that this discussion is disconnected from the list and so misses a lot of possible input. Are we moving our development discussion to IRC or github commit comments? 
> > Regarding $feature->seq(), the API documentation expressly states that the return type is Bio::PrimarySeqI, as it does for $feature->entire_seq(). > > The original rationale for that was to avoid circular references. Bio::SeqI objects contain references to attached features, which in turn contain a reference to the seq object they are attached to. A Bio::SeqI object holds the basic sequence properties (everything except annotation and feature objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a reference to, not the containing Bio::SeqI object. > > It's possible that S::U::weaken() can solve the circular reference problem, but this fact should be tested. I.e., attach a feature with a SeqI-reference to a SeqI, dispose the SeqI, and then test that the feature has lost the reference to the SeqI too. > > This still leaves the issue though that then you have a SeqFeatureI object with a dangling reference to a sequence object. If you have those SeqFeatureI objects stored in a feature store, this may wreak havoc. I'd like to see convincing arguments that it doesn't. > > Bottom line - just forking on git and committing a change isn't a substitute for bringing up an issue and possible solutions on the list, and the vetting of pull requests can fall upon only one or two core developers. Two eyeballs often spot a lot less than a hundred. > > -hilmar > > On May 25, 2010, at 2:02 PM, GitHub wrote: > >> Ah, but my link's old, forget it. 
This one is better: http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html >> >> From: cjfields >> View this commit online: http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From thomas.sharpton at gmail.com Tue May 25 18:29:38 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 25 May 2010 15:29:38 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274824229.2271.60.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: Thanks for the contributions, Kai. > I've seen the repo, and forked from it already to push my changes. > Some > of the folks from IRC gave me write access and Chris Fields actually > pushed my changes. Just saw this. Thanks for doing that, Chris. > Most notable about the changes is probably a bit hidden by the noise, > but I've changed the Hit->raw_score to contain the overall score, not > the "best domain" score. So this brings up an interesting point. At some point, we'll have to build out a few additional SearchIO methods to incorporate some of the additional information encoded in the HMMER v3 reports. Sean talks a bit in the user manual about the importance of looking at both the full sequence and the best domain (see page 18 in the manual linked to on this page http://hmmer.janelia.org/#documentation). 
For example, he mentions that one should consider the e-value of both the full sequence and best domain to ascertain if the query is homologous to a profile being considered via hmmsearch. He also mentions that looking at the full sequence report values without consideration of the best domain report values can be misleading. I'm not saying that your approach regarding Hit->raw_score is wrong - proper interpretation of the results is up to the end user and there are benefits to looking at the full sequence (again, communicated on page 18) - but we might consider how to best encode the SearchIO methods to mitigate end user confusion and mistakes. >> Trying to integrate hmmer3 into the old hmmer searchIO module was the >> original idea. But after talking to some of the BioPerl gurus and >> considering the inherent differences between hmmer3 and hmmer2 (at >> least during beta, though there are still some major output report >> differences in the live release), we decided as separate module would >> be ideal. > > Some of the folks on IRC suggested that we might want to integrate the > hmmer.pm parser as well, modularizing this a bit and loading the > correct > parser depending on the requested format. This might make sense, given that HMMER v3 is now live and seems to be adopted by researchers at an increasing rate. Since I used hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult to do, either. I think a thorough conversation on this point is warranted as others I've talked to have preferred the modules to be separate. I'd be interested to hear what other have to say on this point. >> This is an obvious statement, but I feel it's important to be clear >> on >> these matters - you should feel free to make any and all >> contributions >> to the development of this module as you see fit. BioPerl has been >> wonderful to me and I started this module to give a little back, but >> this remains community generated software. 
> > I'm planning on adding even more tests, but the basic features for > hmmscan parsing seem to be there. I'm currently running an extensive > test run on real genome data, hopefully I can see the results of > that in > a couple of days. Awesome! > Cheers, and thanks for the help, Likewise. T From kannabiran.nandakumar at gmail.com Tue May 25 18:30:18 2010 From: kannabiran.nandakumar at gmail.com (Kanna) Date: Tue, 25 May 2010 15:30:18 -0700 (PDT) Subject: [Bioperl-l] new to this group Message-ID: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Hi guys, I am new to this group. I work in bioinformatics and would like to contribute to the BioPerl project. I am interested in the OBO file parsing module to start with. I visited the project priority list and the page seems to have been modified around 6 months ago. If it is already completed could anyone suggest modules I can contribute to? Thanks, Kanna From David.Messina at sbc.su.se Tue May 25 18:41:27 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 00:41:27 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: On May 25, 2010, at 11:55 PM, Chris Fields wrote: > Sounds good to me. Me too, and just to clarify for everyone following along, I erroneously committed the code in question to bioperl-live master (head), reverted that commit, and moved it to a branch (http://github.com/bioperl/bioperl-live/commits/topic/adsj-seqobj-return). Dave From maj at fortinbras.us Tue May 25 21:37:38 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 25 May 2010 21:37:38 -0400 Subject: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: <525D25AC2CDF42E99C1F4072B02D0C1B@NewLife> I +1 Hilmar, but note that already git is doing what it is designed to do: devolve development. My $0.02 is: that is how BioPerl will keep from becoming a dinosaur. I believe that we as a community, judging from the track of the last year or so, are committed to this evolution by devolution, and the move to git is part of that overall plan. The increase in IRC chatter, led by deafferet and rbuels, prefigured this and it was generally considered a Good Thing. So, I would propose that people (devs and users) make their views known (on list and elsewhere) about how best to communicate and have dev-oriented conversations: it may be that a listserv alone is not nimble enough. MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "BioPerl List" Sent: Tuesday, May 25, 2010 5:10 PM Subject: Re: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) > I'm a little concerned that this discussion is disconnected from the list and > so misses a lot of possible input. Are we moving our development discussion > to IRC or github commit comments? > > Regarding $feature->seq(), the API documentation expressly states that the > return type is Bio::PrimarySeqI, as it does for $feature- > >entire_seq(). > > The original rationale for that was to avoid circular references. Bio::SeqI > objects contain references to attached features, which in turn contain a > reference to the seq object they are attached to. 
A Bio::SeqI object holds > the basic sequence properties (everything except annotation and feature > objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a > reference to, not the containing Bio::SeqI object. > > It's possible that S::U::weaken() can solve the circular reference problem, > but this fact should be tested. I.e., attach a feature with a SeqI-reference > to a SeqI, dispose the SeqI, and then test that the feature has lost the > reference to the SeqI too. > > This still leaves the issue though that then you have a SeqFeatureI object > with a dangling reference to a sequence object. If you have those SeqFeatureI > objects stored in a feature store, this may wreak havoc. I'd like to see > convincing arguments that it doesn't. > > Bottom line - just forking on git and committing a change isn't a substitute > for bringing up an issue and possible solutions on the list, and the vetting > of pull requests can fall upon only one or two core developers. Two eyeballs > often spot a lot less than a hundred. > > -hilmar > > On May 25, 2010, at 2:02 PM, GitHub wrote: > >> Ah, but my link's old, forget it. 
This one is better: >> http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html >> >> From: cjfields >> View this commit online: >> http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From asjo at koldfront.dk Wed May 26 01:41:52 2010 From: asjo at koldfront.dk (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Wed, 26 May 2010 07:41:52 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: <87zkznb4nz.fsf@topper.koldfront.dk> On Tue, 25 May 2010 15:10:42 -0600, Hilmar wrote: > Bottom line - just forking on git and committing a change isn't a > substitute for bringing up an issue and possible solutions on the > list, and the vetting of pull requests can fall upon only one or two > core developers. Two eyeballs often spot a lot less than a hundred. Just to clarify: I specifically _didn't_ make a Pull request yet. I simply created the fork store the patch in a visible way - my intention was then to clean the patch up and make it ready for comments/discussion (I just haven't had time to do so yet). I am new to github, but as I understood the interface there, anyone is free (encouraged?) to "fork" their own clone to work in, as a kind of "public" personal workspace, and when you feel that your clone is ready to be merged, then - only then - you do a "Pull request". 
If that isn't the way github is supposed to be used, or that isn't the way BioPerl wants to use it, let me know and I'll adjust. I appreciate the comments so far, and will get back to this as soon as I can. Thanks, Adam -- "Sunday morning when the rain begins to fall Adam Sj?gren I believe I have seen the end of it all" asjo at koldfront.dk From David.Messina at sbc.su.se Wed May 26 05:24:11 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 11:24:11 +0200 Subject: [Bioperl-l] Bio::Species irritated with "unclassified sequences" In-Reply-To: <4BF59B2F.9000300@bms.com> References: <4BF59B2F.9000300@bms.com> Message-ID: <50665C57-007D-49CC-86A7-4595D176EA73@sbc.su.se> Hi Charles, Thanks for your report. I believe your interpretation of Bio::Species::classification is correct. It looks like this is going to require a little more investigation. Could you please submit this as a bug report along with a little test case? http://www.bioperl.org/wiki/Bugs Dave On May 20, 2010, at 22:27, Charles Tilford wrote: > Bio::Species::classification() is irritated with me when I provide it with a @class_array that is composed of one node, particularly: > > $obj->classification("unclassified sequences") > > AFAICT this is a valid, single node taxa "tree": > > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=12908 > > Subroutine classification is expecting at least two class members, the problem with the above call crops up as: > > Use of uninitialized value $vals[1] in quotemeta at /stf/biocgi/tilfordc/patch_lib/Bio/Species.pm line 179 > ( $Id: Species.pm 16700 2010-01-15 19:50:11Z dave_messina $) > > > ... and the relevant code is: > > sub classification { > my ($self, @vals) = @_; > > if (@vals) { > if (ref($vals[0]) eq 'ARRAY') { > @vals = @{$vals[0]}; > } > > # make sure the lineage contains us as first or second element > # (lineage may have subspecies, species, genus ...) 
> my $name = $self->node_name; > my ($genus, $species) = (quotemeta($vals[1]), quotemeta($vals[0])); > > > That is, it's expecting at least (species, genus) in the array. Am I misusing classification(), or Bio::Species in general? I know it's named "Species", but I've been using it as a generic tree object for arbitrary taxonomy nodes, not just species and subspecies. This block a little lower down: > > unless ($self->rank) { > # and that we are rank species > $self->rank('species'); > } > > > ... implies that the module can be used for taxa ranks other than species. However, doing so would not prevent the module being aggravated over a null $vals[1]. > > The use case here is building Bio::Seq::RichSeq objects pulled from a (very large) sequence database, and then dumped / displayed with SeqIO. Most are well behaved, but there's a non-trivial number of 'artificial' constructs that don't root to an organism. > > -CAT > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed May 26 07:53:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 06:53:50 -0500 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <87zkznb4nz.fsf@topper.koldfront.dk> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> Message-ID: On May 26, 2010, at 12:41 AM, Adam Sj?gren wrote: > On Tue, 25 May 2010 15:10:42 -0600, Hilmar wrote: > >> Bottom line - just forking on git and committing a change isn't a >> substitute for bringing up an issue and possible solutions on the >> list, and the vetting of pull requests can fall upon only one or two >> core developers. Two eyeballs often spot a lot less than a hundred. 
> > Just to clarify: I specifically _didn't_ make a Pull request yet.
>
> I simply created the fork store the patch in a visible way - my
> intention was then to clean the patch up and make it ready for
> comments/discussion (I just haven't had time to do so yet).
>
> I am new to github, but as I understood the interface there, anyone is
> free (encouraged?) to "fork" their own clone to work in, as a kind of
> "public" personal workspace, and when you feel that your clone is ready
> to be merged, then - only then - you do a "Pull request".

That's odd; I recall receiving a pull request from your fork at some point, but maybe I simply looked into the fork queue instead (which I thought was derived from pull requests, but maybe not).

> If that isn't the way github is supposed to be used, or that isn't the
> way BioPerl wants to use it, let me know and I'll adjust.
>
> I appreciate the comments so far, and will get back to this as soon as I
> can.
>
>
> Thanks,
>
> Adam

No problem Adam, we're going through the learning curve on this end as well re: this specific github feature. I think how you are going about this is fine; we'll need to come up with some documentation as to how our collabs pull in forked code.

chris

From hlapp at drycafe.net Wed May 26 09:27:55 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 26 May 2010 07:27:55 -0600
Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])
In-Reply-To: <87zkznb4nz.fsf@topper.koldfront.dk>
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk>
Message-ID: <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net>

On May 25, 2010, at 11:41 PM, Adam Sjøgren wrote:
> as I understood the interface there, anyone is free (encouraged?)
to > "fork" their own clone to work in, as a kind of "public" personal > workspace, and when you feel that your clone is ready to be merged, > then - only then - you do a "Pull request". That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) And yes, encouraged to fork indeed. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Wed May 26 10:03:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 16:03:14 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> Message-ID: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> On May 26, 2010, at 15:27, Hilmar Lapp wrote: > That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) That would be me. :) His commits were sitting in the fork queue, which I mistakenly understood to mean a pull request had been made. Turns out that's not the case (See http://github.com/blog/270-the-fork-queue). 
Dave From David.Messina at sbc.su.se Wed May 26 10:52:05 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 16:52:05 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: > So this brings up an interesting point. At some point, we'll have to build out a few additional SearchIO methods to incorporate some of the additional information encoded in the HMMER v3 reports. Would the new methods need to be added to SearchIO if they're specific to H3? (as opposed to just being in the H3 sub-class) > Sean talks a bit in the user manual about the importance of looking at both the full sequence and the best domain (see page 18 in the manual linked to on this page http://hmmer.janelia.org/#documentation). For example, he mentions that one should consider the e-value of both the full sequence and best domain to ascertain if the query is homologous to a profile being considered via hmmsearch. > > He also mentions that looking at the full sequence report values without consideration of the best domain report values can be misleading. I'm not saying that your approach regarding Hit->raw_score is wrong - proper interpretation of the results is up to the end user and there are benefits to looking at the full sequence (again, communicated on page 18) - but we might consider how to best encode the SearchIO methods to mitigate end user confusion and mistakes. I think this is a great idea. Of course it's always best for end-users to RTFM and understand the tools they're using, but it's clearly beneficial to make it easier to do the right thing. Having not considered it too much, I'm not sure how to accomplish this without breaking the SearchIO idiom. But presumably a way could be found. 
>> Some of the folks on IRC suggested that we might want to integrate the >> hmmer.pm parser as well, modularizing this a bit and loading the correct >> parser depending on the requested format. > This might make sense, given that HMMER v3 is now live and seems to be adopted by researchers at an increasing rate. Since I used hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult to do, either. I think a thorough conversation on this point is warranted as others I've talked to have preferred the modules to be separate. > > I'd be interested to hear what other have to say on this point. I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3. But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption. Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so) ? Dave From thomas.sharpton at gmail.com Wed May 26 11:25:24 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 26 May 2010 08:25:24 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: Thanks for the feedback, Dave. >> So this brings up an interesting point. At some point, we'll have >> to build out a few additional SearchIO methods to incorporate some >> of the additional information encoded in the HMMER v3 reports. > > Would the new methods need to be added to SearchIO if they're > specific to H3? 
(as opposed to just being in the H3 sub-class) Sorry for being unclear - the methods in question would be, at least in my mind, specific to the H3 sub-class. > >> Sean talks a bit in the user manual about the importance of looking >> at both the full sequence and the best domain (see page 18 in the >> manual linked to on this page http://hmmer.janelia.org/#documentation) >> . For example, he mentions that one should consider the e-value of >> both the full sequence and best domain to ascertain if the query is >> homologous to a profile being considered via hmmsearch. >> >> He also mentions that looking at the full sequence report values >> without consideration of the best domain report values can be >> misleading. I'm not saying that your approach regarding Hit- >> >raw_score is wrong - proper interpretation of the results is up to >> the end user and there are benefits to looking at the full sequence >> (again, communicated on page 18) - but we might consider how to >> best encode the SearchIO methods to mitigate end user confusion and >> mistakes. > > I think this is a great idea. > > Of course it's always best for end-users to RTFM and understand the > tools they're using, but it's clearly beneficial to make it easier > to do the right thing. > > Having not considered it too much, I'm not sure how to accomplish > this without breaking the SearchIO idiom. But presumably a way could > be found. > I'll see if I can't hit the drawing board and come up with a naming scheme for additional H3 methods that retrieve some of the extra data encoded in the new reports. It *probably* makes most sense, at least from the standpoint of the user's perspective, to adopt the full- length report values as the standard hit->significance and hit- >raw_score while having something like hit->best_significance and hit- >best_score as H3 methods that return the best-domain report values. Again, this could use some thought/discussion. 
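To make the naming idea above concrete, here is a rough sketch, assuming a stand-alone class rather than anything in BioPerl. The package name My::HMMER3Hit, the constructor arguments and the passes_both() helper are all hypothetical, and the numbers are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical class, NOT part of BioPerl: a sketch of how full-sequence and
# best-domain values could sit side by side on an H3 hit object.
package My::HMMER3Hit;

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        raw_score         => $args{-score},
        significance      => $args{-significance},
        best_score        => $args{-best_score},
        best_significance => $args{-best_significance},
    };
    return bless $self, $class;
}

# Full-sequence values, under the usual accessor names.
sub raw_score    { $_[0]->{raw_score} }
sub significance { $_[0]->{significance} }

# Best-domain values, under the names floated in the message above.
sub best_score        { $_[0]->{best_score} }
sub best_significance { $_[0]->{best_significance} }

# Returns 1 only when both the full-sequence and best-domain E-values
# are at or below the cutoff, 0 otherwise.
sub passes_both {
    my ( $self, $cutoff ) = @_;
    return ( $self->significance <= $cutoff
          && $self->best_significance <= $cutoff ) ? 1 : 0;
}

package main;

# Illustrative numbers only, not taken from any real report.
my $hit = My::HMMER3Hit->new(
    -score             => 120.5,
    -significance      => 1e-30,
    -best_score        => 15.2,
    -best_significance => 0.5,
);

my $ok = $hit->passes_both(1e-5);
printf "full-sequence score %.1f, best-domain score %.1f, passes both: %d\n",
    $hit->raw_score, $hit->best_score, $ok;
```

With these made-up values the full-sequence E-value looks convincing while the best-domain one does not, which is the situation where a single raw_score/significance pair could mislead.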
>
>>> Some of the folks on IRC suggested that we might want to integrate
>>> the hmmer.pm parser as well, modularizing this a bit and loading the
>>> correct parser depending on the requested format.
>
>> This might make sense, given that HMMER v3 is now live and seems to
>> be adopted by researchers at an increasing rate. Since I used
>> hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult
>> to do, either. I think a thorough conversation on this point is
>> warranted as others I've talked to have preferred the modules to be
>> separate.
>>
>> I'd be interested to hear what other have to say on this point.
>
> I did not follow the IRC discussion, so I confess I'm not totally
> clear on what "integrate the hmmer.pm parser" means. I'm taking it
> to mean combining the code that parses HMMER2 with the code that
> parses HMMER3.
> But then "modularizing this a bit and loading the correct parser
> depending on the requested format" seems to contradict that
> assumption.
>
> Perhaps you (or someone) could clarify a bit what the HMMER2 -
> HMMER3 integration would look like (and the goal of doing so) ?
>

I was not a part of that conversation either and I'm also operating under a similar assumption about what "integrating the hmmer.pm parser" means. I too am confused about the statement regarding modularization; I assume Kai meant that next_result would leverage the HMMER version number (which it already grabs) to guide the appropriate parsing of the datafile. Not thinking about this too carefully, it might be as simple as:

    next_result {
        version = get_hmmer_version
        if version == 2
            parse V2 report file
        if version == 3
            parse V3 report file
    }

To make the code a bit more manageable, the various version parsers could be moved into independent subroutines. Kai, is this along the lines of what you were thinking?
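The pseudocode above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: the subroutine names, the dispatch table and the header pattern are hypothetical and are not taken from Bio::SearchIO::hmmer, and the two parsers are stubs that just report which branch was taken:

```perl
use strict;
use warnings;

# Hypothetical helpers; none of these names come from Bio::SearchIO::hmmer.
# Dispatch table mapping the major HMMER version to a parser subroutine.
my %parser_for = (
    2 => \&_parse_v2_report,
    3 => \&_parse_v3_report,
);

# Stub parsers: a real implementation would build result objects here.
sub _parse_v2_report { my ($text) = @_; return { format => 'hmmer2' } }
sub _parse_v3_report { my ($text) = @_; return { format => 'hmmer3' } }

# Pull the major version out of a header line. The "HMMER <major>.<minor>"
# form matched here is an assumption about what the report header contains.
sub get_hmmer_version {
    my ($text) = @_;
    return $text =~ /HMMER\s+(\d+)\./ ? $1 : undef;
}

# Pick the parser by version and hand the report text to it.
sub next_result {
    my ($text) = @_;
    my $version = get_hmmer_version($text);
    die "Could not determine HMMER version\n" unless defined $version;
    my $parser = $parser_for{$version}
        or die "No parser for HMMER version $version\n";
    return $parser->($text);
}

# Assumed header text, for illustration only.
my $r2 = next_result("HMMER 2.3.2 (Oct 2003)\n");
my $r3 = next_result("# HMMER 3.0 (March 2010)\n");
print "v2 header handled by: $r2->{format}\n";
print "v3 header handled by: $r3->{format}\n";
```

A dispatch table keeps next_result itself short and means that supporting another report version is a matter of adding one entry and one subroutine.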

If this is correct (that is, merging the H2 and H3 parsers into a single hmmer.pm module), I see one primary benefit - the end user need not specify which HMMER module they want to use, just Bio::SearchIO::hmmer - and one secondary benefit - there's enough similarity between H2 and H3 reports that some code from the H2 parser redundantly appears in the H3 parser. There are certainly other benefits that I'm overlooking.

The only real downside I see at the moment is that the hmmer.pm parser becomes a bit more complicated and bloated. But I suspect this can be remedied with careful partitioning of the code into appropriate subroutines and thorough documentation. I am a bit concerned about how the aforementioned H3-specific methods are incorporated, but that should be manageable.

I wonder if anyone involved in the IRC discussion cares to weigh in?

Regardless, I'd advocate getting the H3 version fully fleshed out to deal with the issues brought up in the first half of this message prior to an attempt to merge the two modules, as the merging process may be affected by the structure of the H3 parser.

Best,
Tom

From cjfields at illinois.edu Wed May 26 12:13:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 26 May 2010 11:13:59 -0500
Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])
In-Reply-To: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se>
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se>
Message-ID:

On May 26, 2010, at 9:03 AM, Dave Messina wrote:

>
> On May 26, 2010, at 15:27, Hilmar Lapp wrote:
>
>> That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;)
>
>
> That would be me.
:) > > His commits were sitting in the fork queue, which I mistakenly understood to mean a pull request had been made. Turns out that's not the case (See http://github.com/blog/270-the-fork-queue). > > > Dave We can clarify that in the docs on the bioperl site, maybe in a github-specific section. chris From cjfields at illinois.edu Wed May 26 12:17:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 11:17:50 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: <3826604E-CD90-42A5-A0B2-004D9922B6AA@illinois.edu> On May 26, 2010, at 10:25 AM, Thomas Sharpton wrote: >> ... >> I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3.= > >> But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption. >> >> Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so) ? >> > > I was not a part of that conversation either and I'm also operating under a similar assumption about what "integrating the hmmer.pm parser" means. I too am confused about the statement regarding modularization; I assume Kai meant that next_result would leverage the HMMER version number (which it already grabs) to guide the appropriate parsing of the datafile. 
Not thinking about this too carefully, it might be a simple as: > > next_result{ > version = get_hmmer_version > if version == 2 > parse V2 report file > if version == 3 > parse V3 report file > } > > to make the code a bit more manageable, the various version parsers could be appropriated to independent subroutines. > > Kai, is this along the lines of what you were thinking? > > If this is correct (that is, merging the H2 and H3 parsers into a single hmmer.pm module), I see one primary benefit - the end user need not specify which HMMER module they want to implement, just use Bio::SearchIO::hmmer - and one secondary benefit - there's enough similarity between H2 and H3 reports that some from the H2 parser redundantly appears in the H3 parser. There are certainly other benefits that I'm overlooking. > > The only real downside I see at the moment is that the hmmer.pm parser becomes a bit more complicated and bloated. But I suspect this can be remedied with careful partitioning of the code into appropriate subroutines and thorough documentation. I am a bit concerned about how the aforementioned H3 specific methods are incorporated, but that should be manageable. > > I wonder if anyone involved in the IRC discussion cares to weigh in? > > Regardless, I'd advocate getting the H3 version fully flushed out to deal with the issues brought up in the first half of this message prior to an attempt to merge the two modules, as the merging process may be affected by the structure of the H3 parser. > > Best, > Tom That's essentially the idea, though it can be cleaner than that if we're expecting the entire stream of reports will be of the same version (set the proper next_result method at instantiation). SearchIO::infernal does something like this. Or it can call out to a handler, like SearchIO::blastxml. YMMV. chris From maj at fortinbras.us Wed May 26 13:43:37 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Wed, 26 May 2010 13:43:37 -0400
Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])
In-Reply-To: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se>
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se>
Message-ID: <85C731A2326D45FB903FB1B0D5C5DEBF@NewLife>

No zeal is overweening that is on the side of the Right.

----- Original Message -----
From: "Dave Messina"
To: "Hilmar Lapp"
Cc: "Adam Sjögren" ;
Sent: Wednesday, May 26, 2010 10:03 AM
Subject: Re: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])

>
> On May 26, 2010, at 15:27, Hilmar Lapp wrote:
>
>> That would be my understanding too. Maybe some overzealous Bioperl gitizens
>> at work who weren't going to wait for this? ;)
>
>
> That would be me. :)
>
> His commits were sitting in the fork queue, which I mistakenly understood to
> mean a pull request had been made. Turns out that's not the case (See
> http://github.com/blog/270-the-fork-queue).
>
>
> Dave
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From David.Messina at sbc.su.se Wed May 26 15:03:21 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 26 May 2010 21:03:21 +0200
Subject: [Bioperl-l] new to this group
In-Reply-To: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com>
References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com>
Message-ID:

Hi Kanna,

Welcome! We're always happy to have more people jump in the deep end of the pool and help out.

>From my reading of the project priority page, the OBO file parsing stuff has been done: > (This appears to be basically solved with the new OBOEngine, Sohel will need to comment if it is indeed finished). --jason stajich 20:10, 19 June 2006 (EDT) ( see http://www.bioperl.org/wiki/Project_priority_list#Ontology_file_parsing ) Can anyone (Hilmar?) who knows where we're at with this verify that our OBO parser is in good shape? I did notice this open bug, Kanna: bp_load_ontology ISBN title parsing error in OBO format http://bugzilla.open-bio.org/show_bug.cgi?id=2730 Is that something you might be interested in? > I visited the project priority list and the page seems to have been modified around 6 months ago. Agreed, it's probably time for someone to go through and update it. I'll post to the list separately about this. > If it is already completed could anyone suggest modules I can contribute to? But even though the project priority list is outdated, the open bugs list is not: http://bugzilla.open-bio.org/buglist.cgi?product=Bioperl&bug_status=NEW I would recommend you look for something relatively small to start with and submit a patch for that. And then as you go along we'll get a better idea of how to direct you as you get a better idea of what needs to be done. Dave From David.Messina at sbc.su.se Wed May 26 15:22:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 21:22:40 +0200 Subject: [Bioperl-l] project priority list Message-ID: <0DC6E827-8855-4463-8C58-79CC26BDF42D@sbc.su.se> So, as pointed out by Kanna in another thread, our Project Priority list is getting a little stale. http://www.bioperl.org/wiki/Project_priority_list There are lot of things on there that have been crossed off for years now. I propose that we do some housecleaning, including deleting long-finished projects from the list. (They'll still live on in the wiki history of the page.) 
Unless someone objects, I'll start poking at it a bit, but if other core devs with relevant knowledge of various projects could take a moment to peruse and edit too, that would be great. Dave From jay at jays.net Wed May 26 15:27:01 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 26 May 2010 14:27:01 -0500 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: <1D273263-F9B4-4612-961B-E2B0F480FBC3@jays.net> On May 26, 2010, at 2:03 PM, Dave Messina wrote: > I would recommend you look for something relatively small to start with and submit a patch for that. Ideally "submit a patch" means create a github.com account, click "fork" on the bioperl-live repo, commit your changes into your fork, then send us a "pull request". :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From scott at scottcain.net Wed May 26 15:36:16 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 26 May 2010 15:36:16 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git Message-ID: Hi all, For GBrowse on the 1.X branch there is a network install script that people can download and execute and it will install all of the prerequisites and then install GBrowse. For this script, we also support a -d(eveloper) option, to get GBrowse and BioPerl from their repositories. Now that BioPerl has moved to git, I have a question: does anybody know if there is a way (preferably via url) to get bioperl from git in a non-interactive way? The read-only url on the bioperl-live git page, http://github.com/bioperl/bioperl-live.git, leads to a 404 error, and even if it didn't, I have a feeling that it would take a click or two to get to downloading source. Does anybody with more git-fu than me (which isn't a hard thing to have, since I don't have much) have any suggestions? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From David.Messina at sbc.su.se Wed May 26 15:41:10 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 21:41:10 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> Message-ID: <1F539D4E-D352-4F93-AF1E-E9324B970D34@sbc.su.se> > We can clarify that in the docs on the bioperl site, maybe in a github-specific section. I've stubbed it in on Using Git http://www.bioperl.org/wiki/Using_Git Please modify or expand as you see fit. Dave From scott at scottcain.net Wed May 26 15:57:21 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 26 May 2010 15:57:21 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: Also on the bioperl git page is a "download master" link, which pops up a cute javascript window offering me a choice of zip or tar files. If I copy the url of the tar file, I get a page that says: You are being redirected. where presumably, the digits after "bioperl-release" will change on a regular basis (right?), so that doesn't help much either (yes, I know I could parse the redirect message and get that url, but really, is there such a thing as a HEAD url?) Thanks, Scott On Wed, May 26, 2010 at 3:36 PM, Scott Cain wrote: > Hi all, > > For GBrowse on the 1.X branch there is a network install script that > people can download and execute and it will install all of the > prerequisites and then install GBrowse. ?For this script, we also > support a -d(eveloper) option, to get GBrowse and BioPerl from their > repositories. 
> Now that BioPerl has moved to git, I have a question:
> does anybody know if there is a way (preferably via url) to get
> bioperl from git in a non-interactive way?
>
> The read-only url on the bioperl-live git page,
> http://github.com/bioperl/bioperl-live.git, leads to a 404 error, and
> even if it didn't, I have a feeling that it would take a click or two
> to get to downloading source. Does anybody with more git-fu than me
> (which isn't a hard thing to have, since I don't have much) have any
> suggestions?
>
> Thanks,
> Scott
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                           scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)          216-392-3087
> Ontario Institute for Cancer Research
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                           scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)          216-392-3087
Ontario Institute for Cancer Research

From kai.blin at biotech.uni-tuebingen.de Wed May 26 16:07:02 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Wed, 26 May 2010 22:07:02 +0200
Subject: [Bioperl-l] Automatic download of bioperl from git
In-Reply-To:
References:
Message-ID: <1274904422.3019.2.camel@gonzo.home.kblin.org>

On Wed, 2010-05-26 at 15:36 -0400, Scott Cain wrote:

Hi Scott,

> For GBrowse on the 1.X branch there is a network install script that
> people can download and execute and it will install all of the
> prerequisites and then install GBrowse. For this script, we also
> support a -d(eveloper) option, to get GBrowse and BioPerl from their
> repositories. Now that BioPerl has moved to git, I have a question:
> does anybody know if there is a way (preferably via url) to get
> bioperl from git in a non-interactive way?

A quick look on the "BioPerl moved to git" announcement (http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/) you can find the following link: http://github.com/bioperl/bioperl-live/archives/master This page gives links to a zip and a tar version of BioPerl's master repository, which seems to be what you want. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakult?res Institut f?r Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universit?t T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From David.Messina at sbc.su.se Wed May 26 16:09:22 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 22:09:22 +0200 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Hi Scott, I think the URLs you want are these http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots snapshots of the current repository. If you want instead to grab a static version of a repository, say a tagged revision, you can do like this: http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 (where "for_gmod_0_003" is the tag). By the way, I am getting these URLs on GitHub by: 1. going to the GitHub page for the relevant repository e.g. http://github.com/bioperl/bioperl-live 2. navigating to the tag or branch of interest using the "Switch Branches" or "Switch Tags" pulldowns 3. clicking on the Download Source button 4. 
right-clicking on the big TAR icon to copy the link underlying it Dave From rmb32 at cornell.edu Wed May 26 16:48:13 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 26 May 2010 13:48:13 -0700 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: <4BFD890D.4080205@cornell.edu> Sigh .... once we get our house in order to the point where it's easy to and quick to make releases with bugfixes, you'll be able to just get the most recent copies of the parts you need from CPAN. That'll be the day. Rob From hlapp at drycafe.net Wed May 26 18:05:36 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 26 May 2010 16:05:36 -0600 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: On May 26, 2010, at 1:03 PM, Dave Messina wrote: > Can anyone (Hilmar?) who knows where we're at with this verify that > our OBO parser is in good shape? The obo parser should be working. It's not wrapping the go-perl parser though. I should revisit the code I've written for that, I know ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed May 26 19:27:27 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 18:27:27 -0500 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> On May 26, 2010, at 5:05 PM, Hilmar Lapp wrote: > > On May 26, 2010, at 1:03 PM, Dave Messina wrote: > >> Can anyone (Hilmar?) who knows where we're at with this verify that our OBO parser is in good shape? > > > The obo parser should be working. 
It's not wrapping the go-perl parser though. I should revisit the code I've written for that, I know ... > > -hilmar So, that might be an area for someone to work on? chris From hlapp at drycafe.net Thu May 27 09:30:05 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 27 May 2010 07:30:05 -0600 Subject: [Bioperl-l] new to this group In-Reply-To: <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> Message-ID: <292C7384-2EF0-45F7-85F9-BB173FE2B6E5@drycafe.net> On May 26, 2010, at 5:27 PM, Chris Fields wrote: >> The obo parser should be working. It's not wrapping the go-perl >> parser though. I should revisit the code I've written for that, I >> know ... >> > > So, that might be an area for someone to work on? Certainly if you want to start from scratch. The code I've written isn't committed (yes, shame on me). That said, I suppose I could now easily commit it to a branch and not cause any harm, right :-) It's not a very good target for a newcomer at all, though. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From kai.blin at biotech.uni-tuebingen.de Thu May 27 10:50:40 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 27 May 2010 16:50:40 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: <1274971840.9545.316.camel@mikropc7.biotech.uni-tuebingen.de> On Wed, 2010-05-26 at 08:25 -0700, Thomas Sharpton wrote: > > Having not considered it too much, I'm not sure how to accomplish > > this without breaking the SearchIO idiom. But presumably a way could > > be found. > > > > I'll see if I can't hit the drawing board and come up with a naming > scheme for additional H3 methods that retrieve some of the extra data > encoded in the new reports. It *probably* makes most sense, at least > from the standpoint of the user's perspective, to adopt the full- > length report values as the standard hit->significance and hit- > >raw_score while having something like hit->best_significance and hit- > >best_score as H3 methods that return the best-domain report values. > Again, this could use some thought/discussion. My reasoning for the change was that you can get at the best sequence score by (at worst) iterating over the top sequences. Without the change there was no way to get at the overall profile score, so that data was lost. Arguably this is just one way to try and make the data from the HMMer results accessible via the SearchIO interface. > I was not a part of that conversation either and I'm also operating > under a similar assumption about what "integrating the hmmer.pm > parser" means. 
I too am confused about the statement regarding > modularization; I assume Kai meant that next_result would leverage the > HMMER version number (which it already grabs) to guide the appropriate > parsing of the datafile. Not thinking about this too carefully, it > might be a simple as: > > next_result{ > version = get_hmmer_version > if version == 2 > parse V2 report file > if version == 3 > parse V3 report file > } > > to make the code a bit more manageable, the various version parsers > could be appropriated to independent subroutines. > > Kai, is this along the lines of what you were thinking? Yes, this is more or less what I meant. But I agree that we first want to get the hmmer3 parser sorted out and working nicely. More test cases for the parser would be nice, I just got sidetracked by another bug affecting my code. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakult?res Institut f?r Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universit?t T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From scott at scottcain.net Thu May 27 11:29:42 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 27 May 2010 11:29:42 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: Hi All, Thanks for pointing out the links. It's weird: using curl on those urls retrieves a "redirect" page, whereas LWP::Simple::mirror gets the tarball. Anyway, the script works again :-) Scott On Wed, May 26, 2010 at 4:09 PM, Dave Messina wrote: > Hi Scott, > > I think the URLs you want are these > > ? ? ? ?http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots > > snapshots of the current repository. 
> > > If you want instead to grab a static version of a repository, say a tagged revision, you can do like this: > > http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 > > (where "for_gmod_0_003" is the tag). > > > By the way, I am getting these URLs on GitHub by: > > 1. ?going to the GitHub page for the relevant repository > > ? ? ? ?e.g. http://github.com/bioperl/bioperl-live > > 2. ?navigating to the tag or branch of interest using the "Switch Branches" or "Switch Tags" pulldowns > > 3. ?clicking on the Download Source button > > 4. ?right-clicking on the big TAR icon to copy the link underlying it > > > > Dave > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From bosborne11 at verizon.net Thu May 27 11:40:37 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 27 May 2010 11:40:37 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Message-ID: Chris, Removed all erroneous references to Subversion except for these pages, which require detailed editing and/or a familiarity with Git: http://www.bioperl.org/wiki/Emacs_bioperl-mode http://www.bioperl.org/wiki/HOWTO:Wrappers http://www.bioperl.org/wiki/Making_a_BioPerl_release http://www.bioperl.org/w/index.php/HOWTO:BlastPlus One issue now is the references to pedigree, microarray, GUI, pipeline, and ext, which only exist in SVN. Also GUI, pipeline, and microarray are unsupported, and have been unsupported for many years. 
Yet they are still listed in pages like: http://www.bioperl.org/wiki/Getting_BioPerl They shouldn't be listed alongside bioperl-live or -run, or they should not be listed at all. Should they be removed? or put into their own "unsupported" section? Brian O. On May 20, 2010, at 11:37 AM, Chris Fields wrote: > Yes, if you have time. I have started along that path already, but I'm sure there are lingering spots where links point to the wrong place, or subversion/svn is mentioned. > > chris > > On May 20, 2010, at 10:34 AM, Brian Osborne wrote: > >> Chris, >> >> Done, easy. Should I remove all references to SVN from the Wiki? >> >> Brian O. >> >> On May 18, 2010, at 2:04 PM, Chris Fields wrote: >> >>> Yes. >>> >>> chris >>> >>> On May 18, 2010, at 11:06 AM, Brian Osborne wrote: >>> >>>> bioperl-l, >>>> >>>> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >>>> >>>> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >>>> >>>> Brian O. >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From cjfields at illinois.edu Thu May 27 11:58:06 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 27 May 2010 10:58:06 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? 
In-Reply-To: References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Message-ID: On May 27, 2010, at 10:40 AM, Brian Osborne wrote: > Chris, > > Removed all erroneous references to Subversion except for these pages, which require detailed editing and/or a familiarity with Git: > > http://www.bioperl.org/wiki/Emacs_bioperl-mode > > http://www.bioperl.org/wiki/HOWTO:Wrappers > > http://www.bioperl.org/wiki/Making_a_BioPerl_release > > http://www.bioperl.org/w/index.php/HOWTO:BlastPlus Okay, looks good so far. I know the emacs mode stuff will be handled by Mark (I'm assuming the others will follow suit). I'll have to go in and clean up the 'making a release' page myself to update it. > One issue now is the references to pedigree, microarray, GUI, pipeline, and ext, which only exist in SVN. By 'only existing in svn', do you mean they are only found there? I moved everything over for archiving: http://github.com/bioperl/bioperl-gui http://github.com/bioperl/bioperl-microarray http://github.com/bioperl/bioperl-pedigree http://github.com/bioperl/bioperl-pipeline > Also GUI, pipeline, and microarray are unsupported, and have been unsupported for many years. Yet they are still listed in pages like: > > http://www.bioperl.org/wiki/Getting_BioPerl > > They shouldn't be listed alongside bioperl-live or -run, or they should not be listed at all. > > Should they be removed? or put into their own "unsupported" section? I think to an 'unsupported' or 'unmaintained' section; could add the corba and pise ones as well (just noticed that the pise repo was missing from github, so just added it for archiving). > Brian O. Thanks brian! 
chris From sdavis2 at mail.nih.gov Thu May 27 12:04:04 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 27 May 2010 12:04:04 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: On Thu, May 27, 2010 at 11:29 AM, Scott Cain wrote: > Hi All, > > Thanks for pointing out the links. It's weird: using curl on those > urls retrieves a "redirect" page, whereas LWP::Simple::mirror gets the > tarball. Anyway, the script works again :-) > > Hi, Scott. For curl, try: curl -L .... The -L follows redirects. Sean > > On Wed, May 26, 2010 at 4:09 PM, Dave Messina > wrote: > > Hi Scott, > > > > I think the URLs you want are these > > > > http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots > > > > snapshots of the current repository. > > > > > > If you want instead to grab a static version of a repository, say a > tagged revision, you can do like this: > > > > http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 > > > > (where "for_gmod_0_003" is the tag). > > > > > > By the way, I am getting these URLs on GitHub by: > > > > 1. going to the GitHub page for the relevant repository > > > > e.g. http://github.com/bioperl/bioperl-live > > > > 2. navigating to the tag or branch of interest using the "Switch > Branches" or "Switch Tags" pulldowns > > > > 3. clicking on the Download Source button > > > > 4. right-clicking on the big TAR icon to copy the link underlying it > > > > > > > > Dave > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
> scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From remi.planel at free.fr Fri May 28 06:29:50 2010
From: remi.planel at free.fr (Remi)
Date: Fri, 28 May 2010 12:29:50 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
Message-ID: <4BFF9B1E.10500@free.fr>

Hi all,

I would like to get a clone of a Bio::Search::Result::GenericResult object and I'm not sure what I'm doing ... I've tried something like:

    my $searchIn = Bio::SearchIO->new(
        -file   => 'result.bls',
        -format => 'blastxml',
    );
    my $result = $searchIn->next_result;
    my $result_copy = $result->new($result);

It seems to work, but I'm not sure I understand how. So I would like to know whether I'll get into trouble using this code and whether all the fields are copied one by one.

Thank you,
Rémi

From David.Messina at sbc.su.se Fri May 28 07:32:40 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 28 May 2010 13:32:40 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4BFF9B1E.10500@free.fr>
References: <4BFF9B1E.10500@free.fr>
Message-ID:

Hi Rémi,

As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter).

So I don't think the code you showed will work.

However, there are modules such as Clone::More and Clone::Fast that can do it.

http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm
http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm

Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal.

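As an aside to the modules named above, a deep copy can also be taken with `dclone` from Storable, which ships with Perl. The following is a minimal sketch under loud assumptions: the result here is a plain nested hash with made-up values, not a Bio::Search::Result::GenericResult, and `apply_filter`, `undo` and `@history` are hypothetical helper names. Whether a real BioPerl result object survives `dclone` is untested here; Storable refuses code references by default and cannot copy open filehandles.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Stand-in for a parsed search result: plain nested data, made-up values.
my $result = {
    query => 'query_1',
    hits  => [
        { name => 'hit_a', evalue => 1e-50 },
        { name => 'hit_b', evalue => 1e-3  },
        { name => 'hit_c', evalue => 0.5   },
    ],
};

# Snapshots taken before each filtering step, newest last.
my @history;

# Save a deep copy, then filter the hits in place.
sub apply_filter {
    my ($current, $keep) = @_;
    push @history, dclone($current);
    $current->{hits} = [ grep { $keep->($_) } @{ $current->{hits} } ];
    return $current;
}

# Return the most recent snapshot, or undef if there is none.
sub undo {
    return @history ? pop @history : undef;
}

apply_filter($result, sub { $_[0]{evalue} <= 1e-2 });
print scalar(@{ $result->{hits} }), " hits after filtering\n";   # prints 2

$result = undo();
print scalar(@{ $result->{hits} }), " hits after undo\n";        # prints 3
```

Because each snapshot is an independent copy, later filtering steps cannot alter an earlier state, which is what makes the undo safe.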
Dave From remi.planel at free.fr Fri May 28 08:17:01 2010 From: remi.planel at free.fr (Remi) Date: Fri, 28 May 2010 14:17:01 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: References: <4BFF9B1E.10500@free.fr> Message-ID: <4BFFB43D.50409@free.fr> You're right, it's not working there is some missing fields ... Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : -Display Result object as HTML -Ask for filter criteria -Filter Result object -Display filtered Result object as HTML. ... etc And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. I'll have a look to the modules you've mentioned, thanks. Dave Messina wrote: > Hi R?mi, > > As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). > > So I don't think the code you showed will work. > > However, there are modules such as Clone::More and Clone::Fast that can do it. > > http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm > http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm > > > Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. > > Dave > > > From cjfields at illinois.edu Fri May 28 09:25:54 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 28 May 2010 08:25:54 -0500 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <4BFFB43D.50409@free.fr> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> Message-ID: <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> Remi, Using the constructor that way is not supported. But it's completely unnecessary. Are you using Bio::SearchIO::Writer::HTMLWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. 
You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need.

Something like the following should work (of course completely untested :)

    my $result = $in->next_result;

    # filter on HSP
    write_html('result1.html', $result, { 'HSP' => \&hsp_filter });

    # rewind the result to go back to the beginning
    $result->rewind;

    # open a new filehandle here for second report output
    # filter on hit and HSP
    write_html('result2.html', $result, { 'HIT' => \&hit_filter,
                                          'HSP' => \&hsp_filter });

    # rewind the result to go back to the beginning
    $result->rewind;

    # and so on....

    sub write_html {
        my ($file, $result, $filters) = @_;
        # note that $filters is a hash ref
        my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new
            (-filters => $filters );

        # the leading '>' opens the file for writing
        my $out = Bio::SearchIO->new(-writer => $writer, -file => ">$file");
        $out->write_result($result);
    }

    sub hsp_filter {
        my $hsp = shift;
        return 1 if $hsp->length('total') > 100;
    }

    sub hit_filter {
        my $hit = shift;
        return 1 if $hit->significance < 1e-5;
    }

chris

On May 28, 2010, at 7:17 AM, Remi wrote:

> You're right, it's not working there is some missing fields ...
>
> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like :
>
> -Display Result object as HTML
> -Ask for filter criteria
> -Filter Result object
> -Display filtered Result object as HTML.
> ... etc
>
> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it.
>
> I'll have a look to the modules you've mentioned, thanks.
>
>
> Dave Messina wrote:
>> Hi Rémi,
>>
>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter).
>>
>> So I don't think the code you showed will work.
>>
>> However, there are modules such as Clone::More and Clone::Fast that can do it.
>> >> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >> >> >> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >> >> Dave >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri May 28 10:34:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 28 May 2010 09:34:13 -0500 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <4BFFD3D5.2000409@free.fr> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> Message-ID: Let us know how it goes, and if you run into any bugs. chris On May 28, 2010, at 9:31 AM, Remi wrote: > Thank you very much !!!! > I'm gonna try it right away > > Chris Fields wrote: >> Remi, >> >> Using the constructor that way is not supported. But it's completely unnecessary. >> >> Are you using Bio::SearchIO::Writer::HTMLWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. 
>> >> Something like the following should work (of course completely untested :) >> >> my $result = $in->next_result; >> >> # filter on HSP >> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >> >> # rewind the result to go back to the beginning >> $result->rewind; >> >> # open a new filehandle here for second report output >> # filter on hit and HSP >> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >> 'HSP' => \&hsp_filter }); >> >> # rewind the result to go back to the beginning >> $result->rewind; >> >> # and so on.... >> >> sub write_html { >> my ($file, $result, $filters) = @_; >> # note that $filter is a hash ref above >> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >> (-filters => $filters ); >> >> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); >> $out->write_result($result); >> } >> >> sub hsp_filter { >> my $hsp = shift; >> return 1 if $hsp->length('total') > 100; >> } >> >> sub hit_filter { >> my $hit = shift; >> return 1 if $hit->significance < 1e-5; >> } >> >> chris >> >> >> On May 28, 2010, at 7:17 AM, Remi wrote: >> >> >> >>> You're right, it's not working there is some missing fields ... >>> >>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>> >>> -Display Result object as HTML >>> -Ask for filter criteria >>> -Filter Result object >>> -Display filtered Result object as HTML. >>> ... etc >>> >>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. >>> >>> I'll have a look to the modules you've mentioned, thanks. >>> >>> >>> >>> >>> Dave Messina wrote: >>> >>> >>>> Hi R?mi, >>>> >>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). >>>> >>>> So I don't think the code you showed will work. >>>> >>>> However, there are modules such as Clone::More and Clone::Fast that can do it. 
>>>> >>>> >>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >>>> >>>> >>>> >>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >>>> >>>> Dave >>>> >>>> >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> >> >> > From remi.planel at free.fr Fri May 28 10:31:49 2010 From: remi.planel at free.fr (Remi) Date: Fri, 28 May 2010 16:31:49 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> Message-ID: <4BFFD3D5.2000409@free.fr> An HTML attachment was scrubbed... URL: From fij at elte.hu Sun May 30 05:32:58 2010 From: fij at elte.hu (Farkas, Illes) Date: Sun, 30 May 2010 11:32:58 +0200 Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) Message-ID: Hi, I've ran across a relatively simple, but specific task. I would like to put interaction (, , ) data from many sources (databases) into a single list containing the following in each record: , , , . (I am aware that there will be some loss during the ID conversion.) I have found so far the following possibilities: (1) BioMart perl API. Seems to be much smarter (and more complex) than what I would need. Also, I would need to parse input and output just as much as with newly written subroutines/modules. (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and KEGG IDs, but I could not find them on the "From" list. (3) Synergizer. I cannot run it in remote batch mode. From what I would need I could not find BioGrid, ENSP and KEGG identifiers. 
(4) Writing it all with ID mapping files downloaded from each database and contributing it to BioPerl. How can I contribute? How do I find the best place within BioPerl to add a particular module? Whom do I need to ask for approval? Thanks in advance for any comments. Illes -- http://hal.elte.hu/fij From maj at fortinbras.us Sun May 30 09:42:50 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 30 May 2010 09:42:50 -0400 Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) In-Reply-To: References: Message-ID: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife> Illes-- no approval necessary (or, if you like, I approve). What you can do is describe what you want to do as an enhancement request at http://bugzilla.bioperl.org, and then attach your new code to that request. We can review it from there. cheers MAJ ----- Original Message ----- From: "Farkas, Illes" To: Sent: Sunday, May 30, 2010 5:32 AM Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) > Hi, > > I've ran across a relatively simple, but specific task. I would like to put > interaction (, , ) data from many sources > (databases) into a single list containing the following in each record: > , , , > . (I am aware that there will be some loss during the ID > conversion.) > > I have found so far the following possibilities: > > (1) BioMart perl API. Seems to be much smarter (and more complex) than what > I would need. Also, I would need to parse input and output just as much as > with newly written subroutines/modules. > > (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and > KEGG IDs, but I could not find them on the "From" list. > > (3) Synergizer. I cannot run it in remote batch mode. From what I would need > I could not find BioGrid, ENSP and KEGG identifiers. > > (4) Writing it all with ID mapping files downloaded from each database and > contributing it to BioPerl. How can I contribute? How do I find the best > place within BioPerl to add a particular module? 
Whom do I need to ask for
> approval?
>
> Thanks in advance for any comments.
> Illes
>
> --
> http://hal.elte.hu/fij
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Sun May 30 11:00:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 30 May 2010 10:00:09 -0500
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
References: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
Message-ID:

Another couple of options:

1) for code changes, fork the code on GitHub, add your code there, then make a pull request

2) for adding code, create a repo on github with the code,

chris

On May 30, 2010, at 8:42 AM, Mark A. Jensen wrote:

> Illes-- no approval necessary (or, if you like, I approve). What you can do is describe what you want to do as an enhancement request at http://bugzilla.bioperl.org, and then attach your new code to that request. We can review it from there.
> cheers MAJ
> ----- Original Message ----- From: "Farkas, Illes"
> To:
> Sent: Sunday, May 30, 2010 5:32 AM
> Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
>
>
>> Hi,
>>
>> I've ran across a relatively simple, but specific task. I would like to put
>> interaction (, , ) data from many sources
>> (databases) into a single list containing the following in each record:
>> , , ,
>> . (I am aware that there will be some loss during the ID
>> conversion.)
>>
>> I have found so far the following possibilities:
>>
>> (1) BioMart perl API. Seems to be much smarter (and more complex) than what
>> I would need. Also, I would need to parse input and output just as much as
>> with newly written subroutines/modules.
>>
>> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and
>> KEGG IDs, but I could not find them on the "From" list.
>>
>> (3) Synergizer.
I cannot run it in remote batch mode. From what I would need >> I could not find BioGrid, ENSP and KEGG identifiers. >> >> (4) Writing it all with ID mapping files downloaded from each database and >> contributing it to BioPerl. How can I contribute? How do I find the best >> place within BioPerl to add a particular module? Whom do I need to ask for >> approval? >> >> Thanks in advance for any comments. >> Illes >> >> -- >> http://hal.elte.hu/fij >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun May 30 11:05:37 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 30 May 2010 10:05:37 -0500 Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) In-Reply-To: References: Message-ID: <84D300DB-C22D-494E-ABAF-EBC10FEE0E7C@illinois.edu> On May 30, 2010, at 4:32 AM, Farkas, Illes wrote: > Hi, > > I've ran across a relatively simple, but specific task. I would like to put > interaction (, , ) data from many sources > (databases) into a single list containing the following in each record: > , , , > . (I am aware that there will be some loss during the ID > conversion.) > > I have found so far the following possibilities: > > (1) BioMart perl API. Seems to be much smarter (and more complex) than what > I would need. Also, I would need to parse input and output just as much as > with newly written subroutines/modules. Or, wondering whether you could create a set of BioPerl<->BioMart bridge modules. > (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and > KEGG IDs, but I could not find them on the "From" list. I added an id_mapper to Bio::DB::SwissProt that calls to this. 
It hasn't been broadly tested yet, but you are welcome to add more to it. Might also be useful to have a DB wrapper around a locally-built ID mapping database, which would give you more flexibility than the web interface. > (3) Synergizer. I cannot run it in remote batch mode. From what I would need > I could not find BioGrid, ENSP and KEGG identifiers. > > (4) Writing it all with ID mapping files downloaded from each database and > contributing it to BioPerl. How can I contribute? How do I find the best > place within BioPerl to add a particular module? Whom do I need to ask for > approval? > > Thanks in advance for any comments. > Illes A generalized ID mapping interface would be nice. You could also incorporate some of NCBI's eutils stuff along these lines, or their gi2acc mappings. chris From maj at fortinbras.us Sun May 30 19:59:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 30 May 2010 19:59:38 -0400 Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) In-Reply-To: References: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife> Message-ID: <6553B9DFF86F472B8B2D0D8A72171056@NewLife> Yes, that's definitely the Way to Do It post-git- MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Farkas, Illes" ; Sent: Sunday, May 30, 2010 11:00 AM Subject: Re: [Bioperl-l] ID mapping (or: contributing to BioPerl) Another couple of options: 1) for code changes, fork the code on GitHub, add your code there, then make a push request 2) for adding code, create a repo on github with the code, chris On May 30, 2010, at 8:42 AM, Mark A. Jensen wrote: > Illes-- no approval necessary (or, if you like, I approve). What you can do is > describe what you want to do as an enhancement request at > http://bugzilla.bioperl.org, and then attach your new code to that request. We > can review it from there. 
> cheers MAJ > ----- Original Message ----- From: "Farkas, Illes" > To: > Sent: Sunday, May 30, 2010 5:32 AM > Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) > > >> Hi, >> >> I've ran across a relatively simple, but specific task. I would like to put >> interaction (, , ) data from many sources >> (databases) into a single list containing the following in each record: >> , , , >> . (I am aware that there will be some loss during the ID >> conversion.) >> >> I have found so far the following possibilities: >> >> (1) BioMart perl API. Seems to be much smarter (and more complex) than what >> I would need. Also, I would need to parse input and output just as much as >> with newly written subroutines/modules. >> >> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and >> KEGG IDs, but I could not find them on the "From" list. >> >> (3) Synergizer. I cannot run it in remote batch mode. From what I would need >> I could not find BioGrid, ENSP and KEGG identifiers. >> >> (4) Writing it all with ID mapping files downloaded from each database and >> contributing it to BioPerl. How can I contribute? How do I find the best >> place within BioPerl to add a particular module? Whom do I need to ask for >> approval? >> >> Thanks in advance for any comments. 
>> Illes
>>
>> --
>> http://hal.elte.hu/fij
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon May 31 09:23:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 31 May 2010 08:23:13 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4C037F22.3090209@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr>
Message-ID: <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu>

That sounds like a bug. Does filtering at the hit level work around this?

    sub hit_filter {
        my $hit = shift;
        # filter hsps here
        my @passing_hsps = grep { hsp_filter($_) } $hit->hsps;
        @passing_hsps;
    }

    sub hsp_filter {
        # original filter
    }

chris

On May 31, 2010, at 4:19 AM, Remi wrote:

> Hi,
>
> Everything is working well but there is still one point that giving me some trouble.
> When I filter the hsps and all the hsps of a given hit are removed, the description line of the hit is still present in the HTML file.
> Is there a way to get rid of this description line ?
> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLWriter and overriding the "to_string" method ?
>
> Thanks,
>
> Rémi
>
>
> Chris Fields wrote:
>> Let us know how it goes, and if you run into any bugs.
>>
>> chris
>>
>> On May 28, 2010, at 9:31 AM, Remi wrote:
>>
>>
>>
>>> Thank you very much !!!!
>>> I'm gonna try it right away
>>>
>>> Chris Fields wrote:
>>>
>>>
>>>> Remi,
>>>>
>>>> Using the constructor that way is not supported. But it's completely unnecessary.
>>>>
>>>> Are you using Bio::SearchIO::Writer::HTMLWriter?
It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. >>>> >>>> Something like the following should work (of course completely untested :) >>>> >>>> my $result = $in->next_result; >>>> >>>> # filter on HSP >>>> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >>>> >>>> # rewind the result to go back to the beginning >>>> $result->rewind; >>>> >>>> # open a new filehandle here for second report output >>>> # filter on hit and HSP >>>> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >>>> 'HSP' => \&hsp_filter }); >>>> >>>> # rewind the result to go back to the beginning >>>> $result->rewind; >>>> >>>> # and so on.... >>>> >>>> sub write_html { >>>> my ($file, $result, $filters) = @_; >>>> # note that $filter is a hash ref above >>>> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >>>> (-filters => $filters ); >>>> >>>> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); >>>> $out->write_result($result); >>>> } >>>> >>>> sub hsp_filter { >>>> my $hsp = shift; >>>> return 1 if $hsp->length('total') > 100; >>>> } >>>> >>>> sub hit_filter { >>>> my $hit = shift; >>>> return 1 if $hit->significance < 1e-5; >>>> } >>>> >>>> chris >>>> >>>> >>>> On May 28, 2010, at 7:17 AM, Remi wrote: >>>> >>>> >>>> >>>> >>>> >>>>> You're right, it's not working there is some missing fields ... >>>>> >>>>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>>>> >>>>> -Display Result object as HTML >>>>> -Ask for filter criteria >>>>> -Filter Result object >>>>> -Display filtered Result object as HTML. >>>>> ... etc >>>>> >>>>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. 
>>>>> >>>>> I'll have a look to the modules you've mentioned, thanks. >>>>> >>>>> >>>>> >>>>> >>>>> Dave Messina wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> Hi R?mi, >>>>>> >>>>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). >>>>>> >>>>>> So I don't think the code you showed will work. >>>>>> >>>>>> However, there are modules such as Clone::More and Clone::Fast that can do it. >>>>>> >>>>>> >>>>>> >>>>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >>>>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >>>>>> >>>>>> Dave >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> >>>>> >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> >> >> >> > From remi.planel at free.fr Mon May 31 09:47:40 2010 From: remi.planel at free.fr (Remi) Date: Mon, 31 May 2010 15:47:40 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr> <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu> Message-ID: <4C03BDFC.5050109@free.fr> Yes, at the hit level everything works fine. Actually, at the hsp level, the alignment part is not written to the HTML file but the description before the alignment and the description of the hit at the beginning of the file are written. I had a quick look to the code and I'm not sure this is a bug. Chris Fields wrote: > That sounds like a bug. 
Does filtering at the hit level work around this? > > sub hit_filter { > my $hit = shift; > # filter hsps here > my @passing_hsps = grep { hsp_filter($_) } $hit->hsps; > @passing_hsps; > } > > sub hsp_filter { > # original filter > } > > chris > > On May 31, 2010, at 4:19 AM, Remi wrote: > > >> Hi, >> >> Everything is working well but there is still one point that giving me some trouble. >> When I filter the hsps and all the hsps of a given hit are removed, the description line of the hit is still present in the HTML file. >> Is there a way to get rid of this description line ? >> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLWriter and overriding the "to_string" method ? >> >> Thanks, >> >> R?mi >> >> >> Chris Fields wrote: >> >>> Let us know how it goes, and if you run into any bugs. >>> >>> chris >>> >>> On May 28, 2010, at 9:31 AM, Remi wrote: >>> >>> >>> >>> >>>> Thank you very much !!!! >>>> I'm gonna try it right away >>>> >>>> Chris Fields wrote: >>>> >>>> >>>> >>>>> Remi, >>>>> >>>>> Using the constructor that way is not supported. But it's completely unnecessary. >>>>> >>>>> Are you using Bio::SearchIO::Writer::HTMLWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. 
>>>>> >>>>> Something like the following should work (of course completely untested :) >>>>> >>>>> my $result = $in->next_result; >>>>> >>>>> # filter on HSP >>>>> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >>>>> >>>>> # rewind the result to go back to the beginning >>>>> $result->rewind; >>>>> >>>>> # open a new filehandle here for second report output >>>>> # filter on hit and HSP >>>>> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >>>>> 'HSP' => \&hsp_filter }); >>>>> >>>>> # rewind the result to go back to the beginning >>>>> $result->rewind; >>>>> >>>>> # and so on.... >>>>> >>>>> sub write_html { >>>>> my ($file, $result, $filters) = @_; >>>>> # note that $filter is a hash ref above >>>>> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >>>>> (-filters => $filters ); >>>>> >>>>> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); >>>>> $out->write_result($result); >>>>> } >>>>> >>>>> sub hsp_filter { >>>>> my $hsp = shift; >>>>> return 1 if $hsp->length('total') > 100; >>>>> } >>>>> >>>>> sub hit_filter { >>>>> my $hit = shift; >>>>> return 1 if $hit->significance < 1e-5; >>>>> } >>>>> >>>>> chris >>>>> >>>>> >>>>> On May 28, 2010, at 7:17 AM, Remi wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> You're right, it's not working there is some missing fields ... >>>>>> >>>>>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>>>>> >>>>>> -Display Result object as HTML >>>>>> -Ask for filter criteria >>>>>> -Filter Result object >>>>>> -Display filtered Result object as HTML. >>>>>> ... etc >>>>>> >>>>>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. >>>>>> >>>>>> I'll have a look to the modules you've mentioned, thanks. 
>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Dave Messina wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Hi R?mi, >>>>>>> >>>>>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). >>>>>>> >>>>>>> So I don't think the code you showed will work. >>>>>>> >>>>>>> However, there are modules such as Clone::More and Clone::Fast that can do it. >>>>>>> >>>>>>> >>>>>>> >>>>>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >>>>>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >>>>>>> >>>>>>> Dave >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> >>>>>> >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>> >>> >>> > > From cjfields at illinois.edu Mon May 31 09:54:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 31 May 2010 08:54:22 -0500 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <4C03BDFC.5050109@free.fr> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr> <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu> <4C03BDFC.5050109@free.fr> Message-ID: <454FE98D-4EE5-4DFB-A877-6DE7822C4DA4@illinois.edu> My concern is to ensure we aren't filtering twice as much (one at the hit level, one pass at the HSP level). It should be one pass. chris On May 31, 2010, at 8:47 AM, Remi wrote: > Yes, at the hit level everything works fine. 
> Actually, at the hsp level, the alignment part is not written to the HTML file but the description before the alignment and the description of the hit at the beginning of the file are written. > > I had a quick look to the code and I'm not sure this is a bug. > > Chris Fields wrote: >> That sounds like a bug. Does filtering at the hit level work around this? >> >> sub hit_filter { >> my $hit = shift; >> # filter hsps here >> my @passing_hsps = grep { hsp_filter($_) } $hit->hsps; >> @passing_hsps; >> } >> >> sub hsp_filter { >> # original filter >> } >> >> chris >> >> On May 31, 2010, at 4:19 AM, Remi wrote: >> >> >>> Hi, >>> >>> Everything is working well but there is still one point that giving me some trouble. >>> When I filter the hsps and all the hsps of a given hit are removed, the description line of the hit is still present in the HTML file. >>> Is there a way to get rid of this description line ? >>> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLWriter and overriding the "to_string" method ? >>> >>> Thanks, >>> >>> R?mi >>> >>> >>> Chris Fields wrote: >>> >>>> Let us know how it goes, and if you run into any bugs. >>>> >>>> chris >>>> >>>> On May 28, 2010, at 9:31 AM, Remi wrote: >>>> >>>> >>>> >>>>> Thank you very much !!!! >>>>> I'm gonna try it right away >>>>> >>>>> Chris Fields wrote: >>>>> >>>>> >>>>>> Remi, >>>>>> >>>>>> Using the constructor that way is not supported. But it's completely unnecessary. >>>>>> Are you using Bio::SearchIO::Writer::HTMLWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. 
>>>>>> Something like the following should work (of course completely untested :) >>>>>> >>>>>> my $result = $in->next_result; >>>>>> >>>>>> # filter on HSP >>>>>> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >>>>>> >>>>>> # rewind the result to go back to the beginning >>>>>> $result->rewind; >>>>>> >>>>>> # open a new filehandle here for second report output >>>>>> # filter on hit and HSP >>>>>> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >>>>>> 'HSP' => \&hsp_filter }); >>>>>> >>>>>> # rewind the result to go back to the beginning >>>>>> $result->rewind; >>>>>> >>>>>> # and so on.... >>>>>> >>>>>> sub write_html { >>>>>> my ($file, $result, $filters) = @_; >>>>>> # note that $filter is a hash ref above >>>>>> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >>>>>> (-filters => $filters ); >>>>>> >>>>>> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); $out->write_result($result); >>>>>> } >>>>>> >>>>>> sub hsp_filter { my $hsp = shift; >>>>>> return 1 if $hsp->length('total') > 100; >>>>>> } >>>>>> >>>>>> sub hit_filter { my $hit = shift; >>>>>> return 1 if $hit->significance < 1e-5; >>>>>> } >>>>>> >>>>>> chris >>>>>> >>>>>> >>>>>> On May 28, 2010, at 7:17 AM, Remi wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> You're right, it's not working there is some missing fields ... >>>>>>> >>>>>>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>>>>>> >>>>>>> -Display Result object as HTML >>>>>>> -Ask for filter criteria >>>>>>> -Filter Result object >>>>>>> -Display filtered Result object as HTML. >>>>>>> ... etc >>>>>>> >>>>>>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. >>>>>>> >>>>>>> I'll have a look to the modules you've mentioned, thanks. 
>>>>>>>
>>>>>>> Dave Messina wrote:
>>>>>>>
>>>>>>>> Hi Rémi,
>>>>>>>>
>>>>>>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter).
>>>>>>>>
>>>>>>>> So I don't think the code you showed will work.
>>>>>>>>
>>>>>>>> However, there are modules such as Clone::More and Clone::Fast that can do it.
>>>>>>>>
>>>>>>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm
>>>>>>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm
>>>>>>>>
>>>>>>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal.
>>>>>>>>
>>>>>>>> Dave
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From remi.planel at free.fr Mon May 31 05:19:30 2010
From: remi.planel at free.fr (Remi)
Date: Mon, 31 May 2010 11:19:30 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: 
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr>
Message-ID: <4C037F22.3090209@free.fr>

An HTML attachment was scrubbed...
URL: 

From aradwen at gmail.com Sat May 1 06:45:18 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Sat, 1 May 2010 12:45:18 +0200
Subject: [Bioperl-l] Pfam_Scan
Message-ID: 

Hello everyone,

I would like to know if there is a way to cluster the output of Pfam_Scan results.
I mean, can we parse it and then output clusters containing sequences sharing the same domains or Pfams? This is a bit special since we could have multidomain proteins inside; which rule do we have to follow in this case?

Rad

-- 
R. ANIBA

From David.Messina at sbc.su.se Sat May 1 18:28:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 2 May 2010 00:28:48 +0200
Subject: [Bioperl-l] Pfam_Scan
In-Reply-To: 
References: 
Message-ID: <6CA3B4F2-CF3E-45DD-BE51-9F7218C5CEE9@sbc.su.se>

Hi Rad,

As far as I can tell the Pfam_Scan output is simply tab-delimited text (see details below), so you should be able to group sequences which share domains by sorting on the sixth column.

I suspect that sequences with multiple domain hits will have multiple lines in the output, one per hit, so if you want to identify sequences which share the same _set_ of domains you will have to do the bookkeeping yourself.

That being said, Pfam_Scan is not part of BioPerl (it's distributed by the Pfam team), so you may want to contact them directly for help (pfam-help at sanger.ac.uk).

Dave

[from the Pfam_Scan documentation]

The output format is:

Example output (with -pfamB, -as options):

Q5NEL3.1    2  224    2  227  PB013481  Pfam-B_13481   Pfam-B  1  184  226  358.5  1.4e-107  NA  NA
O65039.1   38   93   38   93  PF08246   Inhibitor_I29  Domain  1   58   58   45.9  2.8e-12    1  No_clan
O65039.1  126  342  126  342  PF00112   Peptidase_C1   Domain  1  216  216  296.0  1.1e-88    1  CL0125  predicted_active_site[150,285,307]

From David.Messina at sbc.su.se Sun May 2 04:54:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 2 May 2010 10:54:54 +0200
Subject: [Bioperl-l] RFC: SNP::Inherit
In-Reply-To: 
References: 
Message-ID: 

Hi Christopher,

Looks good! The only recommendation I would make is to change the namespace to Bio::SNP::Inherit.
The convention on CPAN is to minimize the number of new toplevel namespaces (which SNP would be), and although many of the Bio::* modules are part of BioPerl, that namespace is not restricted to BioPerl and there are plenty of non-BioPerl packages there. Dave On Apr 29, 2010, at 10:26 PM, Christopher Bottoms wrote: > Dear Bioperl community, > > I was thinking of uploading a module to CPAN that converts SNP genotype data > to parental allele designations. Below is the perldoc. This is not a > "BioPerl" module per se, so I'm not sure what namespace to put it under. > > I would be glad to send anyone the source if they are interested in checking > it out more. I just did not want to send everyone an unsolicited attachment. > > Thank you for your time, > Christopher Bottoms (molecules) > From David.Messina at sbc.su.se Sun May 2 05:59:07 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 2 May 2010 11:59:07 +0200 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <4BDA986D.3020302@bii.a-star.edu.sg> References: <4BDA986D.3020302@bii.a-star.edu.sg> Message-ID: <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> Hi Dimitar, The syntax you want is: # Build a Genewise alignment factory my $factory = Bio::Tools::Run::Genewise->new(); # turn on the quiet switch $factory->QUIET(1); # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects my @genes = $factory->run($protein_seq, $genomic_seq); This turns out be incorrectly documented on the man page, at least in part: > Available Params: > > NB: These should be passed without the '-' or they will be ignored, > except switches such as 'hmmer' (which have no corresponding value) > which should be set on the factory object using the AUTOLOADed methods > of the same name. 
> > Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null] > Alg [-kbyte,-alg] > HMM [-hmmer] > Output [-gff,-gener,-alb,-pal,-block,-divide] > Standard [-help,-version,-silent,-quiet,-errorlog] That is, these don't work as expected: $factory->quiet; $factory->quiet(1); due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set. So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker. Dave From maj at fortinbras.us Sun May 2 15:28:22 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 2 May 2010 15:28:22 -0400 Subject: [Bioperl-l] new core developers Rob Buels and Dave Messina Message-ID: Hi Folks, On behalf of the core team, I am delighted to announce two new members: Rob Buels and Dave Messina. They are so, er, honored on the basis of their selfless work on the list, on IRC, in development of new modules and their active and sustained participation in BioPerl maintenance, design and promotion. Welcome Rob and Dave! MAJ and the BioPerl core developers From skastu01 at students.poly.edu Sun May 2 22:41:04 2010 From: skastu01 at students.poly.edu (Lakshmi Kastury) Date: Mon, 3 May 2010 02:41:04 +0000 Subject: [Bioperl-l] Using BIO::SEARCHIO Message-ID: I am attempting to use the BIO::SEARCHIO system to parse a Blast output file. A new instance is he file is read through the following: my $input = new BIO::SearchIO (-file =>'blast_report_0.txt', -format =>'blast'); When I run my program, I receive the following message: "Can't locate object method "new" via package "BIO::SearchIO" (perhaps you forgot to load "BIO::SearchIO"? Is this an optional module which needs to be installed separately? Thanks, Lakshmi Kastury From maj at fortinbras.us Sun May 2 22:57:28 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sun, 2 May 2010 22:57:28 -0400 Subject: [Bioperl-l] Using BIO::SEARCHIO In-Reply-To: References: Message-ID: you need to say "Bio::SearchIO", and not "BIO::SearchIO" MAJ ----- Original Message ----- From: "Lakshmi Kastury" To: Sent: Sunday, May 02, 2010 10:41 PM Subject: [Bioperl-l] Using BIO::SEARCHIO > > > > > > > > > > > > I am attempting to use the BIO::SEARCHIO system to parse a Blast output file. > > A new instance is he file is read through the following: > my $input = new BIO::SearchIO (-file =>'blast_report_0.txt', -format > =>'blast'); > > When I run my program, I receive the following message: > "Can't locate object method "new" via package "BIO::SearchIO" (perhaps you > forgot to load "BIO::SearchIO"? > > Is this an optional module which needs to be installed separately? > > > > Thanks, > Lakshmi Kastury > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon May 3 00:22:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 2 May 2010 23:22:46 -0500 Subject: [Bioperl-l] Full bioperl-live github demo Message-ID: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> All, I have pushed a demo of the bioperl-live (all branches and tags) to github here: http://github.com/bioperl/bioperl-test This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. 
In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. chris From heikki.lehvaslaiho at gmail.com Mon May 3 07:45:10 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 3 May 2010 14:45:10 +0300 Subject: [Bioperl-l] BLAST parsing broken Message-ID: Chris, latest additions to Bio::SearchIO::blast.pm broke the parsing of normal blast output. $result->query_name returns now undef. (Using the anonymous git now). This change still works: commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 Author: cjfields Date: Sun Dec 20 04:39:58 2009 +0000 Robson's patch for buggy blastpgp output But this does not: commit 9a89c3434597104dd50553e3562983d78d14a544 Author: cjfields Date: Thu Apr 15 04:21:17 2010 +0000 [bug 3031] patches for catching algorithm ref, courtesy Razi Khaja. 
That makes it easy to find the diffs:

$ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
index 378023a..6f7eeeb 100644
--- a/Bio/SearchIO/blast.pm
+++ b/Bio/SearchIO/blast.pm
@@ -209,6 +209,7 @@ BEGIN {
 
         'BlastOutput_program'   => 'RESULT-algorithm_name',
         'BlastOutput_version'   => 'RESULT-algorithm_version',
+        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
         'BlastOutput_query-def' => 'RESULT-query_name',
         'BlastOutput_query-len' => 'RESULT-query_length',
         'BlastOutput_query-acc' => 'RESULT-query_accession',
@@ -504,6 +505,26 @@ sub next_result {
                 }
             );
         }
+        # parse the BLAST algorithm reference
+        elsif(/^Reference:\s+(.*)$/) {
+            # want to preserve newlines for the BLAST algorithm reference
+            my $algorithm_reference = "$1\n";
+            $_ = $self->_readline;
+            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
+            # algorithm_reference, append it to what we parsed so far
+            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
+                $algorithm_reference .= "$_";
+                $_ = $self->_readline;
+            }
+            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
+            $self->_pushback($_);
+            $self->element(
+                {
+                    'Name' => 'BlastOutput_algorithm-reference',
+                    'Data' => $algorithm_reference
+                }
+            );
+        }
         # added Windows workaround for bug 1985
         elsif (/^(Searching|Results from round)/) {
             next unless $1 =~ /Results from round/;

I am not sure why reference parsing messes things up. Maybe it eats too many lines from the result file.
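
[Editor's sketch] One way to check the "eats too many lines" hunch outside BioPerl is to run the same delimiting logic over a few sample lines. The code below is not the blast.pm code: split_reference is a hypothetical helper that works on an in-memory list of lines, the sample report lines are made up, and it adds an end-of-input guard that the patch above does not have. It only illustrates how a Reference block is delimited by an empty line, "RID:" or "Database:", and that the lines after it should survive.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: pull the "Reference:" block out of
# a list of report lines and return it together with the untouched lines.
sub split_reference {
    my @lines     = @_;
    my $reference = '';
    my @rest;
    while (defined(my $line = shift @lines)) {
        if ($reference eq '' && $line =~ /^Reference:\s+(.*)$/) {
            # keep the newline, as the patch does
            $reference = "$1\n";
            # consume continuation lines until a terminator or end of input
            while (@lines
                && $lines[0] !~ /^\s*$/
                && $lines[0] !~ /^(?:RID|Database):/) {
                $reference .= shift @lines;
            }
            next;
        }
        push @rest, $line;
    }
    return ($reference, \@rest);
}

# Made-up sample lines standing in for the top of a BLAST text report.
my @report = (
    "BLASTN 2.2.20\n",
    "Reference: Altschul et al. (1997),\n",
    "Nucleic Acids Res. 25:3389-3402.\n",
    "\n",
    "Query= seq1\n",
);

my ($ref, $rest) = split_reference(@report);
print "Reference block:\n$ref";
print "Lines left for the parser: ", scalar(@$rest), "\n";
```

If the Query= line is missing from the remaining lines for a real report, that would point at the terminator conditions rather than at the element mapping.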
Yours, -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia From cjfields at illinois.edu Mon May 3 08:08:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 07:08:01 -0500 Subject: [Bioperl-l] BLAST parsing broken In-Reply-To: References: Message-ID: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Odd, I ran tests on that prior to commit. I'll work on fixing that (in svn of course, until the migration is complete). chris On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > Chris, > > latest additions to Bio::SearchIO::blast.pm broke the parsing of normal > blast output. $result->query_name returns now undef. > > (Using the anonymous git now). This change still works: > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > Author: cjfields > Date: Sun Dec 20 04:39:58 2009 +0000 > > Robson's patch for buggy blastpgp output > > But this does not: > > commit 9a89c3434597104dd50553e3562983d78d14a544 > Author: cjfields > Date: Thu Apr 15 04:21:17 2010 +0000 > > [bug 3031] > > patches for catching algorithm ref, courtesy Razi Khaja. 
> > That makes it easy to find the diffs: > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > index 378023a..6f7eeeb 100644 > --- a/Bio/SearchIO/blast.pm > +++ b/Bio/SearchIO/blast.pm > @@ -209,6 +209,7 @@ BEGIN { > > 'BlastOutput_program' => 'RESULT-algorithm_name', > 'BlastOutput_version' => 'RESULT-algorithm_version', > + 'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference', > 'BlastOutput_query-def' => 'RESULT-query_name', > 'BlastOutput_query-len' => 'RESULT-query_length', > 'BlastOutput_query-acc' => 'RESULT-query_accession', > @@ -504,6 +505,26 @@ sub next_result { > } > ); > } > + # parse the BLAST algorithm reference > + elsif(/^Reference:\s+(.*)$/) { > + # want to preserve newlines for the BLAST algorithm reference > + my $algorithm_reference = "$1\n"; > + $_ = $self->_readline; > + # while the current line, does not match an empty line, a RID:, > or a Database:, we are still looking at the > + # algorithm_reference, append it to what we parsed so far > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > + $algorithm_reference .= "$_"; > + $_ = $self->_readline; > + } > + # if we exited the while loop, we saw an empty line, a RID:, or > a Database:, so push it back > + $self->_pushback($_); > + $self->element( > + { > + 'Name' => 'BlastOutput_algorithm-reference', > + 'Data' => $algorithm_reference > + } > + ); > + } > # added Windows workaround for bug 1985 > elsif (/^(Searching|Results from round)/) { > next unless $1 =~ /Results from round/; > > > I am not sure why reference parsing messes things up. Maybe it eats too many > lines from the result file. 
> > Yours, > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon May 3 08:25:10 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 3 May 2010 08:25:10 -0400 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> Message-ID: Hi Chris, I attempted a clone and got the following. Is this my problem? thanks MAJ $ git clone http://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ Getting alternates list for http://github.com/bioperl/bioperl-test.git Getting pack list for http://github.com/bioperl/bioperl-test.git Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Monday, May 03, 2010 12:22 AM Subject: [Bioperl-l] Full bioperl-live github demo > All, > > I have pushed a demo of the bioperl-live (all branches and tags) to github > here: > > http://github.com/bioperl/bioperl-test > > This is separate from the 'bioperl-live' repo at the 
same github account for > the time being. The conversion was performed using svn2git (the gitorious > C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), > using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and > rerun can be performed very quickly. The actual conversion of the entire > bioperl repo took very little time, actually (less than 3 minutes). I think, > with some additional small work using the svn2git rules pretty much everything > is ready for migration. > > In this run, all subversion tags are converted to git tags (branches remain > git branches as expected). Just in case I'm missing something, I would like > everyone to take a look at this, though. In particular, I would like to make > sure tags and branches are as they are expected. So far I haven't seen > anything that stands out as odd. > > chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon May 3 09:07:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 08:07:46 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> Message-ID: <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): cjfields$ git clone git://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ remote: Counting objects: 86737, done. remote: Compressing objects: 100% (22309/22309), done. remote: Total 86737 (delta 64759), reused 85957 (delta 63979) Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. Resolving deltas: 100% (64759/64759), done. For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. 
Do you have a github acct set up? chris On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > Hi Chris, > I attempted a clone and got the following. Is this my problem? > thanks MAJ > > $ git clone http://github.com/bioperl/bioperl-test.git > > Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ > Getting alternates list for http://github.com/bioperl/bioperl-test.git > Getting pack list for http://github.com/bioperl/bioperl-test.git > Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c > Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 > Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c > which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f > error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile > fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed > > > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, May 03, 2010 12:22 AM > Subject: [Bioperl-l] Full bioperl-live github demo > > >> All, >> >> I have pushed a demo of the bioperl-live (all branches and tags) to github here: >> >> http://github.com/bioperl/bioperl-test >> >> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >> >> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). 
Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 3 09:19:17 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 08:19:17 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <8796492301724F2CA132F97AE57C2700@NewLife> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> Message-ID: Added you in. SSH access should work with any ssh keys you have set in github. We can play around with this for the time being (try post commit hooks, etc), but obviously can't make any serious commits to it until we are ready for complete migration; everything will still need to go to dev svn until then. Also noticed that we are topping the account out at the moment, but removing the old read-only repos should help. May need to think about that in the long-term. chris On May 3, 2010, at 8:13 AM, Mark A. Jensen wrote: > That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with > majensen > cheers Chris- MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. 
Jensen" > Cc: "BioPerl List" > Sent: Monday, May 03, 2010 9:07 AM > Subject: Re: [Bioperl-l] Full bioperl-live github demo > > > This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): > > cjfields$ git clone git://github.com/bioperl/bioperl-test.git > Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ > remote: Counting objects: 86737, done. > remote: Compressing objects: 100% (22309/22309), done. > remote: Total 86737 (delta 64759), reused 85957 (delta 63979) > Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. > Resolving deltas: 100% (64759/64759), done. > > For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? > > chris > > On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > >> Hi Chris, >> I attempted a clone and got the following. Is this my problem? >> thanks MAJ >> >> $ git clone http://github.com/bioperl/bioperl-test.git >> >> Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ >> Getting alternates list for http://github.com/bioperl/bioperl-test.git >> Getting pack list for http://github.com/bioperl/bioperl-test.git >> Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 >> Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f >> error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile >> fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed >> >> >> ----- Original Message ----- From: "Chris Fields" >> To: "BioPerl List" >> Sent: Monday, May 03, 2010 12:22 AM >> Subject: [Bioperl-l] Full bioperl-live github demo >> >> >>> All, >>> >>> I have pushed a demo of the bioperl-live (all 
branches and tags) to github here: >>> >>> http://github.com/bioperl/bioperl-test >>> >>> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >>> >>> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. >>> >>> chris >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From maj at fortinbras.us Mon May 3 09:13:27 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 3 May 2010 09:13:27 -0400 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> Message-ID: <8796492301724F2CA132F97AE57C2700@NewLife> That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with majensen cheers Chris- MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. 
Jensen" Cc: "BioPerl List" Sent: Monday, May 03, 2010 9:07 AM Subject: Re: [Bioperl-l] Full bioperl-live github demo This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): cjfields$ git clone git://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ remote: Counting objects: 86737, done. remote: Compressing objects: 100% (22309/22309), done. remote: Total 86737 (delta 64759), reused 85957 (delta 63979) Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. Resolving deltas: 100% (64759/64759), done. For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? chris On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > Hi Chris, > I attempted a clone and got the following. Is this my problem? > thanks MAJ > > $ git clone http://github.com/bioperl/bioperl-test.git > > Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ > Getting alternates list for http://github.com/bioperl/bioperl-test.git > Getting pack list for http://github.com/bioperl/bioperl-test.git > Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c > Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 > Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c > which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f > error: file > /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack > is not a GIT packfile > fatal: packfile > /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack > cannot be accessed > > > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, May 03, 2010 12:22 AM > Subject: [Bioperl-l] Full bioperl-live github demo > > >> All, >> >> I have pushed a demo of the bioperl-live (all branches and tags) to github >> here: >> >> 
http://github.com/bioperl/bioperl-test >> >> This is separate from the 'bioperl-live' repo at the same github account for >> the time being. The conversion was performed using svn2git (the gitorious >> C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), >> using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and >> rerun can be performed very quickly. The actual conversion of the entire >> bioperl repo took very little time, actually (less than 3 minutes). I think, >> with some additional small work using the svn2git rules pretty much >> everything is ready for migration. >> >> In this run, all subversion tags are converted to git tags (branches remain >> git branches as expected). Just in case I'm missing something, I would like >> everyone to take a look at this, though. In particular, I would like to make >> sure tags and branches are as they are expected. So far I haven't seen >> anything that stands out as odd. >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 3 10:04:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 09:04:16 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <8796492301724F2CA132F97AE57C2700@NewLife> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> Message-ID: <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> I like this: http://github.com/bioperl/bioperl-test/graphs/impact Kinda cool yet scary. chris On May 3, 2010, at 8:13 AM, Mark A. 
Jensen wrote: > That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with > majensen > cheers Chris- MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. Jensen" > Cc: "BioPerl List" > Sent: Monday, May 03, 2010 9:07 AM > Subject: Re: [Bioperl-l] Full bioperl-live github demo > > > This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): > > cjfields$ git clone git://github.com/bioperl/bioperl-test.git > Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ > remote: Counting objects: 86737, done. > remote: Compressing objects: 100% (22309/22309), done. > remote: Total 86737 (delta 64759), reused 85957 (delta 63979) > Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. > Resolving deltas: 100% (64759/64759), done. > > For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? > > chris > > On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > >> Hi Chris, >> I attempted a clone and got the following. Is this my problem? 
>> thanks MAJ >> >> $ git clone http://github.com/bioperl/bioperl-test.git >> >> Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ >> Getting alternates list for http://github.com/bioperl/bioperl-test.git >> Getting pack list for http://github.com/bioperl/bioperl-test.git >> Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 >> Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f >> error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile >> fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed >> >> >> ----- Original Message ----- From: "Chris Fields" >> To: "BioPerl List" >> Sent: Monday, May 03, 2010 12:22 AM >> Subject: [Bioperl-l] Full bioperl-live github demo >> >> >>> All, >>> >>> I have pushed a demo of the bioperl-live (all branches and tags) to github here: >>> >>> http://github.com/bioperl/bioperl-test >>> >>> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >>> >>> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. 
So far I haven't seen anything that stands out as odd. >>> >>> chris >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mnrusimh at gmail.com Mon May 3 18:42:41 2010 From: mnrusimh at gmail.com (Ram Podicheti) Date: Mon, 03 May 2010 18:42:41 -0400 Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID Message-ID: <4BDF5161.4030209@gmail.com> Is there a way to obtain the Ensembl Gene ID from an Entrez Gene ID? In other words, I am hoping to get 'ENSMUSG00000029372' as the output when I supply 57349. Many thanks, Ram Podicheti From sdavis2 at mail.nih.gov Mon May 3 19:14:58 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 3 May 2010 19:14:58 -0400 Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID In-Reply-To: <4BDF5161.4030209@gmail.com> References: <4BDF5161.4030209@gmail.com> Message-ID: On Mon, May 3, 2010 at 6:42 PM, Ram Podicheti wrote: > Is there a way to obtain the Ensembl Gene ID from an Entrez Gene ID? In > other words, I am hoping to get 'ENSMUSG00000029372' as the output when > I supply 57349. > Check out the Biomart interface to Ensembl. You can supply any type of ID as a filter and get back gene information, including the ID, that map to that ID. I believe there is a perl interface to biomart, but I haven't used it to comment directly. There is also an R/Bioconductor interface. 
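As a rough illustration of the lookup described above, the sketch below assumes the Entrez-to-Ensembl pairs have already been exported (for example from BioMart) as a two-column, tab-separated table. The column layout, the `load_map` helper and the inline `$tsv` data are assumptions made for this example and are not part of any BioMart API; the single data row is the pair from the question.

```perl
use strict;
use warnings;

# Assumed input: tab-separated text with a header row and two columns,
# Entrez Gene ID then Ensembl Gene ID. The layout is hypothetical.
my $tsv = "entrez_id\tensembl_gene_id\n57349\tENSMUSG00000029372\n";

# Build a hash of Entrez ID => array ref of Ensembl IDs.
sub load_map {
    my ($text) = @_;
    my %ensembl_for;
    open my $fh, '<', \$text or die "Cannot read mapping: $!";
    my $header = <$fh>;    # skip the header row
    while (my $line = <$fh>) {
        chomp $line;
        my ($entrez, $ensembl) = split /\t/, $line;
        next unless defined $ensembl && length $ensembl;
        # One Entrez ID may map to several Ensembl IDs, so keep a list.
        push @{ $ensembl_for{$entrez} }, $ensembl;
    }
    close $fh;
    return \%ensembl_for;
}

my $map = load_map($tsv);
print join(',', @{ $map->{57349} || [] }), "\n";
```

Keeping a list per Entrez ID is a design choice for the sketch: ID mappings between databases are not guaranteed to be one-to-one.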
Sean From mnrusimh at gmail.com Mon May 3 20:42:49 2010 From: mnrusimh at gmail.com (Ram Podicheti) Date: Mon, 03 May 2010 20:42:49 -0400 Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID In-Reply-To: References: <4BDF5161.4030209@gmail.com> Message-ID: <4BDF6D89.2000408@gmail.com> Thanks Sean, that definitely helped. Ram Sean Davis wrote: > > > On Mon, May 3, 2010 at 6:42 PM, Ram Podicheti > wrote: > > Is there a way to obtain the Ensembl Gene ID from an Entrez Gene > ID? In > other words, I am hoping to get 'ENSMUSG00000029372' as the output > when > I supply 57349. > > > Check out the Biomart interface to Ensembl. You can supply any type > of ID as a filter and get back gene information, including the ID, > that map to that ID. I believe there is a perl interface to biomart, > but I haven't used it to comment directly. There is also an > R/Bioconductor interface. > > Sean > From razi.khaja at gmail.com Tue May 4 13:55:00 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Tue, 4 May 2010 13:55:00 -0400 Subject: [Bioperl-l] BLAST parsing broken In-Reply-To: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: That is odd. Heikki, do you have a blast output file that produces this error? Could you attach the file and either send to the list or myself (if the list does not accept attachments). Thanks, Razi On Mon, May 3, 2010 at 8:08 AM, Chris Fields wrote: > Odd, I ran tests on that prior to commit. I'll work on fixing that (in svn > of course, until the migration is complete). > > chris > > On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > > > Chris, > > > > latest additions to Bio::SearchIO::blast.pm broke the parsing of normal > > blast output. $result->query_name returns now undef. > > > > (Using the anonymous git now). 
This change still works:
> >
> > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
> > Author: cjfields
> > Date: Sun Dec 20 04:39:58 2009 +0000
> >
> >     Robson's patch for buggy blastpgp output
> >
> > But this does not:
> >
> > commit 9a89c3434597104dd50553e3562983d78d14a544
> > Author: cjfields
> > Date: Thu Apr 15 04:21:17 2010 +0000
> >
> >     [bug 3031]
> >
> >     patches for catching algorithm ref, courtesy Razi Khaja.
> >
> > That makes it easy to find the diffs:
> >
> > $ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
> > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
> > index 378023a..6f7eeeb 100644
> > --- a/Bio/SearchIO/blast.pm
> > +++ b/Bio/SearchIO/blast.pm
> > @@ -209,6 +209,7 @@ BEGIN {
> >
> >      'BlastOutput_program' => 'RESULT-algorithm_name',
> >      'BlastOutput_version' => 'RESULT-algorithm_version',
> > +    'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
> >      'BlastOutput_query-def' => 'RESULT-query_name',
> >      'BlastOutput_query-len' => 'RESULT-query_length',
> >      'BlastOutput_query-acc' => 'RESULT-query_accession',
> > @@ -504,6 +505,26 @@ sub next_result {
> >              }
> >          );
> >      }
> > +    # parse the BLAST algorithm reference
> > +    elsif(/^Reference:\s+(.*)$/) {
> > +        # want to preserve newlines for the BLAST algorithm reference
> > +        my $algorithm_reference = "$1\n";
> > +        $_ = $self->_readline;
> > +        # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
> > +        # algorithm_reference, append it to what we parsed so far
> > +        while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
> > +            $algorithm_reference .= "$_";
> > +            $_ = $self->_readline;
> > +        }
> > +        # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
> > +        $self->_pushback($_);
> > +        $self->element(
> > +            {
> > +                'Name' => 'BlastOutput_algorithm-reference',
> > +                'Data' => $algorithm_reference
> > +            }
> > +        );
> > +    }
> >      # added Windows workaround for bug 1985
> >      elsif (/^(Searching|Results from round)/) {
> >          next unless $1 =~ /Results from round/;
> >
> >
> > I am not sure why reference parsing messes things up. Maybe it eats too many
> > lines from the result file.
> >
> > Yours,
> >
> > -Heikki
> >
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +966 545 595 849 office: +966 2 808 2429
> >
> > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> > 4700 King Abdullah University of Science and Technology (KAUST)
> > Thuwal 23955-6900, Kingdom of Saudi Arabia
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
From shalabh.sharma7 at gmail.com Tue May 4 14:18:02 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 4 May 2010 14:18:02 -0400
Subject: [Bioperl-l] parsing GenBank file
Message-ID:

Hi All,
i have a huge GenBank file (downloaded from RDP containing all bacterial 16s).
I just want to parse RDP id (in LOCUS) and organism's lineage (in ORGANISM).
I wrote a simple script for this:

#!/usr/bin/perl -w
use Bio::SeqIO;
my $seqio_object = Bio::SeqIO->new(-file => "$ARGV[0]");
while(my $seq_object = $seqio_object->next_seq){
    my $id = $seq_object->id;
    print "$id\t";
    my $species_object = $seq_object->species;
    my @classification = $seq_object->species->classification;
    foreach my $val (@classification){print "$val\t";}
    print "\n";
}

I am getting the output like:

S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root
S000148973 uncultured Geothrix sp.
Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root S000431649 uncultured Acidobacteria bacterium Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root .. .. This is the exact output i want, but i am missing lot of records (they are there in the genbank file but not in my output). I also got a warning during parsing: --------------------- WARNING --------------------- MSG: Unbalanced quote in: /db_xref="taxon:35783" /germline" /mol_type="genomic DNA" /organism="Enterococcus sp." /strain="LMG12316"No further qualifiers will be added for this feature --------------------------------------------------- So i was just wondering that is this warning message causing that problem or i am doing something wrong? Thanks Shalabh From jay at jays.net Tue May 4 23:30:25 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 4 May 2010 22:30:25 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? Message-ID: $work[0] wants me to fire up Buildbot + Smolder to know when and who broke our tests, and how quickly (or not) our test count is growing over time. Then #moose asked me if I could also host the same for Moose and Class::MOP. And $work[1] uses the heck out of BioPerl. So I'm wondering if I can leverage all my synergies somehow and also host for BioPerl. http://buildbot.net/trac http://sourceforge.net/projects/smolder/ Has anything happened since this 2008 thread?: Subject: Test coverage for BioPerl now available http://article.gmane.org/gmane.comp.lang.perl.bio.general/17731/match=smolder If this would be a Good Thing for BioPerl I could try to try... :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Wed May 5 00:24:51 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 4 May 2010 23:24:51 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? 
In-Reply-To: References: Message-ID: On May 4, 2010, at 10:30 PM, Jay Hannah wrote: > http://sourceforge.net/projects/smolder/ Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) http://search.cpan.org/perldoc?Smolder http://github.com/mpeters/smolder Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From dimitark at bii.a-star.edu.sg Wed May 5 02:58:21 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Wed, 05 May 2010 14:58:21 +0800 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> References: <4BDA986D.3020302@bii.a-star.edu.sg> <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> Message-ID: <4BE1170D.8040108@bii.a-star.edu.sg> Hi Dave, thank you for the tip. Now it works like a charm :) Greetings Dimitar On 05/02/2010 05:59 PM, Dave Messina wrote: > Hi Dimitar, > > The syntax you want is: > > # Build a Genewise alignment factory > my $factory = Bio::Tools::Run::Genewise->new(); > > # turn on the quiet switch > $factory->QUIET(1); > > # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects > my @genes = $factory->run($protein_seq, $genomic_seq); > > > This turns out be incorrectly documented on the man page, at least in part: > >> Available Params: >> >> NB: These should be passed without the '-' or they will be ignored, >> except switches such as 'hmmer' (which have no corresponding value) >> which should be set on the factory object using the AUTOLOADed methods >> of the same name. >> >> Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null] >> Alg [-kbyte,-alg] >> HMM [-hmmer] >> Output [-gff,-gener,-alb,-pal,-block,-divide] >> Standard [-help,-version,-silent,-quiet,-errorlog] >> > > That is, these don't work as expected: > > $factory->quiet; > $factory->quiet(1); > > due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. 
> > And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set. > > > So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker. > > > Dave > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore email: dimitark at bii.a-star.edu.sg tel: +65 6478 8514 From dimitark at bii.a-star.edu.sg Wed May 5 03:06:04 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Wed, 05 May 2010 15:06:04 +0800 Subject: [Bioperl-l] about gene "boundaries" In-Reply-To: References: <4BD8357B.5030804@bii.a-star.edu.sg> <24714E9B-B3E5-4703-92F8-64483FA59AFC@illinois.edu> <4BD90F94.4040608@bii.a-star.edu.sg> Message-ID: <4BE118DC.7000806@bii.a-star.edu.sg> Hi Malcolm, thank you very much for that information. Didnt even know such program existed :) I now use 'blastdbcmd' for extraction of DNA sequence from my DB. I only had to reformat my DB with 'parse seqids' parameter in order to be able to give the 'entry' parameter to 'blastdbcmd'. Now my script is working. Thanx again. Cheers Dimitar On 04/30/2010 10:16 PM, Cook, Malcolm wrote: > Dimitar, > > Since you have indexed your database with makeblastdb, you might simply use `blastdbcmd` to extract, in fasta format, sub-sequences from the indexed database using identifiers and integer ranges > > blastdbcmd is included in the blast+ suite of programs, which also included makeblastdb which you report you have running. 
> > see: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/user_maual.pdf > > I've not (yet) used the blast+ suite (still using the old blast) so I've not tested this myself yet, but I think something like the following will work for you: > > blastdbcmd -db yourBlastDatabase -entry chr2 -range 100-300 -outformat fasta > > will extract chr2:100-300 from yourBlastDatabase > > Good Luck > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dimitar Kenanov > Sent: Wednesday, April 28, 2010 11:48 PM > To: Chris Fields; bioperl-l at bioperl.org; scott at scottcain.net; hrh at fmi.ch > Subject: Re: [Bioperl-l] about gene "boundaries" > > Hi guys, > today with rested head and after some reading i found the solution to my problem in BioPerl. Its Bio::DB::Fasta. It does what i want sufficiently well. > Thank you again for the help and im sorry for the trouble caused. > > Cheers > Dimitar > > On 04/28/2010 11:10 PM, Chris Fields wrote: > >> By local DB, do you mean a BioPerl-based local DB? Or is it something else? This is a bit vague. >> >> On the BioPerl side I suggest looking into Bio::DB::SeqFeature::Store for storing and querying genome information (it does exactly what you want if the proper information is loaded), or maybe the Ensembl Perl API, which can be used with a local or remote Ensembl setup. Beyond that you'll need to be more specific. >> >> chris >> >> On Apr 28, 2010, at 8:17 AM, Dimitar Kenanov wrote: >> >> >> >>> Hello guys, >>> i have a question about gene "boundaries". Is there some module in BioPerl which can help me extract the DNA sequence from a genomic DB (from specific chromosome). I have my human genome in a local DB and some "from-to" data sets corresponding to different chromosomes. So i want to get the DNA seqs for these from-to's. 
I know i can do that the normal way but if there is a way to do it with BioPerl it will be more consistent with the rest of the code. >>> >>> Thanks for any tips :) >>> >>> Cheers >>> Dimitar >>> >>> -- >>> Dimitar Kenanov >>> Postdoctoral research fellow >>> Protein Sequence Analysis Group >>> Bioinformatics Institute >>> A*STAR, Singapore >>> email: dimitark at bii.a-star.edu.sg >>> tel: +65 6478 8514 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> > > -- > Dimitar Kenanov > Postdoctoral research fellow > Protein Sequence Analysis Group > Bioinformatics Institute > A*STAR, Singapore > email: dimitark at bii.a-star.edu.sg > tel: +65 6478 8514 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore email: dimitark at bii.a-star.edu.sg tel: +65 6478 8514 From David.Messina at sbc.su.se Wed May 5 03:46:17 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 5 May 2010 09:46:17 +0200 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <4BE1170D.8040108@bii.a-star.edu.sg> References: <4BDA986D.3020302@bii.a-star.edu.sg> <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> <4BE1170D.8040108@bii.a-star.edu.sg> Message-ID: <9F2DC6C9-7707-4C4A-8DE1-0B37387F7F8A@sbc.su.se> Great, glad to hear that. Thanks for letting us know about the problem! Dave On May 5, 2010, at 8:58, Dimitar Kenanov wrote: > Hi Dave, > thank you for the tip. 
Now it works like a charm :)
>
> Greetings
> Dimitar
>
>
> On 05/02/2010 05:59 PM, Dave Messina wrote:
>> Hi Dimitar,
>>
>> The syntax you want is:
>>
>> # Build a Genewise alignment factory
>> my $factory = Bio::Tools::Run::Genewise->new();
>>
>> # turn on the quiet switch
>> $factory->QUIET(1);
>>
>> # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects
>> my @genes = $factory->run($protein_seq, $genomic_seq);
>>
>>
>> This turns out be incorrectly documented on the man page, at least in part:
>>
>>> Available Params:
>>>
>>> NB: These should be passed without the '-' or they will be ignored,
>>> except switches such as 'hmmer' (which have no corresponding value)
>>> which should be set on the factory object using the AUTOLOADed methods
>>> of the same name.
>>>
>>> Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null]
>>> Alg [-kbyte,-alg]
>>> HMM [-hmmer]
>>> Output [-gff,-gener,-alb,-pal,-block,-divide]
>>> Standard [-help,-version,-silent,-quiet,-errorlog]
>>>
>>
>> That is, these don't work as expected:
>>
>> $factory->quiet;
>> $factory->quiet(1);
>>
>> due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase.
>>
>> And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set.
>>
>>
>> So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker.
>>
>>
>> Dave
>>
>>
>
>
> --
> Dimitar Kenanov
> Postdoctoral research fellow
> Protein Sequence Analysis Group
> Bioinformatics Institute
> A*STAR, Singapore
> email: dimitark at bii.a-star.edu.sg
> tel: +65 6478 8514
>
From torsten.seemann at infotech.monash.edu.au Wed May 5 03:48:55 2010
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 5 May 2010 17:48:55 +1000
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To:
References:
Message-ID:

> i have a huge GenBank file ( downloaded from RDP containing all
> bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM).
> I am getting the output like:
> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae
> Holophagales Holophagae "Acidobacteria" Bacteria Root
> This is the exact output i want, but i am missing lot of records (they are
> there in the genbank file but not in my output).
> I also got a warning during parsing:
> --------------------- WARNING ---------------------
> MSG: Unbalanced quote in:
> /db_xref="taxon:35783" /germline"
> /mol_type="genomic DNA"
> /organism="Enterococcus sp."
> /strain="LMG12316"No further qualifiers will be added for this feature
> ---------------------------------------------------
> So i was just wondering that is this warning message causing that problem or
> i am doing something wrong?

"Unbalanced quote" means there is not an even number (multiple of 2) of
double-quote (") symbols around the tag's value. I can see that the
first line (below) looks problematic:

YOU HAVE:

/db_xref="taxon:35783" /germline"

SHOULD BE:

/db_xref="taxon:35783"
/germline

I suspect there is a problem either with RDP's genbank producer, or
Bioperl is having a problem with the "germline" qualifier which is a
'null valued' qualifier like /pseudo - it takes no ="value" string. (I
think in Bioperl this is handled by setting the value to "_no_value" ?)
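To make the "even number of double quotes" rule concrete, here is a minimal sketch that counts the quotes on a qualifier line. It is not how BioPerl performs the check internally; `quotes_balanced` is a hypothetical helper written for this example, and the sample lines are taken from the warning above. Note the assumption that each qualifier value sits on one line: a value wrapped over several lines would legitimately have an odd count on its first line.

```perl
use strict;
use warnings;

# Return true when the line contains an even number of double quotes.
# Hypothetical helper for illustration; assumes single-line qualifier values.
sub quotes_balanced {
    my ($line) = @_;
    my $count = ($line =~ tr/"//);    # tr with an empty replacement only counts
    return $count % 2 == 0;
}

for my $line (
    '/db_xref="taxon:35783" /germline"',    # the line from the warning
    '/db_xref="taxon:35783"',               # corrected first qualifier
    '/germline',                            # valueless qualifier, no quotes
) {
    printf "%-36s %s\n", $line, quotes_balanced($line) ? 'balanced' : 'UNBALANCED';
}
```

A pre-scan like this could be run over the feature-table lines of a large file to locate suspect records before handing the file to the parser.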
http://www.ncbi.nlm.nih.gov/collab/FT/ Qualifier /germline Definition the sequence presented in the entry has not undergone somatic rearrangement as part of an adaptive immune response; it is the unrearranged sequence that was inherited from the parental germline Value format none Example /germline Comment /germline should not be used to indicate that the source of the sequence is a gamete or germ cell; /germline and /rearranged cannot be used in the same source feature; /germline and /rearranged should only be used for molecules that can undergo somatic rearrangements as part of an adaptive immune response; these are the T-cell receptor (TCR) and immunoglobulin loci in the jawed vertebrates, and the unrelated variable lymphocyte receptor (VLR) locus in the jawless fish (lampreys and hagfish); /germline and /rearranged should not be used outside of the Craniata (taxid=89593) --Torsten Seemann --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA From cjfields at illinois.edu Wed May 5 08:12:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 07:12:30 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: Message-ID: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> On May 4, 2010, at 11:24 PM, Jay Hannah wrote: > On May 4, 2010, at 10:30 PM, Jay Hannah wrote: >> http://sourceforge.net/projects/smolder/ > > Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) > > http://search.cpan.org/perldoc?Smolder > http://github.com/mpeters/smolder > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)? 
chris From cjfields at illinois.edu Wed May 5 08:30:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 07:30:30 -0500 Subject: [Bioperl-l] using default string values for undef/empty, was Re: parsing GenBank file In-Reply-To: References: Message-ID: On May 5, 2010, at 2:48 AM, Torsten Seemann wrote: >> i have a huge GenBank file ( downloaded from RDP containing all >> bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM). >> I am getting the output like: >> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae >> Holophagales Holophagae "Acidobacteria" Bacteria Root >> This is the exact output i want, but i am missing lot of records (they are >> there in the genbank file but not in my output). >> I also got a warning during parsing: >> --------------------- WARNING --------------------- >> MSG: Unbalanced quote in: >> /db_xref="taxon:35783" /germline" >> /mol_type="genomic DNA" >> /organism="Enterococcus sp." >> /strain="LMG12316"No further qualifiers will be added for this feature >> --------------------------------------------------- >> So i was just wondering that is this warning message causing that problem or >> i am doing something wrong? > > "Unbalanced quote" means there is not an even number (multiple of 2) > double-quote (") symbols around the tag's value. I can see that the > first line (below) looks problematic: > > YOU HAVE: > > /db_xref="taxon:35783" /germline" > > SHOULD BE: > > /db_xref="taxon:35783" > /germline > > I suspect there is a problem either with RDP's genbank producer, or > Bioperl is having problem with the "germline" qualifier which is a > 'null valued' qualifier like /pseudo - it takes no ="value" string. (I > think in Bioperl this is handled by setting the value to "_no_value" > ?) > ... > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA Ugh, didn't notice the '_no_value' bit. 
Probably my opinion, but I don't like stubs like that as they tend to be brittle and run into issues (like this one, for instance). I would prefer we just leave that as undef and only quote defined values (with the exceptions in %FTQUAL_NO_QUOTE). Any reason for this behavior (is it related to ORM-related stuff like bioperl-db)? Can we change that to something a bit more realistic? chris From David.Messina at sbc.su.se Wed May 5 09:00:39 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 5 May 2010 15:00:39 +0200 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> Message-ID: <252790EC-6A2D-4DFA-B2A0-8D0F8E169E30@sbc.su.se> Yeah, absolutely, Jay! it would be wonderful to have this for BioPerl. Dave On May 5, 2010, at 14:12, Chris Fields wrote: > On May 4, 2010, at 11:24 PM, Jay Hannah wrote: > >> On May 4, 2010, at 10:30 PM, Jay Hannah wrote: >>> http://sourceforge.net/projects/smolder/ >> >> Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) >> >> http://search.cpan.org/perldoc?Smolder >> http://github.com/mpeters/smolder >> >> Jay Hannah >> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)? > > chris From cjfields at illinois.edu Wed May 5 10:46:23 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 09:46:23 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub Message-ID: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> All, I would like to finalize moving over to git/github very soon. We're sort of in limbo on this, so it needs to progress forward. We'll need to do some initial cleanup after the move (Heikki is already doing a few things on the test repo, which we'll need to diff over to the new one). 
So with that in mind, here are my thoughts. This is copied over to this wiki page, in case you don't want to reply here:

http://www.bioperl.org/wiki/From_SVN_to_Git

(thanks Mark!)

1) Timeline

When? Sooner the better (weeks as opposed to months). Our anon. svn is down, likely permanently (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live).

2) Migration strategy

Now mainly worked out using svn2git, which is very fast. We would need to make the svn repo on dev read-only during this transition. My guess is it would take very little time.

Do we want to retain the git-SVN metadata on commits? This is viewable with our current read-only mirror on github:

http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca

3) Developers

Not everyone has a github account. Recent ones who I couldn't find on github:

dmessina, fangly

The current authors file used for mapping commit authors to emails used their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I think, once one has signed up with github, you can add that same address to your current ones, and it should map to your github account.

If we use dev.open-bio.org as our central git repo, we won't need to go through with that, but we will need a viewable version of dev available somehow (mirrored on github or otherwise). Speaking of...

4) Development strategy

Are we sticking with a single centralized repo (SVN-like)? Will that be github, or will github be a downstream repo to our work on dev? We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain).

Git makes it very easy to make branches and merge in code to trunk. With that in mind, I would highly suggest we start working on branches for almost everything and merge over to trunk.
There is very little to no overhead in doing so with git. I like this strategy (Mark Jensen pointed this out):

http://nvie.com/git-model

Also, several points were raised in a related project (Parrot) considering a move to git/github from svn. One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds.

5) Encouraging outside contributors

Do we want to adopt a policy similar to Moose?

http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod

This is easy with github and forks.

6) SVN Read/Write to GitHub

It was recently announced that one can access a github repo using subversion as read-only, and just yesterday experimental write to github is allowed:

http://github.com/blog/644-subversion-write-support

I can see allowing read-only svn, but write support is still experimental. Do we want to allow that?

7) Others?

chris

From shalabh.sharma7 at gmail.com Wed May 5 10:46:19 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 10:46:19 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To:
References:
Message-ID:

Hi Torsten,
Thanks for pointing that out. But this is just a warning, it will not break the script. I found the point where the script is breaking.
It's breaking and giving this message:
Can't call method "classification" on an undefined value at parseGB.pl line 9, line 10067733.

So the script is breaking when it comes to this record:

LOCUS       S001198291              1521 bp    rRNA    linear   BCT 02-Feb-2009
DEFINITION  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2.
ACCESSION   AP010656 REGION: 61786..63306
PROJECT     GenomeProject:29025
SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
            Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
            "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".
REFERENCE   1  (bases 1 to 1521)
  AUTHORS   Toyoda A., Hongoh Y., Toh H., Hattori M., Ohkuma M., Sakaki Y.;
  TITLE     ;
  JOURNAL   Submitted (21-MAR-2008) to the EMBL/GenBank/DDBJ databases.
            Contact:Atsushi Toyoda National Institute of Genetics, Comparative
            Genomics Laboratory; Yata 1111, Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Hongoh Y., Sharma V.K., Prakash T., Noda S., Toh H., Taylor T.D.,
            Kudo T., Sakaki Y., Toyoda A., Hattori M., Ohkuma M.;

It is unable to parse this record, but I don't understand why it is doing so. The only reason I can think of is the organism's name, which is very long compared to others.

Thanks
Shalabh

On Wed, May 5, 2010 at 3:48 AM, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:

> > i have a huge GenBank file ( downloaded from RDP containing all
> > bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM).
> > I am getting the output like:
> > S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae
> > Holophagales Holophagae "Acidobacteria" Bacteria Root
> > This is the exact output i want, but i am missing lot of records (they are
> > there in the genbank file but not in my output).
> > I also got a warning during parsing:
> > --------------------- WARNING ---------------------
> > MSG: Unbalanced quote in:
> > /db_xref="taxon:35783" /germline"
> > /mol_type="genomic DNA"
> > /organism="Enterococcus sp."
> > /strain="LMG12316"No further qualifiers will be added for this feature
> > ---------------------------------------------------
> > So i was just wondering that is this warning message causing that problem or
> > i am doing something wrong?
>
> "Unbalanced quote" means there is not an even number (multiple of 2)
> double-quote (") symbols around the tag's value.
I can see that the > first line (below) looks problematic: > > YOU HAVE: > > /db_xref="taxon:35783" /germline" > > SHOULD BE: > > /db_xref="taxon:35783" > /germline > > I suspect there is a problem either with RDP's genbank producer, or > Bioperl is having problem with the "germline" qualifier which is a > 'null valued' qualifier like /pseudo - it takes no ="value" string. (I > think in Bioperl this is handled by setting the value to "_no_value" > ?) > > http://www.ncbi.nlm.nih.gov/collab/FT/ > > Qualifier /germline > Definition the sequence presented in the entry has not undergone > somatic > rearrangement as part of an adaptive immune response; it is > the > unrearranged sequence that was inherited from the parental > germline > Value format none > Example /germline > Comment /germline should not be used to indicate that the source of > the sequence is a gamete or germ cell; > /germline and /rearranged cannot be used in the same source > feature; > /germline and /rearranged should only be used for molecules > that > can undergo somatic rearrangements as part of an > adaptive immune > response; these are the T-cell receptor (TCR) and > immunoglobulin > loci in the jawed vertebrates, and the unrelated variable > lymphocyte receptor (VLR) locus in the jawless fish > (lampreys > and hagfish); > /germline and /rearranged should not be used outside of the > Craniata (taxid=89593) > > > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA > From cjfields at illinois.edu Wed May 5 11:32:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 10:32:41 -0500 Subject: [Bioperl-l] parsing GenBank file In-Reply-To: References: Message-ID: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu> Shalabh, What is the source of this file? 
It's not from GenBank; if I look up the parent sequence using Bio::DB::GenBank it works fine:

    use Modern::Perl;
    use Bio::DB::GenBank;

    my $id = 'AP010656';

    # Fetch the parent record from GenBank by accession
    my $gb = Bio::DB::GenBank->new();
    my $seq = $gb->get_Seq_by_acc($id);

    # Print the taxonomic classification as a comma-separated list
    say join(',',$seq->species->classification);

chris

On May 5, 2010, at 9:46 AM, shalabh sharma wrote:

> Hi Torsten,
> Thanks for pointing that out. But this is just a warning,
> it will not break the script. i found the the point where script is
> breaking.
> Its breaking and giving this message:
> Can't call method "classification" on an undefined value at parseGB.pl line
> 9, line 10067733.
>
> So the script is breaking when its coming to this record:
>
> [...]

From shalabh.sharma7 at gmail.com Wed May 5 11:38:11 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 11:38:11 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
Message-ID:

Hi Chris,
I downloaded this file from RDP; it contains all the bacterial 16S sequences.

Thanks
Shalabh

On Wed, May 5, 2010 at 11:32 AM, Chris Fields wrote:

> Shalabh,
>
> What is the source of this file?

From cjfields at illinois.edu Wed May 5 12:01:55 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 May 2010 11:01:55 -0500
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To:
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
Message-ID: <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>

Shalabh,

There are several problems with this file that make it somewhat non-GenBank-like. It does parse (it has seq data), but the parser doesn't catch the SOURCE/ORGANISM because of the non-canonical way the classification is displayed:

SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
            Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
            "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".

It's different enough from the NCBI version (from here: http://www.ncbi.nlm.nih.gov/nuccore/212548595) that it's probably breaking the parser:

SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
            Bacteria; Bacteroidetes; Bacteroidia; Bacteroidales; Candidatus
            Azobacteroides.

Please file this as a bug; we can take a look at it. It's a bit non-standard, so I can't promise it'll be fixed unless it's fairly easy to do.

chris

On May 5, 2010, at 10:38 AM, shalabh sharma wrote:

> Hi Chris,
> I downloaded this file from RDP, it contain all bacterial 16s.
>
> Thanks
> Shalabh
>
> [...]

From shalabh.sharma7 at gmail.com Wed May 5 12:10:33 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 12:10:33 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To: <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu> <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>
Message-ID:

Hi Chris,
I will do that. So how can I solve my problem; do you have any suggestions?
I am thinking of taking all the accessions from the file I have and using Bio::DB::GenBank to get the classification.

Thanks
shalabh

On Wed, May 5, 2010 at 12:01 PM, Chris Fields wrote:

> Shalabh,
>
> [...]

From jay at jays.net Wed May 5 12:28:10 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 5 May 2010 11:28:10 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
Message-ID: <512A88E4-85A0-4841-B6A7-9915FE0800BA@jays.net>

On May 5, 2010, at 10:59 AM, Jay Hannah wrote:
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

Oops. Should have checked Smolder before sending that email...

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

$ prove -v t/email_signatures.t
t/email_signatures.t ..
1..7
ok 1 - $work->[0]->{Outlook} email signatures up to date
ok 2 - $work->[0]->{Netmail} email signatures up to date
ok 3 - $work->[1]->{Lotus_Notes} email signatures up to date
not ok 4 - $home->[0]->{MacMini_Mail.app} email signatures up to date
ok 5 - $home->[0]->{MacMini_Entourage.app} email signatures up to date
ok 6 - $home->[0]->{laptop_Mail.app} email signatures up to date
ok 7 - $home->[0]->{laptop_Entourage.app} email signatures up to date
# Failed test '$home->[0]->{MacMini_Mail.app} email signatures up to date'
# at t/email_signatures.t line 5.
# Looks like you failed 1 test of 7.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/7 subtests

Test Summary Report
-------------------
t/email_signatures.t (Wstat: 256 Tests: 7 Failed: 1)
  Failed test: 4
  Non-zero exit status: 1
Files=1, Tests=7, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.03 cusr 0.00 csys = 0.07 CPU)
Result: FAIL

From jay at jays.net Wed May 5 11:59:37 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 5 May 2010 10:59:37 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
Message-ID: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>

On May 5, 2010, at 7:12 AM, Chris Fields wrote:
> I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)?

I would definitely start with trunk and see how it goes.

Last night I tried to smoke all our old $work[0] tags and failed impressively. Our tests were (and probably still are) too reliant on 3rd party black boxes being online and responsive, and servers tend to move and get reconfigured over the years. Presumably BioPerl and Moose are more self-contained (unless external deps are explicitly enabled), so perhaps historical smoking would work fairly well.

In Moose land the request is that I smoke not only Moose, but everything on CPAN that *depends on Moose*:

    export MOOSE_TEST_MD=1; prove xt/test-my-dependents.t

Which should be ... educational. :)

While exciting, I don't think that concept translates to the BioPerl monolith.

If I'm the only one smoking, you'll get a very limited number of architecture + perl version combinations reported. Which begs the question of how to harness a broader tester pool.
It's great that 342 systems smoked our latest CPAN upload:

    http://static.cpantesters.org/distro/B/bioperl.html

But the crazy I'm embarking on would mean several smokes each day (every svn/git commit?), compared to the cpantesters who haven't had a new CPAN release to smoke since Sep 2009 (1.6.1). Maybe I'd just do one or two a day or something?

Whoever wanted to could report into our central Smolder server using their architectures + perl versions. A volunteer would just install Smolder from CPAN and run this in their bioperl-live directory:

    prove -I . --recurse --archive test_run.tar.gz
    smolder_smoke_signal --server smolder.jays.net \
        --username MyUserName --password MyPass \
        --file test_run.tar.gz --project bioperl-live --tags trunk

Deep ponderings,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From David.Messina at sbc.su.se Wed May 5 17:27:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 23:27:24 +0200
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
Message-ID: <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se>

> Do we want to retain the git-SVN metadata on commits? What are the tradeoffs with this?

From the little reading I've done, it seems that space and clutter are the chief drawbacks, but that it's easy to strip this metadata out later. Does that jibe with your impression?

> Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly

My github account name is: DaveMessina

Do I have an @bioperl.org address? I tried sending mail to a few likely permutations without success. In any case, I added dave_messina -at- bioperl.org as an email address on my github account.

> Are we sticking with a single centralized repo (SVN-like)?

I am a total git novice, but it's my understanding that it's still a good idea, particularly with a big many-author project like BioPerl, to have a primary, official repo. But I'd be interested in hearing more discussion on this. We're at a good place to make large-ish changes to how we do things, I think.

> Will that be github, or will github be a downstream repo to our work on dev?

My only concern with github being primary is in case something happens to github. Not likely, I know, but it seems prudent to maintain a certain amount of control over our destiny. So I'm inclined to make dev be primary and github downstream, with the assumption that it'd be trivial to abandon dev and make github primary in the future if we want.

Or would it be enough to auto-mirror to dev.open-bio.org, which could serve as a fallback in case github goes offline, temporarily or permanently?

> We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain).

Are there any git-familiar folks out there who could comment on the pros and cons of this? Perhaps some of the other Bio* projects who have switched to git could advise.

Right now, without further technical details, I think it'd be better to have one true primary just because it's less confusing and easier to manage, particularly if we're to follow a model like the one mentioned just below:

> I would highly suggest we start working on branches for almost everything and merge over to trunk.
> [...]
> I like this strategy (Mark Jensen pointed this out): http://nvie.com/git-model

Yep, that looks good to me, too.

> One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds.

We should try to make sure we have this sorted before going "live".

> 5) Encouraging outside contributors
>
> Do we want to adopt a policy similar to Moose?

Yes! We want more people to jump in - one of the benefits of git and github is that they encourage this.

> 6) SVN Read/Write to GitHub
>
> I can see allowing read-only svn, but write support is still experimental. Do we want to allow that?

Read-only for sure - that seems harmless, and we want to give people lots of ways to get BioPerl.

Write - let's play with it a bit, making a few test commits to bioperl-test, and see what happens. It would be nice if we don't force everyone who contributes to BioPerl to have to switch over to git immediately. Me included. :)

> 7) Others?

What happens when we start splitting up bioperl into separate distros? Do we put them each into a separate repo?

Dave

From David.Messina at sbc.su.se Wed May 5 17:40:46 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 23:40:46 +0200
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
Message-ID: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se>

> Presumably BioPerl and Moose and more self-contained (unless external deps are explicitly enabled), so perhaps historical smoking would work fairly well.

Very few of BioPerl's tests rely on outside servers, and those that do have to be turned on explicitly with a network-tests flag. So hopefully that won't be an issue.

> In Moose land the request is that I smoke not only Moose, but everything on CPAN that *depends on Moose*:
> [...]
> While exciting, I don't think that concept translates to the BioPerl monolith.

Agreed, not really. Except for some of the GMOD stuff. And anyway this could always be done later if desired. Probably much later.
:)

> Whoever wanted to could report into our central Smolder server using their architectures + perl versions. A volunteer would just install Smolder from CPAN and run this in their bioperl-live directory:
>
>   prove -I . --recurse --archive test_run.tar.gz
>   smolder_smoke_signal --server smolder.jays.net \
>     --username MyUserName --password MyPass \
>     --file test_run.tar.gz --project bioperl-live --tags trunk

Would the reporter need to have any special setup to do this?

Could this kind of reporting be written into the BioPerl Build.PL as a user-settable option (just like the options for installing scripts or running network tests)?

If so, then we could get lots of feedback on trunk (master) commits and not just releases.

Dave

From jason at bioperl.org Wed May 5 18:45:41 2010
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 05 May 2010 15:45:41 -0700
Subject: [Bioperl-l] Modules in Bio:Tree
In-Reply-To: <4BE1D0E2.9010500@mail.mcgill.ca>
References: <4BE1D0E2.9010500@mail.mcgill.ca>
Message-ID: <4BE1F515.7090604@bioperl.org>

Please use the mailing list for questions.

The nodes are objects, not strings you can print directly. As shown in http://bioperl.org/wiki/HOWTO:Trees#Example_Code you access information from them with object methods like 'id', so

    print $leaf->id, "\n";

would probably accomplish what you are looking for right now.

-jason

Sudeep Mehrotra wrote, On 5/5/10 1:11 PM:
> Hello Jason,
> I am using the Bio::Tree modules to get a list of all the leaves in
> their respective clusters. I looked at the examples and followed the
> functions of various modules but I am not able to get the desired result.
>
> My input looks as follows:
> ((((Candidatus_Korarchaeum)Korarchaeota,((((Cenarchaeum_symbiosum)Cenarchaeum)Cenarchaeaceae)Cenarchaeales,((((Nitrosopumilus_maritimus)Nitrosopumilus)Nitrosopumilaceae)Nitrosopumilales)marine_archaeal_group_1)Thaumarchaeota,(((((Archaeoglobus_fulgidus)Archaeoglobus)Archaeoglobaceae)Archaeoglobales)Archaeoglobi,
>
> and so on....
>
> Code is like this:
> $input = new Bio::TreeIO(-file =>"$file1",-format => "newick");
> $tree = $input->next_tree;
> @leaves = $tree->get_leaf_nodes();
> foreach $leaf (@leaves)
> {
> print "$leaf\n";
> }
> The output I get is:
> Bio::Tree::Node=HASH(0xa783e0)
> Bio::Tree::Node=HASH(0xa78710)
> Bio::Tree::Node=HASH(0xa78ab0)
>
> Not sure what I am doing wrong.
>
> Objective is to get a cluster of all the leaves.
>
> Thanks

From florent.angly at gmail.com Wed May 5 20:16:05 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 06 May 2010 10:16:05 +1000
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
Message-ID: <4BE20A45.5090206@gmail.com>

Hi Chris,

On 06/05/10 00:46, Chris Fields wrote:
> 3) Developers
>
> Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly
>
> The current authors file used for mapping commit authors to emails used their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I think, once one has signed up with github, you can add that same address to your current ones, and it should map to your github account. If we use dev.open-bio.org as our central git repo, we won't need to go through with that, but we will need a viewable version of dev available somehow (mirrored on github or otherwise). Speaking of...
>

I have a GitHub account, fangly, on which I just added the email address fangly at bioperl.org .

Thanks for your efforts working on the Git migration.

Florent

From jay at jays.net Wed May 5 23:18:47 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 5 May 2010 22:18:47 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: I smoked trunk a few times. Check out all the pretty buttons and graphs and such: http://biobase2.ist.unomaha.edu:8080/app/projects/smoke_reports/1 How you too can submit smoke results: http://jays.net/wiki/Smolder Neat? Not? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Wed May 5 23:31:05 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:31:05 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: On May 5, 2010, at 4:40 PM, Dave Messina wrote: > Very few of BioPerl's tests rely on outside servers, and those that do have to be turned on explicitly with a network-tests flag. So hopefully that won't be an issue. I said "no" to the network tests for my smoke runs. Haven't really examined the results enough to know if the failures are my fault or what. Since I always use bioperl-live out of SVN (soon git) I may not be following the ./Build.PL procedure correctly. > Agreed, not really. Except for some of the GMOD stuff. And anyway this could always be done later if desired. Probably much later. :) Ya. Some day http://smolder.open-bio.org hosting jillions of projects would be dreamy! :) Any open-bio.org projects using TAP other than BioPerl? Smolder can host anything TAP, and TAP producers are available in at least 17 languages: http://testanything.org/wiki/index.php/TAP_Producers > Would the reporter need to have any special setup to do this? 
LWP::UserAgent or Smolder's smolder_smoke_signal are the two methods I've successfully executed so far: http://jays.net/wiki/Smolder > Could this kind of reporting be written into the BioPerl Build.PL as a user-settable option (just like the options for installing scripts or running network tests)? > > If so, then we could get lots of feedback on trunk (master) commits and not just releases. Ya, wow. I've never built BioPerl "the right way" (I'm an SVN/git junkie) so I'm not sure how this would get put into Build.PL. Would you prompt the user, something like "Since you just installed BioPerl, we'd like to connect to the Internet and report in your test results. Is this ok? [yes] " ? It would be very cool to collect and trend thousands of reports, assuming it can be 100% automated for the user. Thanks for the feedback! :) Time to putter my motorcycle home before it gets too cold. G'night, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Wed May 5 23:43:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 22:43:14 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. chris On May 5, 2010, at 10:18 PM, Jay Hannah wrote: > I smoked trunk a few times. Check out all the pretty buttons and graphs and such: > > http://biobase2.ist.unomaha.edu:8080/app/projects/smoke_reports/1 > > How you too can submit smoke results: > > http://jays.net/wiki/Smolder > > Neat? Not? 
> > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Wed May 5 23:55:40 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:55:40 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: On May 5, 2010, at 10:43 PM, Chris Fields wrote: > Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. Ya, seems like the way to go. LWP is all over inside BioPerl already, whereas Smolder itself has 147 dependencies, most of which probably aren't relevant to most BioPerl users. :) http://deps.cpantesters.org/?module=Smolder;perl=latest So a stand-alone script that could be run whenever, plus (eventually) a prompt in Build.PL asking about running it? Not sure if Build.PL can somehow use the "prove --archive" hook to store the results during the normal installation run through all the tests... Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From lincoln.stein at gmail.com Thu May 6 08:01:09 2010 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 6 May 2010 08:01:09 -0400 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: My github username is lstein and I've just added lstein at bioperl.org to my linked email addresses. I hope I have a bioperl.org address; I never use it! Lincoln On Wed, May 5, 2010 at 10:46 AM, Chris Fields wrote: > All, > > I would like to finalize moving over to git/github very soon. 
We're sort > of in limbo on this, so it needs to progress forward. We'll need to do some > initial cleanup after the move (Heikki is already doing a few things on the > test repo, which we'll need to diff over to the new one). > > So with that in mind, here are my thoughts. This is copied over to this > wiki page, in case you don't want to reply here: > > http://www.bioperl.org/wiki/From_SVN_to_Git > > (thanks Mark!) > > 1) Timeline > > When? Sooner the better (weeks as opposed to months). Our anon. svn is > down, likely permanently ( > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > > 2) Migration strategy > > Now mainly worked out using svn2git, which is very fast. We would need to > make the svn repo on dev read-only during this transition. My guess is it > would take very little time. Do we want to retain the git-SVN metadata on > commits? This is viewable with our current read-only mirror on github: > > > http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca > > 3) Developers > > Not everyone has a github account. Recent ones who I couldn't find on > github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used > their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I > think, once one has signed up with github, you can add that same address to > your current ones, and it should map to your github account. If we use > dev.open-bio.org as our central git repo, we won't need to go through with > that, but we will need a viewable version of dev available somehow (mirrored > on github or otherwise). Speaking of... > > 4) Development strategy > > Are we sticking with a single centralized repo (SVN-like)? Will that be > github, or will github be a downstream repo to our work on dev? 
We could > feasibly have github be an active, forkable repo that could be > bidirectionally synced with dev, but I'm not sure of the logistics on this > (this popped up before with svn migration and was rejected b/c it was > considered too difficult to maintain). > > Git makes it very easy to make branches and merge in code to trunk. With > that in mind, I would highly suggest we start working on branches for almost > everything and merge over to trunk. There is very little to no overhead in > doing so with git. > > I like this strategy (Mark Jensen pointed this out): > http://nvie.com/git-model > > Also, several points were raised in a related project (Parrot) considering > a move to git/github from svn. One in particular was that git allows > destructive commits. Jonathan Leto indicated we can set up specific > branches that don't allow this, using commit hooks, so my guess is the > master branch and release branches wouldn't allow rewinds. > > 5) Encouraging outside contributors > > Do we want to adopt a policy similar to Moose? > > http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod > > This is easy with github and forks. > > 6) SVN Read/Write to GitHub > > It was recently announced that one can access a github repo using > subversion as read-only, and just yesterday experimental write to github is > allowed: > > http://github.com/blog/644-subversion-write-support > > I can see allowing read-only svn, but write support is still experimental. > Do we want to allow that? > > 7) Others? > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Thu May 6 09:01:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 08:01:56 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> Message-ID: <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> (comments interspersed below) On May 5, 2010, at 4:27 PM, Dave Messina wrote: >> Do we want to retain the git-SVN metadata on commits? > > What are the tradeoffs with this? > > From the little reading I've done, it seems that space and clutter are the chief drawbacks, but that it's easy to strip this metadata out later. Does that jibe with your impression? I don't really see much use for it personally, beyond retaining the SVN commit #. >> Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly > > My github account name is: DaveMessina > > Do I have an @bioperl.org address? I tried sending mail to a few likely permutations without success. In any case, I added dave_messina -at- bioperl.org as an email address on my github account. I think if you have a bioperl dev account you should have a bioperl.org set up. That's one thing I'm not absolutely sure of. >> Are we sticking with a single centralized repo (SVN-like)? > > I am a total git novice, but it's my understanding that it's still a good idea, particularly with a big many-author project like BioPerl, to have a primary, official repo. But I'd be interested in hearing more discussion on this. We're at a good place to make large-ish changes to how we do things, I think. > > >> Will that be github, or will github be a downstream repo to our work on dev? 
> > My only concern with github being primary is in case something happens to github. Not likely, I know, but it seems prudent to maintain a certain amount of control over our destiny. > > So I'm inclined to make dev be primary and github downstream, with the assumption that it'd trivial to abandon dev and make github primary in the future if we want. > > Or would it be enough to auto-mirror to dev.open-bio.org, which could serve as a fallback in case github goes offline, temporarily or permanently? Well, the nice thing about git is essentially everyone who pulls has a copy of the repo. It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. We could also use alternate mirrors for github besides dev. http://repo.or.cz/w is one example. >> We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain). > > Are there any git-familiar folks out there who could comment on the pros and cons of this? Perhaps some of the other Bio* projects who have switched to git could advise. > > Right now, without further technical details, I think it'd be better to have one true primary just because it's less confusing and easier to manage, particularly if we're to follow a model like the one mentioned just below: We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. >> I would highly suggest we start working on branches for almost everything and merge over to trunk. >> [...] >> I like this strategy (Mark Jensen pointed this out): http://nvie.com/git-model > > Yep, that looks good to me, too. 
>
>> One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds.
>
> We should try to make sure we have this sorted before going "live".

Would be adding a pre-commit hook to disallow this. I'll look into it.

>> 5) Encouraging outside contributors
>>
>> Do we want to adopt a policy similar to Moose?
>
> Yes!
>
> We want more people to jump in - one of the benefits of git and github is that they encourage this.
>
>> 6) SVN Read/Write to GitHub
>>
>> I can see allowing read-only svn, but write support is still experimental. Do we want to allow that?
>
> Read-only for sure - that seems harmless, and we want to give people lots of ways to get BioPerl.
>
> Write - let's play with it a bit, making a few test commits to bioperl-test, and see what happens. It would be nice if we don't force everyone who contributes to BioPerl to have to switch over to git immediately. Me included. :)

Sounds good to me.

>> 7) Others?
>
> What happens when we start splitting up bioperl into separate distros? Do we put them each into a separate repo?

Yes.

> Dave

Thanks!

chris

From cjfields at illinois.edu Thu May 6 10:19:06 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 May 2010 09:19:06 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To:
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
    <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
    <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se>
Message-ID: <3E35F38F-29A0-4419-AE24-AD25A0D6A6A1@illinois.edu>

prove generally is just a perl script frontend for Test::Harness and App::Prove, correct? It is included in core from perl 5 on. Here is the code for 'prove' on my local setup:

    use strict;
    use App::Prove;

    my $app = App::Prove->new;
    $app->process_args(@ARGV);
    exit( $app->run ? 0 : 1 );

We could add a 'Build smoke' or somesuch that does this internally. I'm tending to shift away from Bio::Root::Build for such things at the moment, but maybe add something there?

chris

On May 5, 2010, at 10:55 PM, Jay Hannah wrote:

> On May 5, 2010, at 10:43 PM, Chris Fields wrote:
>> Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution.
>
> Ya, seems like the way to go. LWP is all over inside BioPerl already, whereas Smolder itself has 147 dependencies, most of which probably aren't relevant to most BioPerl users. :)
>
> http://deps.cpantesters.org/?module=Smolder;perl=latest
>
> So a stand-alone script that could be run whenever, plus (eventually) a prompt in Build.PL asking about running it? Not sure if Build.PL can somehow use the "prove --archive" hook to store the results during the normal installation run through all the tests...
>
> Jay Hannah
> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

From jay at jays.net Thu May 6 10:50:42 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 6 May 2010 09:50:42 -0500
Subject: [Bioperl-l] Full bioperl-live github demo
In-Reply-To: <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu>
References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu>
    <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu>
    <8796492301724F2CA132F97AE57C2700@NewLife>
    <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu>
Message-ID: <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net>

Chris,

I added 'jhannah at bioperl.org' to my github list of email addresses. Can you add jhannah to the list of github committers in case github becomes the master repo?

I need to clean up branches 'jhannah' and 'yapc10hackathon' whenever the transition is official and the master repo is declared (github or open-bio.org).
Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Thu May 6 10:56:25 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 6 May 2010 09:56:25 -0500 Subject: [Bioperl-l] new core developers Rob Buels and Dave Messina In-Reply-To: References: Message-ID: On May 2, 2010, at 2:28 PM, Mark A. Jensen wrote: > On behalf of the core team, I am delighted to announce two new members: Rob Buels and Dave Messina. Woot! Congrats! Suddenly we WILL have a core dev at YAPC::NA for the hackathon! I'm now expecting great things from us. :) http://bioperl.org/wiki/YAPC Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Thu May 6 11:02:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 10:02:36 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> Message-ID: Done. I think, unless there are a terrible number of objections, we'll push this in the next week or two. Need to look into the pre-commit hook setup for non-destructive commits, post-commit hook for posting commits to bioperl-guts, etc. chris On May 6, 2010, at 9:50 AM, Jay Hannah wrote: > Chris, > > I added 'jhannah at bioperl.org' to my github list of email addresses. Can you add jhannah to the list of github committers in case github becomes the master repo? > > I need to clean up branches 'jhannah' and 'yapc10hackathon' whenever the transition is official and the master repo is declared (github or open-bio.org). 
> > Thanks, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki.lehvaslaiho at gmail.com Thu May 6 13:26:48 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Thu, 6 May 2010 20:26:48 +0300 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: On 5 May 2010 17:46, Chris Fields wrote: > All, > > I would like to finalize moving over to git/github very soon. We're sort > of in limbo on this, so it needs to progress forward. We'll need to do some > initial cleanup after the move (Heikki is already doing a few things on the > test repo, which we'll need to diff over to the new one). > Do not worry about those, I'll move them into the final repo once it is there. I am just making sure everything works. > So with that in mind, here are my thoughts. This is copied over to this > wiki page, in case you don't want to reply here: > > http://www.bioperl.org/wiki/From_SVN_to_Git > > (thanks Mark!) > > 1) Timeline > > When? Sooner the better (weeks as opposed to months). Our anon. svn is > down, likely permanently ( > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > ASAP. > 2) Migration strategy > > Now mainly worked out using svn2git, which is very fast. We would need to > make the svn repo on dev read-only during this transition. My guess is it > would take very little time. Do we want to retain the git-SVN metadata on > commits? This is viewable with our current read-only mirror on github: > > > http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca > > Keep it. It does no harm. > 3) Developers > > Not everyone has a github account. 
> Recent ones who I couldn't find on
> github: dmessina, fangly
>
> The current authors file used for mapping commit authors to emails used
> their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I
> think, once one has signed up with github, you can add that same address to
> your current ones, and it should map to your github account. If we use
> dev.open-bio.org as our central git repo, we won't need to go through with
> that, but we will need a viewable version of dev available somehow (mirrored
> on github or otherwise). Speaking of...
>

Let's go for github as the main repo. It adds visibility and has the coolness factor that helps.

> 4) Development strategy
>
> Are we sticking with a single centralized repo (SVN-like)? Will that be
> github, or will github be a downstream repo to our work on dev? We could
> feasibly have github be an active, forkable repo that could be
> bidirectionally synced with dev, but I'm not sure of the logistics on this
> (this popped up before with svn migration and was rejected b/c it was
> considered too difficult to maintain).
>
> Git makes it very easy to make branches and merge in code to trunk. With
> that in mind, I would highly suggest we start working on branches for almost
> everything and merge over to trunk. There is very little to no overhead in
> doing so with git.
>
> I like this strategy (Mark Jensen pointed this out):
> http://nvie.com/git-model
>

Let's try to follow this strategy. I do not think moving away from svn and going decentralized in one go would work at all.

> Also, several points were raised in a related project (Parrot) considering
> a move to git/github from svn. One in particular was that git allows
> destructive commits. Jonathan Leto indicated we can set up specific
> branches that don't allow this, using commit hooks, so my guess is the
> master branch and release branches wouldn't allow rewinds.
>

I would not worry too much about that. With git we'll have dozens if not hundreds of full copies of the repo as a backup.

> 5) Encouraging outside contributors
>
> Do we want to adopt a policy similar to Moose?
>
> http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod
>

Interesting and educational document. Let's learn as much as we can from it.

> This is easy with github and forks.
>

The more the merrier.

BTW, I can see Moose using ShipIt, http://search.cpan.org/~bradfitz/ShipIt-0.55/ that might be worth using in BioPerl.

> 6) SVN Read/Write to GitHub
>
> It was recently announced that one can access a github repo using
> subversion as read-only, and just yesterday experimental write to github is
> allowed:
>
> http://github.com/blog/644-subversion-write-support
>
> I can see allowing read-only svn, but write support is still experimental.
> Do we want to allow that?
>

Why not, if someone insists on using it. Once people get over the initial problems of moving to a different mindset in git, very few will want to use svn. There might be situations when git does not work, however, so let's allow for svn usage.

> 7) Others?
>
> chris

From David.Messina at sbc.su.se Thu May 6 14:35:55 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 6 May 2010 20:35:55 +0200
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
    <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se>
    <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu>
Message-ID:

[ git-SVN metadata ]

> I don't really see much use for it personally, beyond retaining the SVN commit #.

Oh well heck, in that case we may as well ditch it.
If there's some way we could easily keep an inactive, archived version with the SVN to github commit # mapping, that would be a nice safety measure, but if it's too much trouble we needn't bother. [ github or dev as primary ] > It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. Great, okay, sounds like there won't be any problem there. [ single repo? ] > We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. Sounds like a plan. I'm pretty swamped until late next week, but if there's anything I can do to help at that time, just holler... Dave From cseligman at earthlink.net Thu May 6 15:23:40 2010 From: cseligman at earthlink.net (Chet Seligman) Date: Thu, 6 May 2010 12:23:40 -0700 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 Message-ID: <001b01caed51$a2e745c0$e8b5d140$@net> I need some help in installing this as it is not in the Active-perl repository. Here's what I have done: 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz 2. Extracted it into an empty directory IN 3. Planned to install by specifying the ppd file directly: ppm install c:\IN\whatever module-name.ppd However, there is no .ppd file extracted. I'd appreciate it if someone would explain how to get Bio::Graphics installed? Chet From scott at scottcain.net Thu May 6 15:44:04 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 6 May 2010 15:44:04 -0400 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 In-Reply-To: <001b01caed51$a2e745c0$e8b5d140$@net> References: <001b01caed51$a2e745c0$e8b5d140$@net> Message-ID: Hi Chet, Install it via the cpan shell: $ cpan cpan> install Bio::Graphics Scott On Thu, May 6, 2010 at 3:23 PM, Chet Seligman wrote: > I need some help in installing this as it is not in the Active-perl > repository. 
Here's what I have done: > 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz > 2. Extracted it into an empty directory IN > 3. Planned to install by specifying the ppd file directly: > ppm install c:\IN\whatever module-name.ppd > > However, there is no .ppd file extracted. > > I'd appreciate it if someone would explain how to get Bio::Graphics > installed? > > Chet > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Thu May 6 15:57:03 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 6 May 2010 15:57:03 -0400 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 In-Reply-To: <002301caed55$53bfc400$fb3f4c00$@net> References: <001b01caed51$a2e745c0$e8b5d140$@net> <002301caed55$53bfc400$fb3f4c00$@net> Message-ID: Hi Chet, Please keep your responses on the bioperl mailing list. As long as you install BioPerl and GD before you try to install Bio::Graphics from cpan, yes, it is perfectly doable. You need to do that in the cmd shell. GD needs to be installed from ppm because it requires compiled code. Scott On Thu, May 6, 2010 at 3:50 PM, Chet Seligman wrote: > Hey Scott: > Is your suggestion doable in Windows? > > How? 
>
> Chet
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Scott Cain
> Sent: Thursday, May 06, 2010 12:44 PM
> To: Chet Seligman
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Installing Bio-Graphics-2.06
>
> Hi Chet,
>
> Install it via the cpan shell:
>
> $ cpan
> cpan> install Bio::Graphics
>
> Scott
>
>
> On Thu, May 6, 2010 at 3:23 PM, Chet Seligman
> wrote:
>> I need some help in installing this as it is not in the Active-perl
>> repository. Here's what I have done:
>> 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz
>> 2. Extracted it into an empty directory IN
>> 3. Planned to install by specifying the ppd file directly:
>> ppm install c:\IN\whatever module-name.ppd
>>
>> However, there is no .ppd file extracted.
>>
>> I'd appreciate it if someone would explain how to get Bio::Graphics
>> installed?
>>
>> Chet
>>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                  scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                 216-392-3087
> Ontario Institute for Cancer Research
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Thu May 6 16:04:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 15:04:39 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> Message-ID: <48C987D6-A7F2-4FBC-AB75-38F0B234961C@illinois.edu> On May 6, 2010, at 1:35 PM, Dave Messina wrote: > [ git-SVN metadata ] > >> I don't really see much use for it personally, beyond retaining the SVN commit #. > > Oh well heck, in that case we may as well ditch it. > > If there's some way we could easily keep an inactive, archived version with the SVN to github commit # mapping, that would be a nice safety measure, but if it's too much trouble we needn't bother. I think we'll keep it in for the SVN commits. Better to have it just in case. > [ github or dev as primary ] > >> It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. > > Great, okay, sounds like there won't be any problem there. > > > [ single repo? ] > >> We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. > > Sounds like a plan. > > > I'm pretty swamped until late next week, but if there's anything I can do to help at that time, just holler... > > > Dave Okay, will prep another email for the final push over to git. 
chris From cjfields at illinois.edu Thu May 6 16:13:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 15:13:44 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> On May 6, 2010, at 12:26 PM, Heikki Lehvaslaiho wrote: > ... >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? >> >> http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod >> > > Interesting and educational document. Let's learn as much a we can from it. > > This is easy with github and forks. >> > > The more the merrier. > > BTW, I can see Moose using Shipit, > http://search.cpan.org/~bradfitz/ShipIt-0.55/ > that might be worth using in BioPerl. I agree. Have thought about that, primarily for easier releases down the road. >> 6) SVN Read/Write to GitHub >> >> It was recently announced that one can access a github repo using >> subversion as read-only, and just yesterday experimental write to github is >> allowed: >> >> http://github.com/blog/644-subversion-write-support >> >> I can see allowing read-only svn, but write support is still experimental. >> Do we want to allow that? >> > > Why not is someone insists on using it. Once people get over the initial > problems of moving to a different mind set in git, very few will want to use > svn. There might be situtations when git does not work, however, so lets > allow for svn usage. Nothing really stopping it, unless we add something to a pre-commit hook that prevents it somehow. I'm thinking a move in the next 5 days, maybe starting Monday? I'll try getting a post out on it. 
chris From rmb32 at cornell.edu Thu May 6 17:09:03 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 06 May 2010 14:09:03 -0700 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> Message-ID: <4BE32FEF.6080707@cornell.edu> The branching model at http://nvie.com/git-model is a good one, but the diagram might be a little intimidating for devs that are new to git. Note that the only branches that most devs will need to be concerned with are the feature branches (sometimes called topic branches), and the main development branch. The other branches are mostly concerned with making releases. To weigh in on other issues on this thread: * Might as well keep the svn metadata, it doesn't hurt and could help in any situations that call for historical digging around. * I don't think we should allow any svn write support. Anybody that truly cannot get over the hump can send patches to the list. Thanks so much for heading this up Chris. Rob From cjfields at illinois.edu Thu May 6 17:28:25 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 16:28:25 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <4BE32FEF.6080707@cornell.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> Message-ID: <9676F5A9-A778-4440-95EF-14282DF72454@illinois.edu> On May 6, 2010, at 4:09 PM, Robert Buels wrote: > The branching model at http://nvie.com/git-model is a good one, but the diagram might be a little intimidating for devs that are new to git. > > Note that the only branches that most devs will need to be concerned with are the feature branches (sometimes called topic branches), and the main development branch. The other branches are mostly concerned with making releases. 
> > To weigh in on other issues on this thread: > > * Might as well keep the svn metadata, it doesn't hurt and could help in > any situations that call for historical digging around. > * I don't think we should allow any svn write support. Anybody that > truly cannot get over the hump can send patches to the list. > > Thanks so much for heading this up Chris. > > Rob One stumbling block that I'm seeing is there is a current lack of pre-commit hook support in github (to prevent destructive or history-changing commits). I don't think this will be a problem, but it's worth noting. post-commit is fine. chris From David.Messina at sbc.su.se Thu May 6 17:59:56 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 6 May 2010 23:59:56 +0200 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <4BE32FEF.6080707@cornell.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> Message-ID: <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> > * I don't think we should allow any svn write support. Anybody that > truly cannot get over the hump can send patches to the list. Unless svn commits are somehow problematic, is there another reason to disallow it? We're switching to git soon and with little advance notice. We'd be asking all the devs to make the move on our schedule. Dave From dimitark at bii.a-star.edu.sg Thu May 6 22:25:23 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Fri, 07 May 2010 10:25:23 +0800 Subject: [Bioperl-l] about Genewise Message-ID: <4BE37A13.6010309@bii.a-star.edu.sg> Hi guys, i have a question about Genewise. Is it possible to get the percent identity between query and target? I am now trying to figure that out. I found no such method so i suppose i should calculate it myself. Thank you for your time and help. 
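For illustration, calculating it by hand could look roughly like the sketch below. It assumes the query and the translated target have already been pulled out as two gapped strings of equal length (how to get them out of the genewise output is not covered here). The sub name percent_identity and the choice of denominator (columns where neither string has a gap) are assumptions for this example, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): percent identity between two
# aligned, equal-length strings. Columns with a gap ('-') in either
# string are skipped, so the denominator is the number of aligned pairs.
sub percent_identity {
    my ($query, $target) = @_;
    die "aligned strings must have equal length\n"
        unless length($query) == length($target);

    my ($matches, $compared) = (0, 0);
    for my $i (0 .. length($query) - 1) {
        my $q = uc substr($query,  $i, 1);
        my $t = uc substr($target, $i, 1);
        next if $q eq '-' || $t eq '-';    # ignore gapped columns
        $compared++;
        $matches++ if $q eq $t;
    }
    return $compared ? 100 * $matches / $compared : 0;
}

# 6 aligned pairs, 5 identical: prints 83.3%
printf "%.1f%%\n", percent_identity('MKV-LSA', 'MKIALSA');
```

Other definitions divide by the full alignment length or by the query length instead, so the number is only comparable if the denominator is stated.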
Greetings Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From dimitark at bii.a-star.edu.sg Fri May 7 01:03:58 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Fri, 07 May 2010 13:03:58 +0800 Subject: [Bioperl-l] more genewise Message-ID: <4BE39F3E.4090204@bii.a-star.edu.sg> Hi guys, another question about genewise. Is it possible to get the query seq and the protein translation of the target seq somehow? So, up to now i could not find a way to get the percent identity between query and target(the protein translation) :( I spent some time on CPAN and perldoc and even checked the code of several modules but still no solution. Then i decided to extract the sequences out of the output file and compare them somehow but i could not find a way and for that. I found that the module 'Bio::Tools::Run::Genewise' is creating internal temp output file which i cant access so i can parse it myself and extract whatever. Because with current implementation i cant access that temp output i hacked a bit 'Bio::Tools::Run::Genewise' so i can pass my output file to the constructor, like that: my $factory = Bio::Tools::Run::Genewise->new( output => $tmpout); #not "-output" cos the module currently doesnt like it I modified the BEGIN section and the '_run' subroutine. 
My lines and the originals are marked : -------------- BEGIN { @GENEWISE_PARAMS = qw( DYMEM CODON GENE CFREQ SPLICE GENESTATS INIT SUBS INDEL INTRON NULL INSERT SPLICE_MAX_COLLAR SPLICE_MIN_COLLAR GW_EDGEQUERY GW_EDGETARGET GW_SPLICESPREAD KBYTE HNAME ALG BLOCK DIVIDE GENER U V S T G E M); @GENEWISE_SWITCHES = qw(HELP SILENT QUIET ERROROFFSTD TREV PSEUDO NOSPLICE_GTAG SPLICE_GTAG NOGWHSP GWHSP TFOR TABS BOTH HMMER ); $OK_FIELD{OUTPUT}++; *#dimitar * # Authorize attribute fields foreach my $attr ( @GENEWISE_PARAMS, @GENEWISE_SWITCHES, @OTHER_SWITCHES) { $OK_FIELD{$attr}++; } } ----------------------- ----------------------- my ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir); $self->debug("genewise command = $commandstring"); my $outfile2=$self->output; *#dimitar* # my $status = system("$commandstring > $outfile1"); *#original* my $status = system("$commandstring > $outfile2 "); *#dimitar* $self->throw("Genewies call $commandstring crashed: $? \n") unless $status==0; # my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile1); *#original* my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile2); *#dimitar* ----------------------- More the method 'cds' from 'Bio::SeqFeature::Gene::Exon/I' gives nothing back it doesnt matter what i tried. And i tried a lot :) Fortunately for me i dont need that for now. But tried and didnt work so had to say. Cheers Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From O.Niehuis.zfmk at uni-bonn.de Fri May 7 02:34:54 2010 From: O.Niehuis.zfmk at uni-bonn.de (Dr. Oliver Niehuis) Date: Fri, 7 May 2010 08:34:54 +0200 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifying alignment parameters Message-ID: Hi, I have a question about how to specify parameters for the alignment program MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. 
I would like to run MAFFT with the following alignment parameters:

--maxiterate 1000 --localpair

Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module
before, I specified the MAFFT run parameters as follows:

@params = ('localpair', 'maxiterate' => 1000);
$factory = Bio::Tools::Run::Alignment::MAFFT->new(@params);

Unfortunately, this code causes an exception error:

------------- EXCEPTION -------------
MSG: Unallowed parameter: LOCALPAIR !
STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:211
STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:196
STACK toplevel /Users/Oliver/Desktop/Orthologs/Generate_FASTA_files_of_orthologs.pl:55
-------------------------------------

I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT
module, but only when leaving the @params array empty; MAFFT then runs
with the default parameters.

Has anyone an idea how I can specify run parameters for MAFFT via the
Bio::Tools::Run::Alignment::MAFFT module?

Any help is much appreciated!

Best wishes,
Oliver

From biopython at maubp.freeserve.co.uk Fri May 7 04:51:38 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 May 2010 09:51:38 +0100
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
 <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu>
 <4BE32FEF.6080707@cornell.edu>
 <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se>
Message-ID:

On Thu, May 6, 2010 at 10:59 PM, Dave Messina wrote:
>> * I don't think we should allow any svn write support. Anybody that
>> truly cannot get over the hump can send patches to the list.
>
> Unless svn commits are somehow problematic, is there another reason to disallow it?

From my reading of the github blog post, svn merges are potentially
problematic.
http://github.com/blog/644-subversion-write-support Peter From maj at fortinbras.us Fri May 7 07:53:55 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 07:53:55 -0400 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters In-Reply-To: References: Message-ID: Hi Oliver, This module looks like it needs some updating. Here's a hack that should make it work (or at least prevent that exception); put the following lines before the new() call: push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_PARAMS, 'MAXITERATE'; push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, 'LOCALPAIR'; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; HTH, Mark ----- Original Message ----- From: "Dr. Oliver Niehuis" To: Sent: Friday, May 07, 2010 2:34 AM Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters > Hi, > > I have a question about how to specify parameters for the alignment program > MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. I would like to run > MAFFT with the following alignment parameters: > > --maxiterate 1000 --localpair > > Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module > before, I specified the MAFFT run parameters as follows: > > @params = ('localpair', 'maxiterate' => 1000); > $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); > > Unfortunately, this code causes an exception error: > > ------------- EXCEPTION ------------- > MSG: Unallowed parameter: LOCALPAIR ! 
> STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/ > Bio/Tools/Run/Alignment/MAFFT.pm:211 > STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/ > Tools/Run/Alignment/MAFFT.pm:196 > STACK toplevel /Users/Oliver/Desktop/Orthologs/ > Generate_FASTA_files_of_orthologs.pl:55 > ------------------------------------- > > I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT > module, but only when leaving the @params array empty; MAFFT then runs with > the default parameters. > > Has anyone an idea how I can specify run parameters for MAFFT via the > Bio::Tools::Run::Alignment::MAFFT module? > > Any help is much appreciated! > > Best wishes, > Oliver > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Fri May 7 08:12:05 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 7 May 2010 07:12:05 -0500 Subject: [Bioperl-l] more genewise In-Reply-To: <4BE39F3E.4090204@bii.a-star.edu.sg> References: <4BE39F3E.4090204@bii.a-star.edu.sg> Message-ID: <4899F495-FA46-4030-B984-EEFF81579C27@illinois.edu> Dimitar, It would be better if you could create a bug report describing the problem (with minimal example data and code) and provide a diff file or patch. This gives us a chance to do some code review and commit the patch if it passes tests. Here's a HOWTO on this: http://www.bioperl.org/wiki/HOWTO:SubmitPatch Let us know when it's submitted and we can take a look. chris On May 7, 2010, at 12:03 AM, Dimitar Kenanov wrote: > Hi guys, > another question about genewise. Is it possible to get the query seq and the protein translation of the target seq somehow? 
> > So, up to now i could not find a way to get the percent identity between query and target(the protein translation) :( I spent some time on CPAN and perldoc and even checked the code of several modules but still no solution. Then i decided to extract the sequences out of the output file and compare them somehow but i could not find a way and for that. I found that the module 'Bio::Tools::Run::Genewise' is creating internal temp output file which i cant access so i can parse it myself and extract whatever. > > Because with current implementation i cant access that temp output i hacked a bit 'Bio::Tools::Run::Genewise' so i can pass my output file to the constructor, like that: > > my $factory = Bio::Tools::Run::Genewise->new( output => $tmpout); #not "-output" cos the module currently doesnt like it > > I modified the BEGIN section and the '_run' subroutine. My lines and the originals are marked : > -------------- > BEGIN { > @GENEWISE_PARAMS = qw( DYMEM CODON GENE CFREQ SPLICE GENESTATS INIT > SUBS INDEL INTRON NULL INSERT SPLICE_MAX_COLLAR SPLICE_MIN_COLLAR > GW_EDGEQUERY GW_EDGETARGET GW_SPLICESPREAD > KBYTE HNAME ALG BLOCK DIVIDE GENER U V S T G E M); > > @GENEWISE_SWITCHES = qw(HELP SILENT QUIET ERROROFFSTD TREV PSEUDO NOSPLICE_GTAG > SPLICE_GTAG NOGWHSP GWHSP > TFOR TABS BOTH HMMER ); > > $OK_FIELD{OUTPUT}++; *#dimitar > * # Authorize attribute fields > foreach my $attr ( @GENEWISE_PARAMS, @GENEWISE_SWITCHES, > @OTHER_SWITCHES) { $OK_FIELD{$attr}++; } > } > ----------------------- > ----------------------- > my ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir); > $self->debug("genewise command = $commandstring"); > my $outfile2=$self->output; *#dimitar* > # my $status = system("$commandstring > $outfile1"); *#original* > my $status = system("$commandstring > $outfile2 "); *#dimitar* > $self->throw("Genewies call $commandstring crashed: $? 
\n") unless $status==0;
>
> # my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile1); *#original*
> my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile2); *#dimitar*
> -----------------------
>
> More the method 'cds' from 'Bio::SeqFeature::Gene::Exon/I' gives nothing back it doesnt matter what i tried. And i tried a lot :) Fortunately for me i dont need that for now. But tried and didnt work so had to say.
>
> Cheers
> Dimitar
>
> --
> Dimitar Kenanov
> Postdoctoral research fellow
> Protein Sequence Analysis Group
> Bioinformatics Institute
> A*STAR, Singapore
> tel: +65 6478 8514
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Fri May 7 11:34:09 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 7 May 2010 11:34:09 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifying alignment parameters
In-Reply-To: <332A01DD-64DA-41EC-B5CE-2BC74BE78038@uni-bonn.de>
References: <332A01DD-64DA-41EC-B5CE-2BC74BE78038@uni-bonn.de>
Message-ID: <9764564B5CC44A89883498C6309DA045@NewLife>

Hi Oliver,

I think so, looking at the module again. Instead of the lines in the
previous post, put

push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, ('LOCALPAIR', 'MAXITERATE');
$Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1;
$Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1;

and create your @params array with

@params = ('localpair' => 1, 'maxiterate' => 1000);

The switches need to be set with something that returns true, I believe.
I *think* this should work for you. But if you would, please submit your
original problem as a bug at http://bugzilla.bioperl.org. The module
definitely needs some tender loving care.

Thanks
Mark

----- Original Message -----
From: Dr. Oliver Niehuis
To: Mark A.
Jensen Sent: Friday, May 07, 2010 11:07 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters Dear Mark, Thanks for your quick reply and the MAFFT module hack. I added your code to my script and it seems to works, except that I can't specify the number of iterations (at least, I don't know how). I can specify my @params = ('localpair', 'maxiterate'); but when I assign 1000 to 'maxiterate' (i.e. 'maxiterate' => 1000), I get again an exception error, complaining about 1000 being an unallowed parameter. ------------- EXCEPTION ------------- MSG: Unallowed parameter: 1000 ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/Generate_FASTA_files_of_orthologs.pl:61 ------------------------------------- Do you know how to fix this? Best wishes, Oliver Am 07.05.2010 um 13:53 schrieb Mark A. Jensen: Hi Oliver, This module looks like it needs some updating. Here's a hack that should make it work (or at least prevent that exception); put the following lines before the new() call: push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_PARAMS, 'MAXITERATE'; push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, 'LOCALPAIR'; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; HTH, Mark ----- Original Message ----- From: "Dr. Oliver Niehuis" To: Sent: Friday, May 07, 2010 2:34 AM Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters Hi, I have a question about how to specify parameters for the alignment program MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. 
I would like to run MAFFT with the following alignment parameters: --maxiterate 1000 --localpair Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module before, I specified the MAFFT run parameters as follows: @params = ('localpair', 'maxiterate' => 1000); $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); Unfortunately, this code causes an exception error: ------------- EXCEPTION ------------- MSG: Unallowed parameter: LOCALPAIR ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/ Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/ Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/ Generate_FASTA_files_of_orthologs.pl:55 ------------------------------------- I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT module, but only when leaving the @params array empty; MAFFT then runs with the default parameters. Has anyone an idea how I can specify run parameters for MAFFT via the Bio::Tools::Run::Alignment::MAFFT module? Any help is much appreciated! Best wishes, Oliver _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Fri May 7 12:42:38 2010 From: hartzell at alerce.com (George Hartzell) Date: Fri, 7 May 2010 09:42:38 -0700 Subject: [Bioperl-l] [job] Contract programmer in Bioinformatics at Genentech. Message-ID: <19428.17150.181595.755965@gargle.gargle.HOWL> Genentech's Bioinformatics department seeks an experienced software engineer for a six month contract. Modern Perl (or enlightened, or ..., just not circa 1998) style is required. We build tools to support our Research labs, collecting, storing, massaging, and presenting information to computer-philes and -phobes. We have more to do than we can handle, you'll be pitching in. 
Exactly what you'd be doing will be a function of your skills and our needs, and will probably vary a bit over the six month period. You write tests, sometimes even before you write code. You're not afraid of a little SQL and are comfortable collaborating with folks who were born speaking it. You're familiar with things like Moose, Rose::DB::Object, CGI::Application, NYTProf, and their ilk (or brethren) and more importantly are excited about learning more about them and using them in real-world work. Smoothing out our in-house DPAN, setting up an automated build/smoke system (we have Hudson handling Java builds already) and helping with some other infrastructure stuff is also on the table. You'll be working more-or-less full time in South San Fransisco, there's the potential for a bit of telecommuting once things get running smoothly but the bulk of the job is onsite. Things that you should be comfortable with include: Perl ("modern") SQL, object relational mappers Web application (CGI::Application, or similar) CPAN, Module::Build, Dist::Zilla, etc.... Linux Software engineering in a professional environment. Experience in bioinformatics, biology, or supporting scientists would be helpful but is not required. Please send cover letters and resumes to my work address: georgewh at gene.com (the ability to follow directions is important). Bonus points for easy formats (PDF is great!), demerits for sending me stuff in DOS specific archive formats. g. From qqq2395 at gmail.com Thu May 6 14:51:13 2010 From: qqq2395 at gmail.com (visitor555) Date: Thu, 6 May 2010 11:51:13 -0700 (PDT) Subject: [Bioperl-l] Bio::Align - alignment by position? Message-ID: <28478022.post@talk.nabble.com> Hi, I have a list alignment positions and I want to get each column them from the alignment. If I slice the alignment the sequence with gaps in these positions disappear. I can rotate on each seq and then split the sequence. Is there better way to go over the alignment position by position? 
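One way to go over an alignment position by position, sketched in plain Perl. It assumes the aligned sequences are already available as gapped strings of equal length (for example, the strings each aligned sequence returns from its seq method) and that the positions are 1-based alignment columns. The sub name alignment_columns is made up for this illustration. Because the strings are indexed directly, sequences with gaps at the requested positions are kept rather than dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: for each requested 1-based alignment position,
# return the column as a string (one character per sequence, gaps kept).
sub alignment_columns {
    my ($rows, $positions) = @_;    # array refs: gapped strings, positions
    my %column;
    for my $pos (@$positions) {
        $column{$pos} = join '', map { substr($_, $pos - 1, 1) } @$rows;
    }
    return \%column;
}

my @rows = ('ACG-T', 'A-GCT', 'ACGCT');
my $cols = alignment_columns(\@rows, [2, 4]);

# prints "2: C-C" and "4: -CC"
print "$_: $cols->{$_}\n" for sort { $a <=> $b } keys %$cols;
```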
thanks ! -- View this message in context: http://old.nabble.com/Bio%3A%3AAlign---alignment-by-position--tp28478022p28478022.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jillianrowe91286 at gmail.com Mon May 3 08:42:56 2010 From: jillianrowe91286 at gmail.com (mindlessbrain) Date: Mon, 3 May 2010 05:42:56 -0700 (PDT) Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall Message-ID: <28434717.post@talk.nabble.com> Hey all, I'm trying to run some code for StandAloneBLast in Windows Vista: [code] #!/usr/bin/perl use Bio::DB::SwissProt; use Bio::Tools::Run::StandAloneBlast; BEGIN { $ENV{PATH}="D:/blast-2.2.23+/bin/:"; } my $database = new Bio::DB::SwissProt; my $query = $database->get_Seq_by_id('TAUD_ECOLI'); my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastp', 'database' => 'swissprot', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); my $result = $blast_report->next_result; while( my $hit = $result->next_hit()) { print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n"; } [/code] I installed BLAST from the NCBI website. I get this when I run dir on the bin: D:\blast-2.2.23+\bin>dir Volume in drive D has no label. Volume Serial Number is 224C-0190 Directory of D:\blast-2.2.23+\bin 05/03/2010 03:02 PM . 05/03/2010 03:02 PM .. 
03/08/2010 11:09 PM 2,789,376 blastdbcheck.exe 03/08/2010 11:09 PM 4,009,984 blastdbcmd.exe 03/08/2010 11:09 PM 1,810,432 blastdb_aliastool.exe 03/08/2010 11:09 PM 6,225,920 blastn.exe 03/08/2010 11:09 PM 6,221,824 blastp.exe 03/08/2010 11:09 PM 6,213,632 blastx.exe 03/08/2010 11:09 PM 5,316,608 blast_formatter.exe 03/08/2010 11:09 PM 3,215,360 convert2blastmask.exe 03/08/2010 11:09 PM 3,211,264 dustmasker.exe 03/08/2010 11:09 PM 51,178 legacy_blast.pl 03/08/2010 11:09 PM 3,866,624 makeblastdb.exe 03/08/2010 11:09 PM 3,612,672 makembindex.exe 03/08/2010 11:09 PM 6,344,704 psiblast.exe 03/08/2010 11:09 PM 6,201,344 rpsblast.exe 03/08/2010 11:09 PM 6,205,440 rpstblastn.exe 03/08/2010 11:09 PM 3,608,576 segmasker.exe 03/08/2010 11:09 PM 6,320,128 tblastn.exe 03/08/2010 11:09 PM 6,209,536 tblastx.exe 03/08/2010 11:09 PM 10,010 update_blastdb.pl 03/08/2010 11:09 PM 3,530,752 windowmasker.exe 20 File(s) 84,975,364 bytes 2 Dir(s) 122,390,626,304 bytes free I have an ncbi.ini file in my windows directory that contains: [NCBI] DATA=D:\blast-2.2.23+\data [BLAST] BLASTDB=D:\blast-2.2.23+\db Here's what my environmental variables looks like: http://old.nabble.com/file/p28434717/environmental%2Bvariables.jpg Help would be very, very appreciated! -- View this message in context: http://old.nabble.com/Bio%3A%3ATools%3A%3ARun%3A%3AStandAloneBlast-can%27t-find-path-to-blastall-tp28434717p28434717.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Fri May 7 16:07:58 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 16:07:58 -0400 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall In-Reply-To: <28434717.post@talk.nabble.com> References: <28434717.post@talk.nabble.com> Message-ID: <670B2E492D9E4D158618EC4750C595AF@NewLife> You've got blast+, so have a look at Bio::Tools::Run::StandAloneBlastPlus, should solve it. 
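A quick way to check which flavour of BLAST a directory holds, sketched in plain Perl. The directory listing in the original post shows blastp.exe and friends but no blastall.exe, which is what this looks for. The sub name blast_flavour and the labels it returns are invented for this example; it only tests for executable names (blastall for legacy BLAST, blastp for BLAST+), with or without an .exe suffix.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: report whether a directory looks like a legacy
# BLAST install (has blastall) or a BLAST+ install (has blastp).
sub blast_flavour {
    my ($dir) = @_;
    my $has = sub {
        my ($name) = @_;
        # count how many of "name" / "name.exe" exist in $dir
        return grep {
            my $path = File::Spec->catfile($dir, $_);
            -e $path;
        } ($name, "$name.exe");
    };
    return 'legacy' if $has->('blastall');
    return 'plus'   if $has->('blastp');
    return 'none';
}

# The directory from the original post; adjust to your own install.
print blast_flavour('D:/blast-2.2.23+/bin'), "\n";
```

If this reports 'plus', there is no blastall for StandAloneBlast to find, which fits the suggestion to use the BLAST+ wrapper instead.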
MAJ ----- Original Message ----- From: "mindlessbrain" To: Sent: Monday, May 03, 2010 8:42 AM Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall > > Hey all, > > I'm trying to run some code for StandAloneBLast in Windows Vista: > > [code] > #!/usr/bin/perl > > use Bio::DB::SwissProt; > use Bio::Tools::Run::StandAloneBlast; > > BEGIN > { > $ENV{PATH}="D:/blast-2.2.23+/bin/:"; > } > > my $database = new Bio::DB::SwissProt; > my $query = $database->get_Seq_by_id('TAUD_ECOLI'); > > my $factory = Bio::Tools::Run::StandAloneBlast->new( > 'program' => 'blastp', > 'database' => 'swissprot', > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), > " significance: ", $hit->significance(), "\n"; > } > [/code] > > I installed BLAST from the NCBI website. I get this when I run dir on the > bin: > > D:\blast-2.2.23+\bin>dir > Volume in drive D has no label. > Volume Serial Number is 224C-0190 > > Directory of D:\blast-2.2.23+\bin > > 05/03/2010 03:02 PM . > 05/03/2010 03:02 PM .. 
> 03/08/2010 11:09 PM 2,789,376 blastdbcheck.exe > 03/08/2010 11:09 PM 4,009,984 blastdbcmd.exe > 03/08/2010 11:09 PM 1,810,432 blastdb_aliastool.exe > 03/08/2010 11:09 PM 6,225,920 blastn.exe > 03/08/2010 11:09 PM 6,221,824 blastp.exe > 03/08/2010 11:09 PM 6,213,632 blastx.exe > 03/08/2010 11:09 PM 5,316,608 blast_formatter.exe > 03/08/2010 11:09 PM 3,215,360 convert2blastmask.exe > 03/08/2010 11:09 PM 3,211,264 dustmasker.exe > 03/08/2010 11:09 PM 51,178 legacy_blast.pl > 03/08/2010 11:09 PM 3,866,624 makeblastdb.exe > 03/08/2010 11:09 PM 3,612,672 makembindex.exe > 03/08/2010 11:09 PM 6,344,704 psiblast.exe > 03/08/2010 11:09 PM 6,201,344 rpsblast.exe > 03/08/2010 11:09 PM 6,205,440 rpstblastn.exe > 03/08/2010 11:09 PM 3,608,576 segmasker.exe > 03/08/2010 11:09 PM 6,320,128 tblastn.exe > 03/08/2010 11:09 PM 6,209,536 tblastx.exe > 03/08/2010 11:09 PM 10,010 update_blastdb.pl > 03/08/2010 11:09 PM 3,530,752 windowmasker.exe > 20 File(s) 84,975,364 bytes > 2 Dir(s) 122,390,626,304 bytes free > > I have an ncbi.ini file in my windows directory that contains: > [NCBI] > DATA=D:\blast-2.2.23+\data > [BLAST] > BLASTDB=D:\blast-2.2.23+\db > > Here's what my environmental variables looks like: > > http://old.nabble.com/file/p28434717/environmental%2Bvariables.jpg > > Help would be very, very appreciated! > > > -- > View this message in context: > http://old.nabble.com/Bio%3A%3ATools%3A%3ARun%3A%3AStandAloneBlast-can%27t-find-path-to-blastall-tp28434717p28434717.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From manchunjohn-ma at uiowa.edu Fri May 7 16:17:52 2010
From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John)
Date: Fri, 7 May 2010 15:17:52 -0500
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu>

Hi,

Right now I'm migrating some of my bioperl scripts from remote to
stand-alone BLAST, and stumbled over how RemoteBlast->submit_blast and
StandAloneNCBIBlast->blastall deal with an array parameter.

Common code for both versions:

my $p3_machine=Bio::Tools::Run::Primer3->new(@p3_parameters);
[...]
my $primer3_results=$p3_machine->run($seq);
my $p3_results=$primer3_results->next_primer();
my @temp_primer_info=$p3_results->get_primer;
my %primer_info;
$primer_info{primer}[0]=$temp_primer_info[0]->seq;
$primer_info{primer}[1]=$temp_primer_info[1]->seq;
$primer_info{primer}[0]->display_id('F');
$primer_info{primer}[1]->display_id('R');

Code using RemoteBlast:

my $remote_blast_machine=Bio::Tools::Run::RemoteBlast->new(@remote_blast_params);
[Parameter setting skipped]
my $r=$remote_blast_machine->submit_blast(@primer_info{primer});
[etc, etc for iteration]

Using this code, I have been able to submit both sequences to the NCBI
server and obtain results accordingly; each result object contains hits
from one input sequence.

However, when I switched to StandAloneBlast this way:

my $Stand_alone_blast_machine=Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params);
my $blast_report=$Stand_alone_blast_machine->blastall(@primer_info{primer});
while (my $result=$blast_report->next_result()){
[etc, etc for iteration]
}

There is only one result object, for sequence "F", and even so the loop
went through twice. I would first suspect I made a mistake, but where?
John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 From sumanth41277 at yahoo.com Fri May 7 17:34:53 2010 From: sumanth41277 at yahoo.com (polsum) Date: Fri, 7 May 2010 14:34:53 -0700 (PDT) Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU Message-ID: <28491725.post@talk.nabble.com> Hi - We have a pretty powerful computer with a dual quad-core Intel Xeon W5580 processor and 24 GB of RAM. When I use BioPerl programs for routine operations such as blastn and BLAST parsing, the programs don't seem to use the computer's full power. I mean they use just one of the 8 cores and only 8 GB of RAM. Is there a way to ask Perl to use all the available power? I have 64-bit Windows and 64-bit Ubuntu; Ubuntu is definitely faster, but it still doesn't use all the cores of the CPU. Thanks in advance -- View this message in context: http://old.nabble.com/Bio-Perl-and-multiple-cores-of-CPU-tp28491725p28491725.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Fri May 7 17:46:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 7 May 2010 16:46:24 -0500 Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU In-Reply-To: <28491725.post@talk.nabble.com> References: <28491725.post@talk.nabble.com> Message-ID: You can specify the number of processors to use. With legacy BLAST this is -a 8, with BLAST+ I think this is -num_threads 8 (with the explicit caveat I haven't tried the latter much, so no guarantees, we're not liable for explosions and such). chris On May 7, 2010, at 4:34 PM, polsum wrote: > Hi - We have a pretty powerful computer with Dual-Quadcore intel Xeon w5580 > prcoessor with 24 GB ram. When I use Bioperl programs for routine operations > like Blastn and blast parsing etc. the programs dont seem to utilize the > computer power to the fullest. I mean they just use one of the 8 cores and > only 8GB of RAM.
Is there a way to ask Perl to use all the available power? > I have 64 bit windows and 64 bit Ubuntu and Ubuntu is definitely faster but > still it also doesnt use entire cores of the cpu. > > thanks in advance > -- > View this message in context: http://old.nabble.com/Bio-Perl-and-multiple-cores-of-CPU-tp28491725p28491725.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri May 7 18:14:24 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 00:14:24 +0200 Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU In-Reply-To: References: <28491725.post@talk.nabble.com> Message-ID: On May 7, 2010, at 11:46 PM, Chris Fields wrote: > With legacy BLAST this is -a 8, with BLAST+ I think this is -num_threads 8 (with the explicit caveat I haven't tried the latter much, so no guarantees, we're not liable for explosions and such). One other caveat if you use BLAST+: be sure you have the latest version, 2.2.23. In my informal testing, the num_threads option wasn't working correctly in 2.2.22. Blast parsing will still be single-threaded, by the way. BioPerl programs, like everything else unfortunately, need to explicitly spawn multiple threads or forks to take advantage of multiple cores.
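For what it's worth, here is a minimal sketch of the fork approach using only core Perl. The helper run_forked is made up for illustration and is not part of BioPerl, and the length() callback merely stands in for real work such as parsing one BLAST report per file. It assumes items and results contain no tabs or newlines, and that fork() behaves as on Unix (on Windows perl emulates it with threads).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run $work on each item using
# up to $n_workers forked child processes; returns results keyed by item.
sub run_forked {
    my ($n_workers, $work, @items) = @_;
    my (%result, @readers);

    for my $w (0 .. $n_workers - 1) {
        # Worker $w takes every $n_workers-th item, starting at index $w
        my @share = @items[ grep { $_ % $n_workers == $w } 0 .. $#items ];
        next unless @share;

        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {                 # child: do the work, report, exit
            close $reader;
            print {$writer} "$_\t", $work->($_), "\n" for @share;
            close $writer;
            exit 0;
        }
        close $writer;                   # parent keeps only the read end
        push @readers, [ $pid, $reader ];
    }

    # Parent: collect "item<TAB>value" lines from each child, then reap it
    for my $r (@readers) {
        my ($pid, $reader) = @$r;
        while (my $line = <$reader>) {
            chomp $line;
            my ($item, $value) = split /\t/, $line, 2;
            $result{$item} = $value;
        }
        close $reader;
        waitpid($pid, 0);
    }
    return %result;
}

# Stand-in workload: compute the length of each ID in 4 child processes
my %len = run_forked(4, sub { length $_[0] }, qw(F R contig2 contig10));
print "$_ => $len{$_}\n" for sort keys %len;
```

Each child only sends plain text back through its pipe, so anything richer than a string (a parsed result object, say) would have to be serialized first or written to a per-child output file.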
While I've never done it myself, I ran across this post which may be helpful in case you want to try it: http://computationalbiologynews.blogspot.com/2008/07/harnessing-power-of-multicore.html Dave From David.Messina at sbc.su.se Fri May 7 18:34:10 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 00:34:10 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> Message-ID: <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> Hi John, You're right that passing parameters should work similarly for both RemoteBlast and StandAloneBlast, but without seeing exactly the parameter array you're passing, it's not possible to identify the problem. Could you perhaps post a small but complete test program that demonstrates the problem? Dave PS - is this the actual code you ran? > My $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_blast_params); > My $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); > While (my $result=$blast_report->next_result()){ > [etc, etc for iteration] > } I'm guessing you were paraphrasing, but I ask because My, with a capital "M", will generate an error, you're calling Tools::Run::StandAloneBlast instead of Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), i.e. it should be: my $Stand_alone_blast_machine = Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); From florent.angly at gmail.com Sat May 8 00:42:18 2010 From: florent.angly at gmail.com (Florent Angly) Date: Sat, 08 May 2010 14:42:18 +1000 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: References: <28491725.post@talk.nabble.com> Message-ID: <4BE4EBAA.5010709@gmail.com> Hi all, I am working on updating some of the Bio::Assembly::* modules right now.
I need to sort a list of IDs. These IDs could be numbers, "words" or a mix of the two, for example: @arr = ('singlet1', 'contig10', 'contig2', '101', '3'); I cannot sort them with the numerical sort: sort { $a <=> $b } @arr This would generate warnings because some of the IDs, such as 'singlet1', are not numbers. I cannot sort them lexically: sort @arr Lexical sorting would not take numbers into account properly and would result in: 101 3 contig10 contig2 singlet1 So, what I really need is natural sorting, which is not provided by any core function of Perl. I'd like to use the CPAN module Sort::Naturally for this purpose: nsort @arr The results would be what we expect, i.e.: 3 101 contig2 contig10 singlet1 Can I add this module as an additional dependency of BioPerl? I imagine that some other modules might want to use this. On the assembly side, it would be used by the writing methods of Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around my problem that doesn't require any external module? Florent From manchunjohn-ma at uiowa.edu Sat May 8 17:37:13 2010 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Sat, 8 May 2010 16:37:13 -0500 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> Hi, And that's my problem here: I checked the BLAST output, and the two sequences did get aligned. It's just that SearchIO, in whatever flavour (I tried blast, blasttable and blastxml), didn't seem to move on to the next result when next_result() is called. It knows there are two results, but I still get the first result on the second call.
Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 -----Original Message----- From: Dave Messina [mailto:David.Messina at sbc.su.se] Sent: Saturday, May 08, 2010 4:33 PM To: Ma, Man Chun John Cc: BioPerl List Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast Hi John, Please remember to keep Cc'ing the mailing list so that everyone can participate in the discussion. If I understand your question correctly, yes, you can iterate through the blast results in a report called $blast_report using next_result. If you haven't already, you may want to look at the SearchIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SearchIO (although the BioPerl website appears to be temporarily offline, so check back a little later.) Dave On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > Hi, > > I have did some more investigation and found that the issue is > probably that of SearchIO rather than StandAloneBlast--in case I made > a mistake, so if I parsed a standard @array of Bio::Seq objects into > StandAloneBlast (blastn with SearchIO output), the result for each of > the seqs in the array can be assessed by $blast_report->next_result, > right? > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, May 07, 2010 5:34 PM > To: Ma, Man Chun John > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Array Handling Differences between > RemoteBlast and StandAloneBlast > > Hi John, > > You're right that passing parameters should work similarly for both > RemoteBlast and StandAloneBlast, but without seeing exactly the > parameter array you're passing, it's not possible to identify the > problem. 
> > Could you perhaps post a small, but complete test program that > demonstrates the problem? > > > Dave > > > PS - is this the actual code you ran? > >> My >> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_bl >> a >> st_params); My >> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >> While (my $result=$blast_report->next_result()){ >> [etc, etc for iteration] >> } > > I'm guessing you were paraphrasing, but I ask because My, with a > capital "M", will generate an error, you're calling > Tools::Run::StandAloneBlast instead of > Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), i.e. it should be: > > my $Stand_alone_blast_machine = > Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); > > > > __________ Information from ESET NOD32 Antivirus, version of virus > signature database 5095 (20100507) __________ > > The message was checked by ESET NOD32 Antivirus. > > http://www.eset.com > From David.Messina at sbc.su.se Sat May 8 17:32:42 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 23:32:42 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> Message-ID: <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> Hi John, Please remember to keep Cc'ing the mailing list so that everyone can participate in the discussion. If I understand your question correctly, yes, you can iterate through the blast results in a report called $blast_report using next_result. 
If you haven't already, you may want to look at the SearchIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SearchIO (although the BioPerl website appears to be temporarily offline, so check back a little later.) Dave On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > Hi, > > I have did some more investigation and found that the issue is probably > that of SearchIO rather than StandAloneBlast--in case I made a mistake, > so if I parsed a standard @array of Bio::Seq objects into > StandAloneBlast (blastn with SearchIO output), the result for each of > the seqs in the array can be assessed by $blast_report->next_result, > right? > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, May 07, 2010 5:34 PM > To: Ma, Man Chun John > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast > and StandAloneBlast > > Hi John, > > You're right that passing parameters should work similarly for both > RemoteBlast and StandAloneBlast, but without seeing exactly the > parameter array you're passing, it's not possible to identify the > problem. > > Could you perhaps post a small, but complete test program that > demonstrates the problem? > > > Dave > > > PS - is this the actual code you ran? > >> My >> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_bla >> st_params); My >> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >> While (my $result=$blast_report->next_result()){ >> [etc, etc for iteration] >> } > > I'm guessing you were paraphrasing, but I ask because My, with a capital > "M", will generate an error, you're calling Tools::Run::StandAloneBlast > instead of Bio::Tools::Run::StandAloneBlast, and there's no method call > to new(), i.e. 
it should be: > > my $Stand_alone_blast_machine = > Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); > > > > __________ Information from ESET NOD32 Antivirus, version of virus > signature database 5095 (20100507) __________ > > The message was checked by ESET NOD32 Antivirus. > > http://www.eset.com > From cjfields at illinois.edu Sat May 8 15:41:58 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 14:41:58 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu> Lincoln, Just an update, I've added you, as well as Dave and Florent. Still not sure about the bioperl.org address myself, but it seems to work for Dave and others. We posted to root-l and Chris D. to make sure that's correct or if we should be using open-bio.org instead, but I believe it is. chris On May 6, 2010, at 7:01 AM, Lincoln Stein wrote: > My github username is lstein and I've just added lstein at bioperl.org to my > linked email addresses. I hope I have a bioperl.org address; I never use it! > > Lincoln > > On Wed, May 5, 2010 at 10:46 AM, Chris Fields wrote: > >> All, >> >> I would like to finalize moving over to git/github very soon. We're sort >> of in limbo on this, so it needs to progress forward. We'll need to do some >> initial cleanup after the move (Heikki is already doing a few things on the >> test repo, which we'll need to diff over to the new one). >> >> So with that in mind, here are my thoughts. This is copied over to this >> wiki page, in case you don't want to reply here: >> >> http://www.bioperl.org/wiki/From_SVN_to_Git >> >> (thanks Mark!) >> >> 1) Timeline >> >> When? Sooner the better (weeks as opposed to months). Our anon. svn is >> down, likely permanently ( >> http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). 
>> >> 2) Migration strategy >> >> Now mainly worked out using svn2git, which is very fast. We would need to >> make the svn repo on dev read-only during this transition. My guess is it >> would take very little time. Do we want to retain the git-SVN metadata on >> commits? This is viewable with our current read-only mirror on github: >> >> >> http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca >> >> 3) Developers >> >> Not everyone has a github account. Recent ones who I couldn't find on >> github: dmessina, fangly >> >> The current authors file used for mapping commit authors to emails used >> their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I >> think, once one has signed up with github, you can add that same address to >> your current ones, and it should map to your github account. If we use >> dev.open-bio.org as our central git repo, we won't need to go through with >> that, but we will need a viewable version of dev available somehow (mirrored >> on github or otherwise). Speaking of... >> >> 4) Development strategy >> >> Are we sticking with a single centralized repo (SVN-like)? Will that be >> github, or will github be a downstream repo to our work on dev? We could >> feasibly have github be an active, forkable repo that could be >> bidirectionally synced with dev, but I'm not sure of the logistics on this >> (this popped up before with svn migration and was rejected b/c it was >> considered too difficult to maintain). >> >> Git makes it very easy to make branches and merge in code to trunk. With >> that in mind, I would highly suggest we start working on branches for almost >> everything and merge over to trunk. There is very little to no overhead in >> doing so with git. >> >> I like this strategy (Mark Jensen pointed this out): >> http://nvie.com/git-model >> >> Also, several points were raised in a related project (Parrot) considering >> a move to git/github from svn. 
One in particular was that git allows >> destructive commits. Jonathan Leto indicated we can set up specific >> branches that don't allow this, using commit hooks, so my guess is the >> master branch and release branches wouldn't allow rewinds. >> >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? >> >> http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod >> >> This is easy with github and forks. >> >> 6) SVN Read/Write to GitHub >> >> It was recently announced that one can access a github repo using >> subversion as read-only, and just yesterday experimental write to github is >> allowed: >> >> http://github.com/blog/644-subversion-write-support >> >> I can see allowing read-only svn, but write support is still experimental. >> Do we want to allow that? >> >> 7) Others? >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat May 8 15:23:35 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 14:23:35 -0500 Subject: [Bioperl-l] GitHub migration Wednesday Message-ID: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> Seems like we're all pretty much in agreement that this needs to happen sooner than later. So, I'm scheduling the git/github migration aggressively, for this Wednesday. Key steps: 1) Notify the list prior to locking the svn repo and/or making it read-only. 
2) We need to set up post-commit hooks to forward commit messages on to bioperl-guts and elsewhere. I have tried this out off github and so far it's a little problematic (not working off bioperl-test, but working off my own github commits). 3) The current bioperl github repos will all be replaced with their live counterparts (branches and all), generated off the latest SVN via svn2git (including metadata). I'll have to reinstate collaborators at that time, but the author mapping should be the same as before (DEVACCOUNT at bioperl.org, where DEVACCOUNT is one's user name on dev.open-bio.org). 4) Update the wiki pages as needed to point to the github repo instead of the code.open-bio.org one. Also, I'm sure this will catch many devs not paying attention to the list by surprise, so we'll need a developer migration page set up. Anything else? chris From cjfields at illinois.edu Sat May 8 16:33:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 15:33:36 -0500 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE4EBAA.5010709@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> Message-ID: <7EC12A62-249D-4816-9FDD-6D321095AA4B@illinois.edu> I don't have a problem with this personally, seeing how complex the code can get for natural sorting. It would become a recommended module, though, not a full dependency. chris On May 7, 2010, at 11:42 PM, Florent Angly wrote: > Hi all, > > I am working on updating some of the Bio::Assembly::* modules right now. > I need to sort a list of IDs. These IDs could be numbers, "words" or a mix of the two, for example: @arr = ('singlet1', 'contig10', 'contig2', '101', '3'); > > I cannot sort them with the numerical sort: sort { $a <=> $b } @array > This would generates warnings because some of'singlet1' the IDs are numbers. 
> > I cannot sort them lexically: sort @array > Lexical sorting would not take into account numbers properly and result in: > singlet1 contig10 contig2 3 101 > > So, what I really need is natural sorting, which is not in any core function of Perl. I'd like to use the CPAN module Sort::Naturally for this purpose: nsort @arr > The results would be what we expect, i.e.: > 3 101 contig2 contig10 singlet1 > > Can I add this module as an additional dependency of BioPerl? I imagine that some other modules might want to use this. On the assembly side, it would be used by the writing methods of Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around my problem that doesn't require any external module? > > Florent > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Sat May 8 17:47:07 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 23:47:07 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> Message-ID: <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> There was a report last week of a possible problem with BLAST parsing introduced in the last few days. I don't know what the status of that is, but it's possible that it's related. In any case, if you post your code and the blast report you're parsing, we might be able to diagnose the problem. Also, what version of BioPerl are you using? 
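A quick way to answer the version question is sketched below, assuming BioPerl is installed somewhere the running perl can find it. As far as I know the distribution version is carried in $Bio::Root::Version::VERSION; the helper name bioperl_version is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the installed BioPerl version string,
# or undef if Bio::Root::Version cannot be loaded from the current @INC.
sub bioperl_version {
    return eval {
        require Bio::Root::Version;
        $Bio::Root::Version::VERSION;
    };
}

my $version = bioperl_version();
print defined $version
    ? "BioPerl version: $version\n"
    : "BioPerl does not appear to be installed for this perl\n";
```

Running this with the same perl binary that runs the failing script matters, since a machine can have more than one perl and more than one BioPerl installation.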
Dave On May 8, 2010, at 11:37 PM, Ma, Man Chun John wrote: > Hi, > > And that's my problem here: I checked the BLAST output, and the two > sequences did get aligned-- just that SearchIO, in whatever flavour (I > tried blast, blasttable and blastxml) didn't see to do to the next > result when next_result() is called. It knows there're two results, but > still getting the first result on the second call. > > Cheers, > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Saturday, May 08, 2010 4:33 PM > To: Ma, Man Chun John > Cc: BioPerl List > Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast > and StandAloneBlast > > Hi John, > > Please remember to keep Cc'ing the mailing list so that everyone can > participate in the discussion. > > If I understand your question correctly, yes, you can iterate through > the blast results in a report called $blast_report using next_result. > > If you haven't already, you may want to look at the SearchIO HOWTO: > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > (although the BioPerl website appears to be temporarily offline, so > check back a little later.) > > > Dave > > > > On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > >> Hi, >> >> I have did some more investigation and found that the issue is >> probably that of SearchIO rather than StandAloneBlast--in case I made >> a mistake, so if I parsed a standard @array of Bio::Seq objects into >> StandAloneBlast (blastn with SearchIO output), the result for each of >> the seqs in the array can be assessed by $blast_report->next_result, >> right? 
>> >> >> John MC Ma >> Graduate Assistant >> Kwitek Lab >> Department of Internal Medicine >> 3125E MERF >> 375 Newton Road >> Iowa City IA 52242 >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Friday, May 07, 2010 5:34 PM >> To: Ma, Man Chun John >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Array Handling Differences between >> RemoteBlast and StandAloneBlast >> >> Hi John, >> >> You're right that passing parameters should work similarly for both >> RemoteBlast and StandAloneBlast, but without seeing exactly the >> parameter array you're passing, it's not possible to identify the >> problem. >> >> Could you perhaps post a small, but complete test program that >> demonstrates the problem? >> >> >> Dave >> >> >> PS - is this the actual code you ran? >> >>> My >>> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_bl >>> a >>> st_params); My >>> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >>> While (my $result=$blast_report->next_result()){ >>> [etc, etc for iteration] >>> } >> >> I'm guessing you were paraphrasing, but I ask because My, with a >> capital "M", will generate an error, you're calling >> Tools::Run::StandAloneBlast instead of >> Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), > i.e. it should be: >> >> my $Stand_alone_blast_machine = >> Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); >> >> >> >> __________ Information from ESET NOD32 Antivirus, version of virus >> signature database 5095 (20100507) __________ >> >> The message was checked by ESET NOD32 Antivirus. 
>> >> http://www.eset.com >> > From cjfields at illinois.edu Sat May 8 14:59:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 13:59:13 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> Message-ID: <73BDDA86-F487-484F-A87C-1DF37CDEA7D8@illinois.edu> On May 7, 2010, at 3:51 AM, Peter wrote: > On Thu, May 6, 2010 at 10:59 PM, Dave Messina wrote: >>> * I don't think we should allow any svn write support. Anybody that >>> truly cannot get over the hump can send patches to the list. >> >> Unless svn commits are somehow problematic, is there another reason to disallow it? > > From my reading of the github blog post, svn merges are potentially problematic. > http://github.com/blog/644-subversion-write-support > > Peter Yes, they're still working out the kinks. I think we would only support read until the bugs get worked out of write. chris From David.Messina at sbc.su.se Sat May 8 17:33:53 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 23:33:53 +0200 Subject: [Bioperl-l] wiki offline? Message-ID: <064068F0-FF78-4557-9356-54CB1DB1783B@sbc.su.se> Hi, The BioPerl website appears to be down, at least from my spot on the net - could someone please look into it? Thanks, Dave From David.Messina at sbc.su.se Sat May 8 16:07:02 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 22:07:02 +0200 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu> Message-ID: <9A27A797-027E-445D-A8C3-6A7B6FBF4F13@sbc.su.se> Thanks, Chris.
It took a few days for github to "notice" my @bioperl.org address and connect it to my commits. Since Lincoln added his @bioperl.org email to github a little later than I did, it may just still be trickling through the github pipes. Dave From florent.angly at gmail.com Sat May 8 07:34:15 2010 From: florent.angly at gmail.com (Florent Angly) Date: Sat, 08 May 2010 21:34:15 +1000 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE4EBAA.5010709@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> Message-ID: <4BE54C37.7020304@gmail.com> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. It looks like the Bio::SeqIO module tests could use it as well. Cheers, Florent From David.Messina at sbc.su.se Sat May 8 18:40:22 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 9 May 2010 00:40:22 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> Message-ID: Hi John, Your blast report works fine for me with the following code taken from the Bio::SearchIO HOWTO:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Open the saved BLAST report ('blastout') as a SearchIO stream
my $in = Bio::SearchIO->new('-file' => 'blastout', '-format' => 'blast');
# One result per query sequence, one hit per subject, one HSP per aligned segment
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print "Query=", $result->query_name, " Hit=", $hit->name, " Length=", $hsp->length('total'), " Percent_id=", $hsp->percent_identity, "\n";
        }
    }
}

## Here is the output:
Query=F Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100
Query=F Hit=ref|NC_005117.2|NC_005117 Length=18 Percent_id=100
Query=F Hit=ref|NC_005105.2|NC_005105 Length=18 Percent_id=100
Query=R Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100

Dave From manchunjohn-ma at uiowa.edu Sat May 8 18:43:11 2010 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Sat, 8 May 2010 17:43:11 -0500 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu> Hi Dave, Yes, I tried writing a separate script to parse all those files, and they came out fine. The problem only happens when I run the entire target script; and if I replace the StandAloneBlast part with the standard RemoteBlast code, it's fine, too.
Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 -----Original Message----- From: Dave Messina [mailto:David.Messina at sbc.su.se] Sent: Saturday, May 08, 2010 5:40 PM To: Ma, Man Chun John Cc: BioPerl List Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast Hi John, Your blast report works fine for me with the following code taken from the Bio::SearchIO HOWTO: #!usr/bin/perl use strict; use warnings; use Bio::SearchIO; my $in = Bio::SearchIO->new('-file' => 'blastout', '-format' => 'blast'); while(my $result = $in->next_result) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp) { print "Query=", $result->query_name, " Hit=", $hit->name, " Length=", $hsp->length('total'), " Percent_id=", $hsp->percent_identity, "\n"; } } } ## Here is the output: Query=F Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100 Query=F Hit=ref|NC_005117.2|NC_005117 Length=18 Percent_id=100 Query=F Hit=ref|NC_005105.2|NC_005105 Length=18 Percent_id=100 Query=R Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100 Dave From David.Messina at sbc.su.se Sat May 8 18:58:41 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 9 May 2010 00:58:41 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu> 
Message-ID: <41281436-08D3-46F9-BDD0-A8D5306DB412@sbc.su.se> I cannot help you without seeing the code. It sounds like you've already tested the parsing part in a script by itself and that works. If you haven't already, you can test the running Blast part in its own script and see if that works. If both parts work separately, then there's something wrong with the way they have been put together. Dave From jason at bioperl.org Sat May 8 12:06:28 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 08 May 2010 09:06:28 -0700 Subject: [Bioperl-l] Bio::Align - alignment by position? In-Reply-To: <28478022.post@talk.nabble.com> References: <28478022.post@talk.nabble.com> Message-ID: <4BE58C04.8090901@bioperl.org> Not clear what you want to make. You want a new alignment that only contains the columns in your list or You want to extract each column in your list one by one? visitor555 wrote, On 5/6/10 11:51 AM: > Hi, > > I have a list alignment positions and I want to get each column them from > the alignment. If I slice the alignment the sequence with gaps in these > positions disappear. I can rotate on each seq and then split the sequence. > Is there better way to go over the alignment position by position? > > thanks ! > From jason at bioperl.org Sat May 8 12:12:26 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 08 May 2010 09:12:26 -0700 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE4EBAA.5010709@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> Message-ID: <4BE58D6A.9080601@bioperl.org> Unless necessary I don't know if adding yet another dependency is warranted here. I don't know how complicated the words will be but can't you just strip out the numbers and do this in a schwartzian transformation? 
#!/usr/bin/perl -w
use strict;
my @arr = qw(single1 contig10 101 contig2 3);
# Schwartzian transform: pair each ID with its first run of digits,
# sort numerically on that key, then drop the key again
my @sorted = map  { $_->[1] }
             sort { $a->[0] <=> $b->[0] }
             map  { [ /(\d+)/, $_ ] } @arr;
print join("\n", @sorted), "\n";

But I'm not sure how do you want to sort
10 vs contig10 vs singlet10 reliably?

-jason

Florent Angly wrote, On 5/7/10 9:42 PM:
> Hi all,
>
> I am working on updating some of the Bio::Assembly::* modules right now.
> I need to sort a list of IDs. These IDs could be numbers, "words" or a
> mix of the two, for example: @arr = ('singlet1', 'contig10',
> 'contig2', '101', '3');
>
> I cannot sort them with the numerical sort: sort { $a <=> $b } @array
> This would generate warnings because some of the IDs are not
> numbers.
>
> I cannot sort them lexically: sort @array
> Lexical sorting would not take into account numbers properly and
> result in:
> singlet1 contig10 contig2 3 101
>
> So, what I really need is natural sorting, which is not in any core
> function of Perl. I'd like to use the CPAN module Sort::Naturally for
> this purpose: nsort @arr
> The results would be what we expect, i.e.:
> 3 101 contig2 contig10 singlet1
>
> Can I add this module as an additional dependency of BioPerl? I
> imagine that some other modules might want to use this. On the
> assembly side, it would be used by the writing methods of
> Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around
> my problem that doesn't require any external module?
>
> Florent
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Sat May 8 19:47:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 May 2010 18:47:58 -0500
Subject: [Bioperl-l] New Bioperl dependency?
Sort::Naturally In-Reply-To: <4BE54C37.7020304@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> Message-ID: To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. None of the SeqIO modules make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. chris On May 8, 2010, at 6:34 AM, Florent Angly wrote: > Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. > > It looks like the Bio::SeqIO modules tests could use it as well. > > Cheers, > > Florent > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat May 8 20:02:28 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 19:02:28 -0500 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> Message-ID: <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. chris On May 8, 2010, at 6:47 PM, Chris Fields wrote: > To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input.
None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. > > chris > > On May 8, 2010, at 6:34 AM, Florent Angly wrote: > >> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. >> >> It looks like the Bio::SeqIO modules tests could use it as well. >> >> Cheers, >> >> Florent >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sat May 8 19:30:48 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 8 May 2010 19:30:48 -0400 Subject: [Bioperl-l] GitHub migration Wednesday In-Reply-To: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> References: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> Message-ID: <9B5043D308B942AEB4F9AA199470812B@NewLife> Sail on, great Ship of State. ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Saturday, May 08, 2010 3:23 PM Subject: [Bioperl-l] GitHub migration Wednesday > Seems like we're all pretty much in agreement that this needs to happen sooner > than later. So, I'm scheduling the git/github migration aggressively, for > this Wednesday. Key steps: > > 1) Notify the list prior to locking the svn repo and/or making it read-only. > > 2) We need to set up post-commit hooks to forward commit messages on to > bioperl-guts and elsewhere. 
I have tried this out off github and so far it's > a little problematic (not working off bioperl-test, but working off my own > github commits). > > 3) The current bioperl github repos will all be replaced with their live > counterparts (branches and all), generated off the latest SVN via svn2git > (including metadata). I'll have to reinstate collaborators at that time, but > the author mapping should be the same as before (DEVACCOUNT at bioperl.org, where > DEVACCOUNT is one's user name on dev.open-bio.org). > > 4) Update the wiki pages as needed to point to the github repo instead of the > code.open-bio.org one. Also, I'm sure this will catch many devs not paying > attention to the list by surprise, so we'll need a developer migration page > set up. > > Anything else? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From manchunjohn-ma at uiowa.edu Sat May 8 17:59:08 2010 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Sat, 8 May 2010 16:59:08 -0500 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> Hi, I use bioperl-live 16950 with blast 2.2.23. I haven't been able to put together a simpler script that reproduces the problem at this time, so I'll put the BLASTn outputs (in blast, blasttable and blastxml formats) here-- they look perfectly normal except that they look like 2 separate output
files appended together. Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 -----Original Message----- From: Dave Messina [mailto:David.Messina at sbc.su.se] Sent: Saturday, May 08, 2010 4:47 PM To: Ma, Man Chun John Cc: BioPerl List Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast There was a report last week of a possible problem with BLAST parsing introduced in the last few days. I don't know what the status of that is, but it's possible that it's related. In any case, if you post your code and the blast report you're parsing, we might be able to diagnose the problem. Also, what version of BioPerl are you using? Dave On May 8, 2010, at 11:37 PM, Ma, Man Chun John wrote: > Hi, > > And that's my problem here: I checked the BLAST output, and the two > sequences did get aligned-- just that SearchIO, in whatever flavour (I > tried blast, blasttable and blastxml) didn't see to do to the next > result when next_result() is called. It knows there're two results, > but still getting the first result on the second call. > > Cheers, > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Saturday, May 08, 2010 4:33 PM > To: Ma, Man Chun John > Cc: BioPerl List > Subject: Re: [Bioperl-l] Array Handling Differences between > RemoteBlast and StandAloneBlast > > Hi John, > > Please remember to keep Cc'ing the mailing list so that everyone can > participate in the discussion. > > If I understand your question correctly, yes, you can iterate through > the blast results in a report called $blast_report using next_result. 
> > If you haven't already, you may want to look at the SearchIO HOWTO: > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > (although the BioPerl website appears to be temporarily offline, so > check back a little later.) > > > Dave > > > > On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > >> Hi, >> >> I have did some more investigation and found that the issue is >> probably that of SearchIO rather than StandAloneBlast--in case I made >> a mistake, so if I parsed a standard @array of Bio::Seq objects into >> StandAloneBlast (blastn with SearchIO output), the result for each of >> the seqs in the array can be assessed by $blast_report->next_result, >> right? >> >> >> John MC Ma >> Graduate Assistant >> Kwitek Lab >> Department of Internal Medicine >> 3125E MERF >> 375 Newton Road >> Iowa City IA 52242 >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Friday, May 07, 2010 5:34 PM >> To: Ma, Man Chun John >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Array Handling Differences between >> RemoteBlast and StandAloneBlast >> >> Hi John, >> >> You're right that passing parameters should work similarly for both >> RemoteBlast and StandAloneBlast, but without seeing exactly the >> parameter array you're passing, it's not possible to identify the >> problem. >> >> Could you perhaps post a small, but complete test program that >> demonstrates the problem? >> >> >> Dave >> >> >> PS - is this the actual code you ran? 
>> >>> My >>> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_b >>> l >>> a >>> st_params); My >>> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >>> While (my $result=$blast_report->next_result()){ >>> [etc, etc for iteration] >>> } >> >> I'm guessing you were paraphrasing, but I ask because My, with a >> capital "M", will generate an error, you're calling >> Tools::Run::StandAloneBlast instead of >> Bio::Tools::Run::StandAloneBlast, and there's no method call to >> new(), > i.e. it should be: >> >> my $Stand_alone_blast_machine = >> Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); >> >> >> >> __________ Information from ESET NOD32 Antivirus, version of virus >> signature database 5095 (20100507) __________ >> >> The message was checked by ESET NOD32 Antivirus. >> >> http://www.eset.com >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: blasttable Type: application/octet-stream Size: 842 bytes Desc: blasttable URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blast.xml Type: text/xml Size: 7598 bytes Desc: blast.xml URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastout Type: application/octet-stream Size: 3576 bytes Desc: blastout URL: From florent.angly at gmail.com Sun May 9 01:12:03 2010 From: florent.angly at gmail.com (Florent Angly) Date: Sun, 09 May 2010 15:12:03 +1000 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE58D6A.9080601@bioperl.org> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE58D6A.9080601@bioperl.org> Message-ID: <4BE64423.1040104@gmail.com> Within one assembly file, contig IDs typically tend to follow one formatting convention. The two most popular ones are a numerical ID, or an alphanumeric ID, such as 'contig13'. The later case already requires natural sorting. 
There is no way to know in advance what format to expect, and in fact, the format being specified by the user, it could be arbitrarily complicated, although probably, IDs would be sorted naturally.

I will follow Chris's recommendation of using Sort::Naturally as a recommended package. The users who don't have this dependency will have their IDs sorted in a safe way, lexically.

Florent

On 09/05/10 02:12, Jason Stajich wrote:
> Unless necessary I don't know if adding yet another dependency is
> warranted here.
>
> I don't know how complicated the words will be but can't you just
> strip out the numbers and do this in a schwartzian transformation?
>
> #!/usr/bin/perl -w
> use strict;
> my @arr = qw(single1 contig10 101 contig2 3);
> my @sorted = map  { $_->[1] }
>              sort { $a->[0] <=> $b->[0] }
>              map  { [ /(\d+)/, $_ ] } @arr;
> print join("\n", @sorted), "\n";
>
> But I'm not sure how do you want to sort
> 10 vs contig10 vs singlet10 reliably?
>
> -jason
>
> Florent Angly wrote, On 5/7/10 9:42 PM:
>> Hi all,
>>
>> I am working on updating some of the Bio::Assembly::* modules right now.
>> I need to sort a list of IDs. These IDs could be numbers, "words" or
>> a mix of the two, for example: @arr = ('singlet1',
>> 'contig10', 'contig2', '101', '3');
>>
>> I cannot sort them with the numerical sort: sort { $a <=> $b } @array
>> This would generate warnings because some of the IDs are not
>> numbers.
>>
>> I cannot sort them lexically: sort @array
>> Lexical sorting would not take into account numbers properly and
>> result in:
>> singlet1 contig10 contig2 3 101
>>
>> So, what I really need is natural sorting, which is not in any core
>> function of Perl. I'd like to use the CPAN module Sort::Naturally for
>> this purpose: nsort @arr
>> The results would be what we expect, i.e.:
>> 3 101 contig2 contig10 singlet1
>>
>> Can I add this module as an additional dependency of BioPerl? I
>> imagine that some other modules might want to use this.
On the
>> assembly side, it would be used by the writing methods of
>> Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around
>> my problem that doesn't require any external module?
>>
>> Florent
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From florent.angly at gmail.com Sun May 9 03:26:19 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Sun, 09 May 2010 17:26:19 +1000
Subject: [Bioperl-l] Read/write round-tripping Was: Re: New Bioperl dependency? Sort::Naturally
In-Reply-To: <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu>
Message-ID: <4BE6639B.6060004@gmail.com>

Chris,

I've thought some more on the problem and I now agree with you that round-tripping at the object-level is more powerful. It has the problem that some objects are given IDs dynamically every time, which means that identical input files won't have an identical object.

> is_deeply( $obj_out , $obj_in , 'deep compare' );
> not ok 1 - deep compare
> # Failed test 'deep compare'
> # at ./test_roundtrip.pl line 33.
> # Structures begin differing at:
> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '56438592'
> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '54980512'
> 1..1
> # Looks like you failed 1 test of 1.

And when I re-run this again:

> not ok 1 - deep compare
> # Failed test 'deep compare'
> # at ./test_roundtrip.pl line 33.
> # Structures begin differing at:
> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '47763264'
> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '46305184'
> 1..1
> # Looks like you failed 1 test of 1.

Note how the value of _btree changes every time.
Maybe using Test::Deep would be a good approach (http://search.cpan.org/~fdaly/Test-Deep-0.106/lib/Test/Deep.pod): > Where it becomes more interesting is in allowing you to do something > besides simple exact comparisons. With strings, the |eq| operator > checks that 2 strings are exactly equal but sometimes that's not what > you want. When you don't know exactly what the string should be but > you do know some things about how it should look, |eq| is no good and > you must use pattern matching instead. Test::Deep provides pattern > matching for complex data structures Florent On 09/05/10 10:02, Chris Fields wrote: > Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. > > chris > > On May 8, 2010, at 6:47 PM, Chris Fields wrote: > > >> To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. >> >> chris >> >> On May 8, 2010, at 6:34 AM, Florent Angly wrote: >> >> >>> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. >>> >>> It looks like the Bio::SeqIO modules tests could use it as well. 
>>> >>> Cheers, >>> >>> Florent >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From ibi2008006 at iiita.ac.in Sun May 9 10:46:28 2010 From: ibi2008006 at iiita.ac.in (roserp) Date: Sun, 9 May 2010 07:46:28 -0700 (PDT) Subject: [Bioperl-l] where to find standard substitution matrices Message-ID: <28503204.post@talk.nabble.com> hi , I want blosum62, blosum80 , pam30, and pam70 matrices. I am getting different values in different sites for these matrices. can anyone suggest some authenticated site for getting these ?? thanks in advance -- View this message in context: http://old.nabble.com/where-to-find-standard-substitution-matrices-tp28503204p28503204.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From razi.khaja at gmail.com Sun May 9 15:23:47 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Sun, 9 May 2010 15:23:47 -0400 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: Attached (blast.pm.diff) is a patch that fixes Heikki's problem. Can someone advise an appropriate way to have this patch applied, given that it is an amendment to a previous patch? Thanks Razi ---------- Forwarded message ---------- From: Heikki Lehvaslaiho Date: Wed, May 5, 2010 at 2:11 AM Subject: Re: [Bioperl-l] BLAST parsing broken To: Razi Khaja Hi Raja, Thanks for trying to fix this. I am attaching an example output file to this message. I just tested again that master from git repository fails to get query ID, but the previous version works. bala ~/src/bioperl-live> git checkout master Previous HEAD position was 5e278f5... 
Robson's patch for buggy blastpgp output Switched to branch 'master' When I started using the latest mpiBLAST code a few months ago I did compare the 0 output from it to standard NCBI blast and they were identical. Also, I've noticed a discrepancy between within bioperl blast parsing that I have not had time to work on. Would you be interested in having a look? I am creating output from mpiBLAST in 0 format and then converting it into tab-delimited 8 format. I am unable to get 100% similarity for all cases when I compare the conversion to the output straight from mpiBLAST in format 8. Sometimes the mismatch and gap values are off by one. I am attaching a script that does the conversion. It is the same one I was using when I noticed the problem above. I was going to put the code into bioperl but that got delayed when I noticed the discrepancies. Cheers, -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 4 May 2010 20:55, Razi Khaja wrote: > That is odd. Heikki, do you have a blast output file that produces this > error? > Could you attach the file and either send to the list or myself (if the > list > does not accept attachments). > Thanks, > Razi > > > On Mon, May 3, 2010 at 8:08 AM, Chris Fields > wrote: > > > Odd, I ran tests on that prior to commit. I'll work on fixing that (in > svn > > of course, until the migration is complete). > > > > chris > > > > On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > > > > > Chris, > > > > > > latest additions to Bio::SearchIO::blast.pm broke the parsing of > normal > > > blast output. $result->query_name returns now undef. > > > > > > (Using the anonymous git now). 
This change still works: > > > > > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > > Author: cjfields > > > Date: Sun Dec 20 04:39:58 2009 +0000 > > > > > > Robson's patch for buggy blastpgp output > > > > > > But this does not: > > > > > > commit 9a89c3434597104dd50553e3562983d78d14a544 > > > Author: cjfields > > > Date: Thu Apr 15 04:21:17 2010 +0000 > > > > > > [bug 3031] > > > > > > patches for catching algorithm ref, courtesy Razi Khaja. > > > > > > That makes it easy to find the diffs: > > > > > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > > > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > > > index 378023a..6f7eeeb 100644 > > > --- a/Bio/SearchIO/blast.pm > > > +++ b/Bio/SearchIO/blast.pm > > > @@ -209,6 +209,7 @@ BEGIN { > > > > > > 'BlastOutput_program' => 'RESULT-algorithm_name', > > > 'BlastOutput_version' => > 'RESULT-algorithm_version', > > > + 'BlastOutput_algorithm-reference' => > > 'RESULT-algorithm_reference', > > > 'BlastOutput_query-def' => 'RESULT-query_name', > > > 'BlastOutput_query-len' => 'RESULT-query_length', > > > 'BlastOutput_query-acc' => 'RESULT-query_accession', > > > @@ -504,6 +505,26 @@ sub next_result { > > > } > > > ); > > > } > > > + # parse the BLAST algorithm reference > > > + elsif(/^Reference:\s+(.*)$/) { > > > + # want to preserve newlines for the BLAST algorithm > > reference > > > + my $algorithm_reference = "$1\n"; > > > + $_ = $self->_readline; > > > + # while the current line, does not match an empty line, a > > RID:, > > > or a Database:, we are still looking at the > > > + # algorithm_reference, append it to what we parsed so far > > > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > > > + $algorithm_reference .= "$_"; > > > + $_ = $self->_readline; > > > + } > > > + # if we exited the while loop, we saw an empty line, a > RID:, > > or > > > a Database:, so push it back > > > + $self->_pushback($_); > > > + 
$self->element( > > > + { > > > + 'Name' => 'BlastOutput_algorithm-reference', > > > + 'Data' => $algorithm_reference > > > + } > > > + ); > > > + } > > > # added Windows workaround for bug 1985 > > > elsif (/^(Searching|Results from round)/) { > > > next unless $1 =~ /Results from round/; > > > > > > > > > I am not sure why reference parsing messes things up. Maybe it eats too > > many > > > lines from the result file. > > > > > > Yours, > > > > > > -Heikki > > > > > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > > > cell: +966 545 595 849 office: +966 2 808 2429 > > > > > > Computational Bioscience Research Centre (CBRC), Building #2, Office > > #4216 > > > 4700 King Abdullah University of Science and Technology (KAUST) > > > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: mpiblast.out Type: application/octet-stream Size: 34844 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastparser028.pl Type: application/x-perl Size: 2024 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: blast.pm.diff Type: text/x-patch Size: 994 bytes Desc: not available URL: From cjfields at illinois.edu Sun May 9 16:43:29 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 May 2010 15:43:29 -0500 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> If the patch is against main trunk it isn't a problem, otherwise the diff should be vs. that code. chris On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > Can someone advise an appropriate way to have this patch applied, given that > it is an amendment to a previous patch? > Thanks > Razi > > > ---------- Forwarded message ---------- > From: Heikki Lehvaslaiho > Date: Wed, May 5, 2010 at 2:11 AM > Subject: Re: [Bioperl-l] BLAST parsing broken > To: Razi Khaja > > > Hi Raja, > > Thanks for trying to fix this. > > I am attaching an example output file to this message. I just tested again > that master from git repository fails to get query ID, but the previous > version works. > > bala ~/src/bioperl-live> git checkout master > Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp > output > Switched to branch 'master' > > When I started using the latest mpiBLAST code a few months ago I did compare > the 0 output from it to standard NCBI blast and they were identical. > > > > > Also, I've noticed a discrepancy between within bioperl blast parsing that > I have not had time to work on. Would you be interested in having a look? > > I am creating output from mpiBLAST in 0 format and then converting it into > tab-delimited 8 format. I am unable to get 100% similarity for all cases > when I compare the conversion to the output straight from mpiBLAST in format > 8. Sometimes the mismatch and gap values are off by one. > > I am attaching a script that does the conversion. 
It is the same one I was > using when I noticed the problem above. I was going to put the code into > bioperl but that got delayed when I noticed the discrepancies. > > > Cheers, > > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > On 4 May 2010 20:55, Razi Khaja wrote: > >> That is odd. Heikki, do you have a blast output file that produces this >> error? >> Could you attach the file and either send to the list or myself (if the >> list >> does not accept attachments). >> Thanks, >> Razi >> >> >> On Mon, May 3, 2010 at 8:08 AM, Chris Fields >> wrote: >> >>> Odd, I ran tests on that prior to commit. I'll work on fixing that (in >> svn >>> of course, until the migration is complete). >>> >>> chris >>> >>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: >>> >>>> Chris, >>>> >>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of >> normal >>>> blast output. $result->query_name returns now undef. >>>> >>>> (Using the anonymous git now). This change still works: >>>> >>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>> Author: cjfields >>>> Date: Sun Dec 20 04:39:58 2009 +0000 >>>> >>>> Robson's patch for buggy blastpgp output >>>> >>>> But this does not: >>>> >>>> commit 9a89c3434597104dd50553e3562983d78d14a544 >>>> Author: cjfields >>>> Date: Thu Apr 15 04:21:17 2010 +0000 >>>> >>>> [bug 3031] >>>> >>>> patches for catching algorithm ref, courtesy Razi Khaja. 
>>>> >>>> That makes it easy to find the diffs: >>>> >>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm >>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm >>>> index 378023a..6f7eeeb 100644 >>>> --- a/Bio/SearchIO/blast.pm >>>> +++ b/Bio/SearchIO/blast.pm >>>> @@ -209,6 +209,7 @@ BEGIN { >>>> >>>> 'BlastOutput_program' => 'RESULT-algorithm_name', >>>> 'BlastOutput_version' => >> 'RESULT-algorithm_version', >>>> + 'BlastOutput_algorithm-reference' => >>> 'RESULT-algorithm_reference', >>>> 'BlastOutput_query-def' => 'RESULT-query_name', >>>> 'BlastOutput_query-len' => 'RESULT-query_length', >>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', >>>> @@ -504,6 +505,26 @@ sub next_result { >>>> } >>>> ); >>>> } >>>> + # parse the BLAST algorithm reference >>>> + elsif(/^Reference:\s+(.*)$/) { >>>> + # want to preserve newlines for the BLAST algorithm >>> reference >>>> + my $algorithm_reference = "$1\n"; >>>> + $_ = $self->_readline; >>>> + # while the current line, does not match an empty line, a >>> RID:, >>>> or a Database:, we are still looking at the >>>> + # algorithm_reference, append it to what we parsed so far >>>> + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { >>>> + $algorithm_reference .= "$_"; >>>> + $_ = $self->_readline; >>>> + } >>>> + # if we exited the while loop, we saw an empty line, a >> RID:, >>> or >>>> a Database:, so push it back >>>> + $self->_pushback($_); >>>> + $self->element( >>>> + { >>>> + 'Name' => 'BlastOutput_algorithm-reference', >>>> + 'Data' => $algorithm_reference >>>> + } >>>> + ); >>>> + } >>>> # added Windows workaround for bug 1985 >>>> elsif (/^(Searching|Results from round)/) { >>>> next unless $1 =~ /Results from round/; >>>> >>>> >>>> I am not sure why reference parsing messes things up. Maybe it eats too >>> many >>>> lines from the result file. 
>>>> >>>> Yours, >>>> >>>> -Heikki >>>> >>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>>> cell: +966 545 595 849 office: +966 2 808 2429 >>>> >>>> Computational Bioscience Research Centre (CBRC), Building #2, Office >>> #4216 >>>> 4700 King Abdullah University of Science and Technology (KAUST) >>>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From razi.khaja at gmail.com Sun May 9 17:15:38 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Sun, 9 May 2010 17:15:38 -0400 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> Message-ID: Hi Chris, The patch is against the main trunk. I checked out version 11326 of the repository today. Razi On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote: > If the patch is against main trunk it isn't a problem, otherwise the diff > should be vs. that code. > > chris > > On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > > > Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > > Can someone advise an appropriate way to have this patch applied, given > that > > it is an amendment to a previous patch? 
> > Thanks > > Razi > > > > > > ---------- Forwarded message ---------- > > From: Heikki Lehvaslaiho > > Date: Wed, May 5, 2010 at 2:11 AM > > Subject: Re: [Bioperl-l] BLAST parsing broken > > To: Razi Khaja > > > > > > Hi Raja, > > > > Thanks for trying to fix this. > > > > I am attaching an example output file to this message. I just tested > again > > that master from git repository fails to get query ID, but the previous > > version works. > > > > bala ~/src/bioperl-live> git checkout master > > Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp > > output > > Switched to branch 'master' > > > > When I started using the latest mpiBLAST code a few months ago I did > compare > > the 0 output from it to standard NCBI blast and they were identical. > > > > > > > > > > Also, I've noticed a discrepancy between within bioperl blast parsing > that > > I have not had time to work on. Would you be interested in having a look? > > > > I am creating output from mpiBLAST in 0 format and then converting it > into > > tab-delimited 8 format. I am unable to get 100% similarity for all cases > > when I compare the conversion to the output straight from mpiBLAST in > format > > 8. Sometimes the mismatch and gap values are off by one. > > > > I am attaching a script that does the conversion. It is the same one I > was > > using when I noticed the problem above. I was going to put the code into > > bioperl but that got delayed when I noticed the discrepancies. > > > > > > Cheers, > > > > > > -Heikki > > > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > > cell: +966 545 595 849 office: +966 2 808 2429 > > > > Computational Bioscience Research Centre (CBRC), Building #2, Office > #4216 > > 4700 King Abdullah University of Science and Technology (KAUST) > > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > > > > > On 4 May 2010 20:55, Razi Khaja wrote: > > > >> That is odd. Heikki, do you have a blast output file that produces this > >> error? 
> >> Could you attach the file and either send to the list or myself (if the > >> list > >> does not accept attachments). > >> Thanks, > >> Razi > >> > >> > >> On Mon, May 3, 2010 at 8:08 AM, Chris Fields > >> wrote: > >> > >>> Odd, I ran tests on that prior to commit. I'll work on fixing that (in > >> svn > >>> of course, until the migration is complete). > >>> > >>> chris > >>> > >>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > >>> > >>>> Chris, > >>>> > >>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of > >> normal > >>>> blast output. $result->query_name returns now undef. > >>>> > >>>> (Using the anonymous git now). This change still works: > >>>> > >>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>> Author: cjfields > >>>> Date: Sun Dec 20 04:39:58 2009 +0000 > >>>> > >>>> Robson's patch for buggy blastpgp output > >>>> > >>>> But this does not: > >>>> > >>>> commit 9a89c3434597104dd50553e3562983d78d14a544 > >>>> Author: cjfields > >>>> Date: Thu Apr 15 04:21:17 2010 +0000 > >>>> > >>>> [bug 3031] > >>>> > >>>> patches for catching algorithm ref, courtesy Razi Khaja. 
> >>>> > >>>> That makes it easy to find the diffs: > >>>> > >>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > >>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > >>>> index 378023a..6f7eeeb 100644 > >>>> --- a/Bio/SearchIO/blast.pm > >>>> +++ b/Bio/SearchIO/blast.pm > >>>> @@ -209,6 +209,7 @@ BEGIN { > >>>> > >>>> 'BlastOutput_program' => 'RESULT-algorithm_name', > >>>> 'BlastOutput_version' => > >> 'RESULT-algorithm_version', > >>>> + 'BlastOutput_algorithm-reference' => > >>> 'RESULT-algorithm_reference', > >>>> 'BlastOutput_query-def' => 'RESULT-query_name', > >>>> 'BlastOutput_query-len' => 'RESULT-query_length', > >>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', > >>>> @@ -504,6 +505,26 @@ sub next_result { > >>>> } > >>>> ); > >>>> } > >>>> + # parse the BLAST algorithm reference > >>>> + elsif(/^Reference:\s+(.*)$/) { > >>>> + # want to preserve newlines for the BLAST algorithm > >>> reference > >>>> + my $algorithm_reference = "$1\n"; > >>>> + $_ = $self->_readline; > >>>> + # while the current line, does not match an empty line, a > >>> RID:, > >>>> or a Database:, we are still looking at the > >>>> + # algorithm_reference, append it to what we parsed so far > >>>> + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) > { > >>>> + $algorithm_reference .= "$_"; > >>>> + $_ = $self->_readline; > >>>> + } > >>>> + # if we exited the while loop, we saw an empty line, a > >> RID:, > >>> or > >>>> a Database:, so push it back > >>>> + $self->_pushback($_); > >>>> + $self->element( > >>>> + { > >>>> + 'Name' => 'BlastOutput_algorithm-reference', > >>>> + 'Data' => $algorithm_reference > >>>> + } > >>>> + ); > >>>> + } > >>>> # added Windows workaround for bug 1985 > >>>> elsif (/^(Searching|Results from round)/) { > >>>> next unless $1 =~ /Results from round/; > >>>> > >>>> > >>>> I am not sure why reference parsing messes things up. 
Maybe it eats > too > >>> many > >>>> lines from the result file. > >>>> > >>>> Yours, > >>>> > >>>> -Heikki > >>>> > >>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>> > >>>> Computational Bioscience Research Centre (CBRC), Building #2, Office > >>> #4216 > >>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > >_______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sun May 9 17:30:52 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 May 2010 16:30:52 -0500 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> Message-ID: Then something is wrong, as current trunk is at r16969. Where are you pulling your code from? Our only working anon. server is the sync'ed github one. chris On May 9, 2010, at 4:15 PM, Razi Khaja wrote: > Hi Chris, > The patch is against the main trunk. I checked out version 11326 of the > repository today. > Razi > > > On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote: > >> If the patch is against main trunk it isn't a problem, otherwise the diff >> should be vs. that code. 
>> >> chris >> >> On May 9, 2010, at 2:23 PM, Razi Khaja wrote: >> >>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem. >>> Can someone advise an appropriate way to have this patch applied, given >> that >>> it is an amendment to a previous patch? >>> Thanks >>> Razi >>> >>> >>> ---------- Forwarded message ---------- >>> From: Heikki Lehvaslaiho >>> Date: Wed, May 5, 2010 at 2:11 AM >>> Subject: Re: [Bioperl-l] BLAST parsing broken >>> To: Razi Khaja >>> >>> >>> Hi Raja, >>> >>> Thanks for trying to fix this. >>> >>> I am attaching an example output file to this message. I just tested >> again >>> that master from git repository fails to get query ID, but the previous >>> version works. >>> >>> bala ~/src/bioperl-live> git checkout master >>> Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp >>> output >>> Switched to branch 'master' >>> >>> When I started using the latest mpiBLAST code a few months ago I did >> compare >>> the 0 output from it to standard NCBI blast and they were identical. >>> >>> >>> >>> >>> Also, I've noticed a discrepancy between within bioperl blast parsing >> that >>> I have not had time to work on. Would you be interested in having a look? >>> >>> I am creating output from mpiBLAST in 0 format and then converting it >> into >>> tab-delimited 8 format. I am unable to get 100% similarity for all cases >>> when I compare the conversion to the output straight from mpiBLAST in >> format >>> 8. Sometimes the mismatch and gap values are off by one. >>> >>> I am attaching a script that does the conversion. It is the same one I >> was >>> using when I noticed the problem above. I was going to put the code into >>> bioperl but that got delayed when I noticed the discrepancies. 
>>> >>> >>> Cheers, >>> >>> >>> -Heikki >>> >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +966 545 595 849 office: +966 2 808 2429 >>> >>> Computational Bioscience Research Centre (CBRC), Building #2, Office >> #4216 >>> 4700 King Abdullah University of Science and Technology (KAUST) >>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>> >>> >>> >>> On 4 May 2010 20:55, Razi Khaja wrote: >>> >>>> That is odd. Heikki, do you have a blast output file that produces this >>>> error? >>>> Could you attach the file and either send to the list or myself (if the >>>> list >>>> does not accept attachments). >>>> Thanks, >>>> Razi >>>> >>>> >>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields >>>> wrote: >>>> >>>>> Odd, I ran tests on that prior to commit. I'll work on fixing that (in >>>> svn >>>>> of course, until the migration is complete). >>>>> >>>>> chris >>>>> >>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: >>>>> >>>>>> Chris, >>>>>> >>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of >>>> normal >>>>>> blast output. $result->query_name returns now undef. >>>>>> >>>>>> (Using the anonymous git now). This change still works: >>>>>> >>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>>>> Author: cjfields >>>>>> Date: Sun Dec 20 04:39:58 2009 +0000 >>>>>> >>>>>> Robson's patch for buggy blastpgp output >>>>>> >>>>>> But this does not: >>>>>> >>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544 >>>>>> Author: cjfields >>>>>> Date: Thu Apr 15 04:21:17 2010 +0000 >>>>>> >>>>>> [bug 3031] >>>>>> >>>>>> patches for catching algorithm ref, courtesy Razi Khaja. 
>>>>>> >>>>>> That makes it easy to find the diffs: >>>>>> >>>>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm >>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm >>>>>> index 378023a..6f7eeeb 100644 >>>>>> --- a/Bio/SearchIO/blast.pm >>>>>> +++ b/Bio/SearchIO/blast.pm >>>>>> @@ -209,6 +209,7 @@ BEGIN { >>>>>> >>>>>> 'BlastOutput_program' => 'RESULT-algorithm_name', >>>>>> 'BlastOutput_version' => >>>> 'RESULT-algorithm_version', >>>>>> + 'BlastOutput_algorithm-reference' => >>>>> 'RESULT-algorithm_reference', >>>>>> 'BlastOutput_query-def' => 'RESULT-query_name', >>>>>> 'BlastOutput_query-len' => 'RESULT-query_length', >>>>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', >>>>>> @@ -504,6 +505,26 @@ sub next_result { >>>>>> } >>>>>> ); >>>>>> } >>>>>> + # parse the BLAST algorithm reference >>>>>> + elsif(/^Reference:\s+(.*)$/) { >>>>>> + # want to preserve newlines for the BLAST algorithm >>>>> reference >>>>>> + my $algorithm_reference = "$1\n"; >>>>>> + $_ = $self->_readline; >>>>>> + # while the current line, does not match an empty line, a >>>>> RID:, >>>>>> or a Database:, we are still looking at the >>>>>> + # algorithm_reference, append it to what we parsed so far >>>>>> + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) >> { >>>>>> + $algorithm_reference .= "$_"; >>>>>> + $_ = $self->_readline; >>>>>> + } >>>>>> + # if we exited the while loop, we saw an empty line, a >>>> RID:, >>>>> or >>>>>> a Database:, so push it back >>>>>> + $self->_pushback($_); >>>>>> + $self->element( >>>>>> + { >>>>>> + 'Name' => 'BlastOutput_algorithm-reference', >>>>>> + 'Data' => $algorithm_reference >>>>>> + } >>>>>> + ); >>>>>> + } >>>>>> # added Windows workaround for bug 1985 >>>>>> elsif (/^(Searching|Results from round)/) { >>>>>> next unless $1 =~ /Results from round/; >>>>>> >>>>>> >>>>>> I am not sure why reference parsing messes things up. 
Maybe it eats >> too >>>>> many >>>>>> lines from the result file. >>>>>> >>>>>> Yours, >>>>>> >>>>>> -Heikki >>>>>> >>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>>>>> cell: +966 545 595 849 office: +966 2 808 2429 >>>>>> >>>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office >>>>> #4216 >>>>>> 4700 King Abdullah University of Science and Technology (KAUST) >>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From razi.khaja at gmail.com Sun May 9 19:48:28 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Sun, 9 May 2010 19:48:28 -0400 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> Message-ID: I checked out bioperl-live from github: svn checkout http://svn.github.com/bioperl/bioperl-live.git I just checked it out again, a few seconds ago and by default I got revision 11326. Razi On Sun, May 9, 2010 at 5:30 PM, Chris Fields wrote: > Then something is wrong, as current trunk is at r16969. 
Where are you > pulling your code from? Our only working anon. server is the sync'ed github > one. > > chris > > On May 9, 2010, at 4:15 PM, Razi Khaja wrote: > > > Hi Chris, > > The patch is against the main trunk. I checked out version 11326 of the > > repository today. > > Razi > > > > > > On Sun, May 9, 2010 at 4:43 PM, Chris Fields > wrote: > > > >> If the patch is against main trunk it isn't a problem, otherwise the > diff > >> should be vs. that code. > >> > >> chris > >> > >> On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > >> > >>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > >>> Can someone advise an appropriate way to have this patch applied, given > >> that > >>> it is an amendment to a previous patch? > >>> Thanks > >>> Razi > >>> > >>> > >>> ---------- Forwarded message ---------- > >>> From: Heikki Lehvaslaiho > >>> Date: Wed, May 5, 2010 at 2:11 AM > >>> Subject: Re: [Bioperl-l] BLAST parsing broken > >>> To: Razi Khaja > >>> > >>> > >>> Hi Raja, > >>> > >>> Thanks for trying to fix this. > >>> > >>> I am attaching an example output file to this message. I just tested > >> again > >>> that master from git repository fails to get query ID, but the previous > >>> version works. > >>> > >>> bala ~/src/bioperl-live> git checkout master > >>> Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp > >>> output > >>> Switched to branch 'master' > >>> > >>> When I started using the latest mpiBLAST code a few months ago I did > >> compare > >>> the 0 output from it to standard NCBI blast and they were identical. > >>> > >>> > >>> > >>> > >>> Also, I've noticed a discrepancy between within bioperl blast parsing > >> that > >>> I have not had time to work on. Would you be interested in having a > look? > >>> > >>> I am creating output from mpiBLAST in 0 format and then converting it > >> into > >>> tab-delimited 8 format. 
I am unable to get 100% similarity for all > cases > >>> when I compare the conversion to the output straight from mpiBLAST in > >> format > >>> 8. Sometimes the mismatch and gap values are off by one. > >>> > >>> I am attaching a script that does the conversion. It is the same one I > >> was > >>> using when I noticed the problem above. I was going to put the code > into > >>> bioperl but that got delayed when I noticed the discrepancies. > >>> > >>> > >>> Cheers, > >>> > >>> > >>> -Heikki > >>> > >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>> cell: +966 545 595 849 office: +966 2 808 2429 > >>> > >>> Computational Bioscience Research Centre (CBRC), Building #2, Office > >> #4216 > >>> 4700 King Abdullah University of Science and Technology (KAUST) > >>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>> > >>> > >>> > >>> On 4 May 2010 20:55, Razi Khaja wrote: > >>> > >>>> That is odd. Heikki, do you have a blast output file that produces > this > >>>> error? > >>>> Could you attach the file and either send to the list or myself (if > the > >>>> list > >>>> does not accept attachments). > >>>> Thanks, > >>>> Razi > >>>> > >>>> > >>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields > >>>> wrote: > >>>> > >>>>> Odd, I ran tests on that prior to commit. I'll work on fixing that > (in > >>>> svn > >>>>> of course, until the migration is complete). > >>>>> > >>>>> chris > >>>>> > >>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > >>>>> > >>>>>> Chris, > >>>>>> > >>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of > >>>> normal > >>>>>> blast output. $result->query_name returns now undef. > >>>>>> > >>>>>> (Using the anonymous git now). 
This change still works: > >>>>>> > >>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>> Author: cjfields > >>>>>> Date: Sun Dec 20 04:39:58 2009 +0000 > >>>>>> > >>>>>> Robson's patch for buggy blastpgp output > >>>>>> > >>>>>> But this does not: > >>>>>> > >>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544 > >>>>>> Author: cjfields > >>>>>> Date: Thu Apr 15 04:21:17 2010 +0000 > >>>>>> > >>>>>> [bug 3031] > >>>>>> > >>>>>> patches for catching algorithm ref, courtesy Razi Khaja. > >>>>>> > >>>>>> That makes it easy to find the diffs: > >>>>>> > >>>>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > >>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > >>>>>> index 378023a..6f7eeeb 100644 > >>>>>> --- a/Bio/SearchIO/blast.pm > >>>>>> +++ b/Bio/SearchIO/blast.pm > >>>>>> @@ -209,6 +209,7 @@ BEGIN { > >>>>>> > >>>>>> 'BlastOutput_program' => 'RESULT-algorithm_name', > >>>>>> 'BlastOutput_version' => > >>>> 'RESULT-algorithm_version', > >>>>>> + 'BlastOutput_algorithm-reference' => > >>>>> 'RESULT-algorithm_reference', > >>>>>> 'BlastOutput_query-def' => 'RESULT-query_name', > >>>>>> 'BlastOutput_query-len' => 'RESULT-query_length', > >>>>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', > >>>>>> @@ -504,6 +505,26 @@ sub next_result { > >>>>>> } > >>>>>> ); > >>>>>> } > >>>>>> + # parse the BLAST algorithm reference > >>>>>> + elsif(/^Reference:\s+(.*)$/) { > >>>>>> + # want to preserve newlines for the BLAST algorithm > >>>>> reference > >>>>>> + my $algorithm_reference = "$1\n"; > >>>>>> + $_ = $self->_readline; > >>>>>> + # while the current line, does not match an empty line, > a > >>>>> RID:, > >>>>>> or a Database:, we are still looking at the > >>>>>> + # algorithm_reference, append it to what we parsed so > far > >>>>>> + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ > /^Database:/) > >> { > >>>>>> + $algorithm_reference .= "$_"; > >>>>>> + $_ = 
$self->_readline; > >>>>>> + } > >>>>>> + # if we exited the while loop, we saw an empty line, a > >>>> RID:, > >>>>> or > >>>>>> a Database:, so push it back > >>>>>> + $self->_pushback($_); > >>>>>> + $self->element( > >>>>>> + { > >>>>>> + 'Name' => 'BlastOutput_algorithm-reference', > >>>>>> + 'Data' => $algorithm_reference > >>>>>> + } > >>>>>> + ); > >>>>>> + } > >>>>>> # added Windows workaround for bug 1985 > >>>>>> elsif (/^(Searching|Results from round)/) { > >>>>>> next unless $1 =~ /Results from round/; > >>>>>> > >>>>>> > >>>>>> I am not sure why reference parsing messes things up. Maybe it eats > >> too > >>>>> many > >>>>>> lines from the result file. > >>>>>> > >>>>>> Yours, > >>>>>> > >>>>>> -Heikki > >>>>>> > >>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>>>> > >>>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office > >>>>> #4216 > >>>>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > 
http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sun May 9 20:39:33 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 May 2010 19:39:33 -0500 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> Message-ID: <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu> Ok, that's fine. It may be something off with revision numbers when using svn with github (git doesn't have incremental revisions, but a SHA). Committed the patch to dev svn, in r16970. chris On May 9, 2010, at 6:48 PM, Razi Khaja wrote: > I checked out bioperl-live from github: > svn checkout http://svn.github.com/bioperl/bioperl-live.git > > I just checked it out again, a few seconds ago and by default I got revision > 11326. > Razi > > > On Sun, May 9, 2010 at 5:30 PM, Chris Fields wrote: > >> Then something is wrong, as current trunk is at r16969. Where are you >> pulling your code from? Our only working anon. server is the sync'ed github >> one. >> >> chris >> >> On May 9, 2010, at 4:15 PM, Razi Khaja wrote: >> >>> Hi Chris, >>> The patch is against the main trunk. I checked out version 11326 of the >>> repository today. >>> Razi >>> >>> >>> On Sun, May 9, 2010 at 4:43 PM, Chris Fields >> wrote: >>> >>>> If the patch is against main trunk it isn't a problem, otherwise the >> diff >>>> should be vs. that code. >>>> >>>> chris >>>> >>>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote: >>>> >>>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem. >>>>> Can someone advise an appropriate way to have this patch applied, given >>>> that >>>>> it is an amendment to a previous patch? 
>>>>> Thanks >>>>> Razi >>>>> >>>>> >>>>> ---------- Forwarded message ---------- >>>>> From: Heikki Lehvaslaiho >>>>> Date: Wed, May 5, 2010 at 2:11 AM >>>>> Subject: Re: [Bioperl-l] BLAST parsing broken >>>>> To: Razi Khaja >>>>> >>>>> >>>>> Hi Raja, >>>>> >>>>> Thanks for trying to fix this. >>>>> >>>>> I am attaching an example output file to this message. I just tested >>>> again >>>>> that master from git repository fails to get query ID, but the previous >>>>> version works. >>>>> >>>>> bala ~/src/bioperl-live> git checkout master >>>>> Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp >>>>> output >>>>> Switched to branch 'master' >>>>> >>>>> When I started using the latest mpiBLAST code a few months ago I did >>>> compare >>>>> the 0 output from it to standard NCBI blast and they were identical. >>>>> >>>>> >>>>> >>>>> >>>>> Also, I've noticed a discrepancy between within bioperl blast parsing >>>> that >>>>> I have not had time to work on. Would you be interested in having a >> look? >>>>> >>>>> I am creating output from mpiBLAST in 0 format and then converting it >>>> into >>>>> tab-delimited 8 format. I am unable to get 100% similarity for all >> cases >>>>> when I compare the conversion to the output straight from mpiBLAST in >>>> format >>>>> 8. Sometimes the mismatch and gap values are off by one. >>>>> >>>>> I am attaching a script that does the conversion. It is the same one I >>>> was >>>>> using when I noticed the problem above. I was going to put the code >> into >>>>> bioperl but that got delayed when I noticed the discrepancies. 
>>>>> >>>>> >>>>> Cheers, >>>>> >>>>> >>>>> -Heikki >>>>> >>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>>>> cell: +966 545 595 849 office: +966 2 808 2429 >>>>> >>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office >>>> #4216 >>>>> 4700 King Abdullah University of Science and Technology (KAUST) >>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>>>> >>>>> >>>>> >>>>> On 4 May 2010 20:55, Razi Khaja wrote: >>>>> >>>>>> That is odd. Heikki, do you have a blast output file that produces >> this >>>>>> error? >>>>>> Could you attach the file and either send to the list or myself (if >> the >>>>>> list >>>>>> does not accept attachments). >>>>>> Thanks, >>>>>> Razi >>>>>> >>>>>> >>>>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields >>>>>> wrote: >>>>>> >>>>>>> Odd, I ran tests on that prior to commit. I'll work on fixing that >> (in >>>>>> svn >>>>>>> of course, until the migration is complete). >>>>>>> >>>>>>> chris >>>>>>> >>>>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: >>>>>>> >>>>>>>> Chris, >>>>>>>> >>>>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of >>>>>> normal >>>>>>>> blast output. $result->query_name returns now undef. >>>>>>>> >>>>>>>> (Using the anonymous git now). This change still works: >>>>>>>> >>>>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>>>>>> Author: cjfields >>>>>>>> Date: Sun Dec 20 04:39:58 2009 +0000 >>>>>>>> >>>>>>>> Robson's patch for buggy blastpgp output >>>>>>>> >>>>>>>> But this does not: >>>>>>>> >>>>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544 >>>>>>>> Author: cjfields >>>>>>>> Date: Thu Apr 15 04:21:17 2010 +0000 >>>>>>>> >>>>>>>> [bug 3031] >>>>>>>> >>>>>>>> patches for catching algorithm ref, courtesy Razi Khaja. 
>>>>>>>> >>>>>>>> That makes it easy to find the diffs: >>>>>>>> >>>>>>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>>>>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm >>>>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm >>>>>>>> index 378023a..6f7eeeb 100644 >>>>>>>> --- a/Bio/SearchIO/blast.pm >>>>>>>> +++ b/Bio/SearchIO/blast.pm >>>>>>>> @@ -209,6 +209,7 @@ BEGIN { >>>>>>>> >>>>>>>> 'BlastOutput_program' => 'RESULT-algorithm_name', >>>>>>>> 'BlastOutput_version' => >>>>>> 'RESULT-algorithm_version', >>>>>>>> + 'BlastOutput_algorithm-reference' => >>>>>>> 'RESULT-algorithm_reference', >>>>>>>> 'BlastOutput_query-def' => 'RESULT-query_name', >>>>>>>> 'BlastOutput_query-len' => 'RESULT-query_length', >>>>>>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', >>>>>>>> @@ -504,6 +505,26 @@ sub next_result { >>>>>>>> } >>>>>>>> ); >>>>>>>> } >>>>>>>> + # parse the BLAST algorithm reference >>>>>>>> + elsif(/^Reference:\s+(.*)$/) { >>>>>>>> + # want to preserve newlines for the BLAST algorithm >>>>>>> reference >>>>>>>> + my $algorithm_reference = "$1\n"; >>>>>>>> + $_ = $self->_readline; >>>>>>>> + # while the current line, does not match an empty line, >> a >>>>>>> RID:, >>>>>>>> or a Database:, we are still looking at the >>>>>>>> + # algorithm_reference, append it to what we parsed so >> far >>>>>>>> + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ >> /^Database:/) >>>> { >>>>>>>> + $algorithm_reference .= "$_"; >>>>>>>> + $_ = $self->_readline; >>>>>>>> + } >>>>>>>> + # if we exited the while loop, we saw an empty line, a >>>>>> RID:, >>>>>>> or >>>>>>>> a Database:, so push it back >>>>>>>> + $self->_pushback($_); >>>>>>>> + $self->element( >>>>>>>> + { >>>>>>>> + 'Name' => 'BlastOutput_algorithm-reference', >>>>>>>> + 'Data' => $algorithm_reference >>>>>>>> + } >>>>>>>> + ); >>>>>>>> + } >>>>>>>> # added Windows workaround for bug 1985 >>>>>>>> elsif (/^(Searching|Results from round)/) { >>>>>>>> next unless $1 =~ 
/Results from round/; >>>>>>>> >>>>>>>> >>>>>>>> I am not sure why reference parsing messes things up. Maybe it eats >>>> too >>>>>>> many >>>>>>>> lines from the result file. >>>>>>>> >>>>>>>> Yours, >>>>>>>> >>>>>>>> -Heikki >>>>>>>> >>>>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>>>>>>> cell: +966 545 595 849 office: +966 2 808 2429 >>>>>>>> >>>>>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office >>>>>>> #4216 >>>>>>>> 4700 King Abdullah University of Science and Technology (KAUST) >>>>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> >>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cmb433 at nyu.edu Sun May 9 22:22:52 2010 From: cmb433 at nyu.edu (bergeycm) Date: Sun, 9 May 2010 19:22:52 -0700 (PDT) Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely Message-ID: <28506482.post@talk.nabble.com> Hi all, I'm attempting to query GenBank for all sequences' 
lengths for a given taxon. I'm using get_Stream_by_query(), but only to grab the species, length, and accession. The genus of interest has almost 500,000 GB entries, though, and my code hangs at odd points in the info-gathering loop (often after only 300 or 400 iterations). The problem is that $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back undefined.

I've tried wrapping the next_seq portion of the code in an eval block, but to no avail. Is there a way to split a query into a bunch of small streams that aren't too much to ask? Or is there a way to pick up a dropped SeqIO stream? I think the connection is timing out and the stream is being lost. Any advice is greatly appreciated, as I'm fairly new to BioPerl.

- bergeycm

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Get general things ready to go for querying GenBank
my %options;
$options{'-maxids'} = '500000';        # There are presently 460,184 sequences
$options{'-db'}     = 'nucleotide';
$options{'-query'}  = "Pongo [ORGN]";  # Orangutans

my $query_obj = Bio::DB::Query::GenBank->new(%options);
my $total     = $query_obj->count;

my $gb_obj     = Bio::DB::GenBank->new();
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# Restrict info to just what I'll be using. No sequence necessary.
my $builder = $stream_obj->sequence_builder();
$builder->want_none();
$builder->add_wanted_slot('species','length','accession');

my $c = 0;

for (1 .. $total) {
    eval {
        my $seq_obj = $stream_obj->next_seq;
        my $flavor  = $seq_obj->species;
        print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t",
              $seq_obj->length, "\t", $seq_obj->accession, "\n";
    };

    if ($@) {
        # the error raised inside eval is in $@ (not $!)
        print $@, "\n";
    }

    # Pause for a little over a third of a second
    select(undef, undef, undef, 0.35);

    $c++;
}

--
View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
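
[Editorial sketch] The loop in this message keeps going for a fixed number of iterations even after next_seq has returned undef. A minimal sketch of a loop that instead stops at the first undef and reports any exception from $@ is shown here. It is not BioPerl code: drain_stream is a hypothetical helper name, and it only assumes an object with a next_seq method that returns undef when nothing more is available.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drain any object that has a
# next_seq method, calling $handle on each record.
# Returns (number of records handled, error message or '').
sub drain_stream {
    my ($stream, $handle) = @_;
    my $count = 0;
    while (1) {
        my $seq;
        # eval returns 1 only if next_seq did not die
        my $ok = eval { $seq = $stream->next_seq; 1 };
        return ($count, $@ || 'unknown error') unless $ok;
        # undef means the stream is exhausted (or was dropped)
        last unless defined $seq;
        $handle->($seq);
        $count++;
    }
    return ($count, '');
}
```

With a real stream one would pass the object returned by get_Stream_by_query and a callback that prints the wanted fields; if the count returned is lower than the expected total, the stream ended early and the remaining records need to be requested again.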
From robert.bradbury at gmail.com Mon May 10 01:38:09 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 10 May 2010 01:38:09 -0400
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
In-Reply-To: <28506482.post@talk.nabble.com>
References: <28506482.post@talk.nabble.com>
Message-ID:

I don't know whether this is related or not, but the last time I tried to fetch a moderately large genome (NS_000198 for *Podospora anserina*) it failed [1]. It takes a *very* long time and eventually throws an "Out of Memory" error. This is on a Pentium IV Prescott, which only has a 4GB address space (configured for 3GB for user programs). After running a long strace on the perl process, it seemed that it was never properly returning and merging memory from the sequence chunks which were being fetched. The final program address was brk(0xafd8c000) or 2,950,217,728, which is probably the maximum amount of data space a user program can have, considering that one needs room for the stack. After that the mmap2() calls started failing with ENOMEM.

If Bio::DB::Query::GenBank is intelligent enough to fetch only the requested fields, you should be OK. But if it fetches the entire GenBank record and simply throws away the sequence information, and you are running into large sequences (say a big chunk of a chromosome), this could hit the memory/swap space limits on your machine, and that could be a problem.

If the program is running for a long time, I'd be inclined to check my system logs to see if it is running out of memory/swap. You can also watch the process using ps to determine if the VSZ grows continuously.

I think I mentioned this before on the BioPerl list but never had a clear understanding of what was going on and may not have filed a bug report. I think I eventually worked around it, perhaps by fetching the offending (large) sequence using wget or a browser.

Robert

1.
Given that NS_000198 is only ~7MB (4.6 million actual bases) the BioPerl memory management has to be really poor in merging/reusing if the fetch uses ~3GB. From bhakti.dwivedi at gmail.com Mon May 10 11:22:41 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Mon, 10 May 2010 11:22:41 -0400 Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface Message-ID: Does anyone know why the blast results vary for a query sequence when search is conducted using a web-based interface versus a Command line interface? For example, my web-based blast top hits do not match the top hits of the command line blast (blastcl3). I am using the default settings in both. not sure why the results are different Even if the hit is there, the e-value, bit score etc are different for the same hsp regions identified within the hit. is there a difference in the blast algorithm? or is it the database? Thanks! From cjfields at illinois.edu Mon May 10 12:28:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 11:28:15 -0500 Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface In-Reply-To: References: Message-ID: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu> The default web-based parameters differ than those via blastcl3, so if you are using the defaults for both they may differ somewhat. chris On May 10, 2010, at 10:22 AM, Bhakti Dwivedi wrote: > Does anyone know why the blast results vary for a query sequence when search > is conducted using a web-based interface versus a Command line interface? > > For example, my web-based blast top hits do not match the top hits of the > command line blast (blastcl3). I am using the default settings in both. > not sure why the results are different Even if the hit is there, the > e-value, bit score etc are different for the same hsp regions identified > within the hit. is there a difference in the blast algorithm? 
or is it the > database? > > Thanks! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 10 12:31:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 11:31:15 -0500 Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely In-Reply-To: References: <28506482.post@talk.nabble.com> Message-ID: On May 10, 2010, at 12:38 AM, Robert Bradbury wrote: > I don't know whether this is related or not. But the last time I tried to > fetch a moderately large genome (NS_000198 for *Podospera anserina*) it > failed [1]. It takes a *very* long time and eventually springs an "Out of > Memory" error. This is on a Pentium IV Prescott which only has a 4GB > address space (configured for 3GB for user programs) and after running a > long strace on the perl process it seemed that what was happening was that > it was never properly returning and merging memory from the sequence chunks > which were being fetched. The final program address was brk(0xafd8c000) or > 2,950,217,728 which is probably the maximum amount of data space a user > program can have considering that one needs room for the stack. After that > the mmap2() calls started failing with ENOMEM. That's odd. What OS? > If Bio::DB::GenBank::Query is intelligent enough to only fetch just the > requested fields you should be ok. But if it fetches the entire GenBank > record and simply throws away the sequence information and you are running > into large sequences (say a big chunk of a chromosome) and this ends up > hitting the memory/swap space limits on your machine that could be a > problem. Yes, that may happen, as (at the moment) we push everything into memory; there are no lazy or DB-linked Seq instances, at least not yet. Very large sequences take a lot of time (object instantiation) and a lot of memory. 
To tell the truth, that seems to be the default of most toolkits, but we have recently talked about possible ways to deal with it; we just need the tuits for it (as with anything).

The other alternative is to pull the sequences down locally as a raw text file. This can still be done within BioPerl, just using Bio::DB::EUtilities:

use Bio::DB::EUtilities;

my $id = 'NS_000198';
my $in = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                  -db      => 'nuccore',
                                  -email   => 'cjfields at bioperl.org',
                                  -rettype => 'gbwithparts',
                                  -id      => $id);
# write the raw GenBank record to a file instead of building objects in memory
$in->get_Response(-file => "$id.gb");

> If the program is running for a long time I'd be inclined to check my system
> logs to see if one is running out of memory/swap. You can also watch the
> process using ps to determine if the VSZ grows continuously.
>
> I think I mentioned this before on the BioPerl list but never had a clear
> understanding of what was going on and may not have filed a bug report. I
> think I eventually worked around it, perhaps by fetching the offending
> (large) sequence using wget or a browser.

You can still file a bug on it; it does help with keeping track (just reporting it here doesn't help much, it gets lost in the shuffle).

> Robert
>
> 1. Given that NS_000198 is only ~7MB (4.6 million actual bases) the BioPerl
> memory management has to be really poor in merging/reusing if the fetch uses
> ~3GB.

BioPerl stores everything in memory, but I've worked with 4.6Mbp genomes quite a bit on my MB Pro. However, the default mode for Bio::DB::GenBank is to pull down everything using 'gbwithparts'. The file is much larger when fetched that way (sequence is ~34Mbp, file is ~51 MB). Maybe that's the problem?

If you can, please file a bug report, along with the relevant information. That helps us determine the best course of action.
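
[Editorial sketch] The suggestion quoted above, watching whether VSZ grows while the fetch runs, can be scripted from inside the Perl program itself. This is a minimal sketch assuming a Unix-like system whose ps accepts the -o vsz= and -p options; vsz_kb is a made-up helper name, not something BioPerl provides.

```perl
use strict;
use warnings;

# Hypothetical helper: return the virtual memory size (VSZ, as printed by ps,
# normally in KiB) of a process, or undef if ps is missing or prints nothing
# usable. Defaults to the current process.
sub vsz_kb {
    my ($pid) = @_;
    $pid = $$ unless defined $pid;
    die "pid must be numeric\n" unless $pid =~ /^\d+$/;
    # "vsz=" asks ps for the VSZ column with an empty header
    my $out = `ps -o vsz= -p $pid 2>/dev/null`;
    return unless defined $out && $out =~ /(\d+)/;
    return $1;
}
```

Calling vsz_kb() every few hundred records and printing the value to STDERR would show whether memory use climbs steadily or levels off.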
chris

From cjfields at illinois.edu Mon May 10 12:32:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 11:32:43 -0500
Subject: [Bioperl-l] Read/write round-tripping Was: Re: New Bioperl dependency? Sort::Naturally
In-Reply-To: <4BE6639B.6060004@gmail.com>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> <4BE6639B.6060004@gmail.com>
Message-ID: <4B47AB3F-3190-4ACC-8235-8F5D6DBE7DC6@illinois.edu>

If there is dynamic ID assignment, I would assume you can't compare them between runs, so using is_deeply() won't work as advertised: we already know the ID will change between runs anyway, so it's a self-fulfilling prophecy. Also, is_deeply() here is inspecting the SF::Collection blessed hash directly (the _btree is a tied DB_File hash); I'm not sure that's what you want either. So at this point I would have to ask myself:

1) Is the dynamic ID assignment a bug (e.g. should we be using a fixed ID of some sort)? If not, we can't expect these to match across runs, so is_deeply won't work.

2) Would it make more sense to explicitly inspect the handled objects (SF::Collection) directly via method calls? For instance, if I want to see whether a set of features falls within a region, is that reproducible between runs?

Either way, I'm not sure what using Test::Deep would gain you, as it's still meant to inspect complex data structures, just with a bit more sugar than Test::More and is_deeply(). Per #2 above, I would be more explicit in inspecting the SF::Collection:

my $collection = $contig->get_features_collection;
# check that IDs in SF::Collection conform to a regex using like()
# inspect other things about the collection...

chris

On May 9, 2010, at 2:26 AM, Florent Angly wrote:

> Chris,
>
> I've thought some more on the problem and I now agree with you that round-tripping at the object-level is more powerful.
> > It has the problem that some objects are given IDs dynamically every time, which means that identical input files won't have an identical object. > >> is_deeply( $obj_out , $obj_in , 'deep compare' ); > >> not ok 1 - deep compare >> # Failed test 'deep compare' >> # at ./test_roundtrip.pl line 33. >> # Structures begin differing at: >> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '56438592' >> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '54980512' >> 1..1 >> # Looks like you failed 1 test of 1. > > > And when I re-run this again: > >> not ok 1 - deep compare >> # Failed test 'deep compare' >> # at ./test_roundtrip.pl line 33. >> # Structures begin differing at: >> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '47763264' >> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '46305184' >> 1..1 >> # Looks like you failed 1 test of 1. > > Note how the value of _btree changes everytime. > > Maybe using Test::Deep would be a good approach (http://search.cpan.org/~fdaly/Test-Deep-0.106/lib/Test/Deep.pod): >> Where it becomes more interesting is in allowing you to do something besides simple exact comparisons. With strings, the |eq| operator checks that 2 strings are exactly equal but sometimes that's not what you want. When you don't know exactly what the string should be but you do know some things about how it should look, |eq| is no good and you must use pattern matching instead. Test::Deep provides pattern matching for complex data structures > > Florent > > > > > On 09/05/10 10:02, Chris Fields wrote: >> Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. >> >> chris >> >> On May 8, 2010, at 6:47 PM, Chris Fields wrote: >> >> >>> To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. 
None of the SeqIO modules make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority.
>>>
>>> chris
>>>
>>> On May 8, 2010, at 6:34 AM, Florent Angly wrote:
>>>
>>>> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files.
>>>>
>>>> It looks like the Bio::SeqIO modules tests could use it as well.
>>>>
>>>> Cheers,
>>>>
>>>> Florent

From cjfields at illinois.edu Mon May 10 12:58:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 11:58:07 -0500
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
In-Reply-To: <28506482.post@talk.nabble.com>
References: <28506482.post@talk.nabble.com>
Message-ID: <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu>

500000 sequences is way too many to request, even in a loop. Under most circumstances this is breaking NCBI's eutils policies:

http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements

so don't be too surprised this is failing (this would be around 1000 queries of 500 sequences per query).

You could try pulling down the raw sequence via batch entrez or using Bio::DB::EUtilities (which should die if an error occurs).
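
[Editorial sketch] The arithmetic above (roughly 1000 queries of 500 sequences each) amounts to walking a long ID list in fixed-size batches with a pause between requests. The sketch below shows only that bookkeeping in plain Perl. The names batches and fetch_in_batches are hypothetical, and the $fetch callback is a placeholder for whatever actually retrieves one batch; no NCBI call is made here.

```perl
use strict;
use warnings;

# Split a list of IDs into array refs holding at most $size IDs each.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be a positive integer\n"
        unless defined $size && $size =~ /^\d+$/ && $size > 0;
    my @out;
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Hypothetical driver: hand each batch to a caller-supplied callback and
# sleep $pause seconds (fractions allowed) between batches.
# Returns the number of IDs handed to the callback.
sub fetch_in_batches {
    my ($fetch, $size, $pause, @ids) = @_;
    my $done = 0;
    for my $batch (batches($size, @ids)) {
        $fetch->($batch);
        $done += scalar @$batch;
        select(undef, undef, undef, $pause) if $pause;
    }
    return $done;
}
```

Because each batch is independent, a failed batch can be retried on its own without restarting the whole download, which is the main reason to prefer this over a single very long stream.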
chris On May 9, 2010, at 9:22 PM, bergeycm wrote: > > Hi all, > > I'm attempting to query GenBank for all sequences' lengths for a given > taxon. I'm using get_Stream_by_query(), but only to grab the species, > length, and accession. The genus of interest has almost 500,000 GB entries, > though, and my code hangs up at odd points in the info-gathering loop. > (Often after only 300 or 400 iterations.) The problem is that > $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back > undefined. > > I've tried wrapping the next_seq portion of the code in an eval block, but > to no avail. Is there a way to split a query into a bunch of small streams > that aren't too much to ask? Or is there a way to pick up a dropped SeqIO > stream? I think the connection is timing out and the stream is being lost. > Any advice is greatly appreciated, as I'm fairly new to BioPerl. > > - bergeycm > > > > use Bio::DB::GenBank; > use Bio::DB::Query::GenBank; > > > # Get general things ready to go for querying GenBank > my %options; > $options{'-maxids'} = '500000'; # There are presently 460,184 sequences > $options{'-db'} = 'nucleotide'; > $options{'-query'} = "Pongo [ORGN]"; # Orangutans > > > my $query_obj = Bio::DB::Query::GenBank->new(%options); > my $total = $query_obj->count; > > my $gb_obj = Bio::DB::GenBank->new(); > my $stream_obj = $gb_obj->get_Stream_by_query($query_obj); > > # Restrict info to just what I'll be using. No sequence necessary. > my $builder = $stream_obj->sequence_builder(); > $builder->want_none(); > $builder->add_wanted_slot('species','length','accession'); > > my $c = 0; > > for (1 .. 
$total) { > eval { > my $seq_obj = $stream_obj->next_seq; > my $flavor = $seq_obj->species; > print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t", > $seq_obj->length, "\t", $seq_obj->accession, "\n"; > }; > > if ($@) { > print $!, '\n'; > } > > # Pause for a little over a third of a second > select(undef, undef, undef, 0.35); > > $c++; > } > > > > -- > View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 10 13:07:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 12:07:00 -0500 Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely In-Reply-To: <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu> References: <28506482.post@talk.nabble.com> <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu> Message-ID: <58E399D4-A884-4DC1-A5C6-8B0CBDDB173A@illinois.edu> (addendum added, sent too early) On May 10, 2010, at 11:58 AM, Chris Fields wrote: > 500000 sequences is way too many to request, even in a loop. Under most circumstances this is breaking NCBI's eutils policies: > > http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements > > so don't be too surprised this is failing (this would be around 1000 queried of 500 sequences per query). > > You could try pulling down the raw sequence via batch entrez or using Bio::DB::EUtilities (which should die if an error occurs). But you may still run into issues with eutils at some point, particularly if running this at peak times. > > chris > > On May 9, 2010, at 9:22 PM, bergeycm wrote: > >> >> Hi all, >> >> I'm attempting to query GenBank for all sequences' lengths for a given >> taxon. 
I'm using get_Stream_by_query(), but only to grab the species, >> length, and accession. The genus of interest has almost 500,000 GB entries, >> though, and my code hangs up at odd points in the info-gathering loop. >> (Often after only 300 or 400 iterations.) The problem is that >> $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back >> undefined. >> >> I've tried wrapping the next_seq portion of the code in an eval block, but >> to no avail. Is there a way to split a query into a bunch of small streams >> that aren't too much to ask? Or is there a way to pick up a dropped SeqIO >> stream? I think the connection is timing out and the stream is being lost. >> Any advice is greatly appreciated, as I'm fairly new to BioPerl. >> >> - bergeycm >> >> >> >> use Bio::DB::GenBank; >> use Bio::DB::Query::GenBank; >> >> >> # Get general things ready to go for querying GenBank >> my %options; >> $options{'-maxids'} = '500000'; # There are presently 460,184 sequences >> $options{'-db'} = 'nucleotide'; >> $options{'-query'} = "Pongo [ORGN]"; # Orangutans >> >> >> my $query_obj = Bio::DB::Query::GenBank->new(%options); >> my $total = $query_obj->count; >> >> my $gb_obj = Bio::DB::GenBank->new(); >> my $stream_obj = $gb_obj->get_Stream_by_query($query_obj); >> >> # Restrict info to just what I'll be using. No sequence necessary. >> my $builder = $stream_obj->sequence_builder(); >> $builder->want_none(); >> $builder->add_wanted_slot('species','length','accession'); >> >> my $c = 0; >> >> for (1 .. 
$total) {
>>     eval {
>>         my $seq_obj = $stream_obj->next_seq;
>>         my $flavor = $seq_obj->species;
>>         print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t",
>>               $seq_obj->length, "\t", $seq_obj->accession, "\n";
>>     };
>>
>>     if ($@) {
>>         print $!, '\n';
>>     }
>>
>>     # Pause for a little over a third of a second
>>     select(undef, undef, undef, 0.35);
>>
>>     $c++;
>> }
>>
>> --
>> View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jun.yin at ucd.ie Mon May 10 13:14:36 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Mon, 10 May 2010 18:14:36 +0100
Subject: [Bioperl-l] Bio::Align - alignment by position?
In-Reply-To:
References:
Message-ID: <003701caf064$441c4660$cc54d320$%yin@ucd.ie>

Hi,

When you use $aln->slice(), there is a third optional parameter to keep gap-only columns in the newly created slice, e.g.

$aln2=$aln->slice(20,30,1);

By defining the third parameter, you can keep gap-only subsequences.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From bhakti.dwivedi at gmail.com Mon May 10 14:35:37 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Mon, 10 May 2010 14:35:37 -0400
Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface
In-Reply-To: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>
References: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>
Message-ID:

Thanks Chris! I changed a few parameter values in blastcl3 and now the results are the same. Is there any particular reason the defaults are set differently in the web-based and command-line BLAST searches?

Bhakti

On Mon, May 10, 2010 at 12:28 PM, Chris Fields wrote:

> The default web-based parameters differ than those via blastcl3, so if you
> are using the defaults for both they may differ somewhat.
>
> chris
>
> On May 10, 2010, at 10:22 AM, Bhakti Dwivedi wrote:
>
> > Does anyone know why the blast results vary for a query sequence when search
> > is conducted using a web-based interface versus a Command line interface?
> >
> > For example, my web-based blast top hits do not match the top hits of the
> > command line blast (blastcl3). I am using the default settings in both.
> > not sure why the results are different Even if the hit is there, the
> > e-value, bit score etc are different for the same hsp regions identified
> > within the hit. is there a difference in the blast algorithm? or is it the
> > database?
> >
> > Thanks!

From cjfields at illinois.edu Mon May 10 15:47:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 14:47:56 -0500
Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface
In-Reply-To:
References: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>
Message-ID:

You would need to ask NCBI that.
chris On May 10, 2010, at 1:35 PM, Bhakti Dwivedi wrote: > Thanks Chris! I changed few parameter values in blastcl3 and now the > results are same. Any particular reason to set the default differently in > web-based and command-line blast search? > > Bhakti > > > > On Mon, May 10, 2010 at 12:28 PM, Chris Fields wrote: > >> The default web-based parameters differ than those via blastcl3, so if you >> are using the defaults for both they may differ somewhat. >> >> chris >> >> On May 10, 2010, at 10:22 AM, Bhakti Dwivedi wrote: >> >>> Does anyone know why the blast results vary for a query sequence when >> search >>> is conducted using a web-based interface versus a Command line interface? >>> >>> For example, my web-based blast top hits do not match the top hits of >> the >>> command line blast (blastcl3). I am using the default settings in both. >>> not sure why the results are different Even if the hit is there, the >>> e-value, bit score etc are different for the same hsp regions identified >>> within the hit. is there a difference in the blast algorithm? or is it >> the >>> database? >>> >>> Thanks! 
From dimitark at bii.a-star.edu.sg Mon May 10 22:03:51 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Tue, 11 May 2010 10:03:51 +0800
Subject: [Bioperl-l] StandAloneFasta and Too many open files
Message-ID: <4BE8BB07.3040407@bii.a-star.edu.sg>

Hi guys,
yesterday I got the following error:

'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380'

from the following code:
------------
my $ssout="my_seq_out.txt";
print "SS:$tquery:\n:$tseq:\n";
my @sargs=(
    'q' => '',
    'E' => '1',
    'w' => '100',
    'O' => "$ssout",
    'program' => "ssearch36",
);
my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs);
$fac_ss->library($tmpseq);
my @sreport=$fac_ss->run($tqtmp);

foreach my $sr (@sreport){
    while(my $result=$sr->next_result){
        while(my $hit=$result->next_hit){
            while(my $hsp=$hit->next_hsp){
                my $iden=$hsp->frac_identical;
                $rv3=$iden;
                # print "IDEN:$iden:$rv1\n";
            }
        }
    }
}
--------------------
I am running that code over several thousand HSPs: for each one I get the sequence and then run 'ssearch36' with it against another sequence. I was digging around the StandAloneFasta module but couldn't work out where the problem is. Many filehandles must be left open somewhere, but I do not know where. I checked the module but couldn't find such filehandles. Maybe the problem is in the base modules.
I also checked my own script for filehandles left open and found none. The only thing I found is that I can actually close SeqIO streams with '$stream->close', which I didn't see in the web documentation. So something positive came out of this :) I closed all my SeqIO streams and still had the same problem.
Next I commented out the above code and rewrote my script as follows:
--------------
my $ssout="my_seq_out.txt";
my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq > $ssout");
# on failure the exit status of the command is in $?
system(@sargs) == 0 or die "system @sargs failed: $?";

my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta');
while(my $result=$sreport->next_result){
    # print Dumper($result);
    while(my $hit=$result->next_hit){
        while(my $hsp=$hit->next_hsp){
            my $iden=$hsp->frac_identical;
            $rv3=$iden;
            # print "IDEN:$iden:$rv1\n";
        }
    }
}
---------------
Fortunately this code got rid of the 'Too many open files' error, so the problem was indeed coming from the module or the modules behind it.

I have also read that one can raise the limit on how many files can be open on the system, but I didn't want to mess with that for now because I do not know what the implications could be.

OK, that is it. I just wanted to share my experience and report the problem.
Cheers Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From cjfields at illinois.edu Mon May 10 23:04:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 22:04:12 -0500 Subject: [Bioperl-l] StandAloneFasta and Too many open files In-Reply-To: <4BE8BB07.3040407@bii.a-star.edu.sg> References: <4BE8BB07.3040407@bii.a-star.edu.sg> Message-ID: <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> On May 10, 2010, at 9:03 PM, Dimitar Kenanov wrote: > Hi guys, > yesterday i got the following error: > > 'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380' > > from the following code: > ------------ > my $ssout="my_seq_out.txt"; > print "SS:$tquery:\n:$tseq:\n"; > my @sargs=( > 'q' => '', > 'E' => '1', > 'w' => '100', > 'O' => "$ssout", > 'program' => "ssearch36", > ); > my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs); > $fac_ss->library($tmpseq); > my @sreport=$fac_ss->run($tqtmp); > > foreach my $sr (@sreport){ > while(my $result=$sr->next_result){ > while(my $hit=$result->next_hit){ > while(my $hsp=$hit->next_hsp){ > my $iden=$hsp->frac_identical; > $rv3=$iden; > # print "IDEN:$iden:$rv1\n"; > } > } > } > } > -------------------- > I am using that code over several thousands of HSPs for which i get the sequence and then 'ssearch36' with it against another sequence. I was digging around the module StandAloneFasta but couldnt get where the problem is. There should be somewhere many opened filehandles but do not know where. I checked the module but couldnt find such filehandles. May be the problem is in the base modules. > I also checked and my script for left open filehandles and i have not. I found only that i can actually close SeqIO streams with '$stream->close' which i didnt see on the web documentation. 
So something positive out of this :) So i closed all my SeqIO streams and i still had the same problem. > Next i commented out the above code and rewrote my script into the following: > -------------- > my $ssout="my_seq_out.txt"; > my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq > $ssout"); > system(@sargs) == 0 or die "system @sargs failed: $!"; > > my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta'); > while(my $result=$sreport->next_result){ > # print Dumper($result); > while(my $hit=$result->next_hit){ > while(my $hsp=$hit->next_hsp){ > > my $iden=$hsp->frac_identical; > $rv3=$iden; > # print "IDEN:$iden:$rv1\n"; > } > } > } > --------------- > Fortunately this code overcame the error message with too many filehandles. So the problem was indeed coming from the module or the modules behind it. > > I have also read that one can change the number of how many files can be opened on the system but i didnt want to mess with that for now because i do not know what could be the implications of that. > > Ok that is it. I just wanted to inform about my experience and to report the problem. > > Cheers > Dimitar Seems this is hitting the system ulimit somehow, but it's not immediately apparent how that's happening unless you are caching the IO objects somehow. Can you file this as a bug, maybe with a fuller test script? Might give us something to check against. chris From cjfields at illinois.edu Mon May 10 23:57:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 22:57:18 -0500 Subject: [Bioperl-l] StandAloneFasta and Too many open files In-Reply-To: <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> References: <4BE8BB07.3040407@bii.a-star.edu.sg> <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> Message-ID: <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu> Addendum to that last post. 
On May 10, 2010, at 10:04 PM, Chris Fields wrote: > On May 10, 2010, at 9:03 PM, Dimitar Kenanov wrote: > >> Hi guys, >> yesterday i got the following error: >> >> 'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380' >> >> from the following code: >> ------------ >> my $ssout="my_seq_out.txt"; >> print "SS:$tquery:\n:$tseq:\n"; >> my @sargs=( >> 'q' => '', >> 'E' => '1', >> 'w' => '100', >> 'O' => "$ssout", >> 'program' => "ssearch36", >> ); >> my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs); >> $fac_ss->library($tmpseq); >> my @sreport=$fac_ss->run($tqtmp); >> >> foreach my $sr (@sreport){ >> while(my $result=$sr->next_result){ >> while(my $hit=$result->next_hit){ >> while(my $hsp=$hit->next_hsp){ >> my $iden=$hsp->frac_identical; >> $rv3=$iden; >> # print "IDEN:$iden:$rv1\n"; >> } >> } >> } >> } >> -------------------- >> I am using that code over several thousands of HSPs for which i get the sequence and then 'ssearch36' with it against another sequence. I was digging around the module StandAloneFasta but couldnt get where the problem is. There should be somewhere many opened filehandles but do not know where. I checked the module but couldnt find such filehandles. May be the problem is in the base modules. >> I also checked and my script for left open filehandles and i have not. I found only that i can actually close SeqIO streams with '$stream->close' which i didnt see on the web documentation. So something positive out of this :) So i closed all my SeqIO streams and i still had the same problem. 
>> Next i commented out the above code and rewrote my script into the following: >> -------------- >> my $ssout="my_seq_out.txt"; >> my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq > $ssout"); >> system(@sargs) == 0 or die "system @sargs failed: $!"; >> >> my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta'); >> while(my $result=$sreport->next_result){ >> # print Dumper($result); >> while(my $hit=$result->next_hit){ >> while(my $hsp=$hit->next_hsp){ >> >> my $iden=$hsp->frac_identical; >> $rv3=$iden; >> # print "IDEN:$iden:$rv1\n"; >> } >> } >> } >> --------------- >> Fortunately this code overcame the error message with too many filehandles. So the problem was indeed coming from the module or the modules behind it. >> >> I have also read that one can change the number of how many files can be opened on the system but i didnt want to mess with that for now because i do not know what could be the implications of that. >> >> Ok that is it. I just wanted to inform about my experience and to report the problem. >> >> Cheers >> Dimitar > > > Seems this is hitting the system ulimit somehow, but it's not immediately apparent how that's happening unless you are caching the IO objects somehow. Can you file this as a bug, maybe with a fuller test script? Might give us something to check against. > > chris Dimitar, I think Peter had answered this before, might indicate the problem is actually using the 'O' option in output. We can look at possibly just capturing STDOUT instead, but we may not support the use of 'O' if it's as buggy as indicated. 
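A minimal sketch of the "capture STDOUT" idea mentioned above, under stated assumptions: the helper name capture_stdout is hypothetical and is not part of StandAloneFasta, and the running Perl interpreter stands in for ssearch36. It reads a command's output through a lexical pipe handle and closes that handle explicitly on every call, so repeated runs should not accumulate open descriptors.

```perl
use strict;
use warnings;

# Run a command, return its STDOUT as a list of lines, and close the
# pipe explicitly so the file descriptor is released on every call.
# (capture_stdout is a hypothetical helper, not a BioPerl method.)
sub capture_stdout {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "cannot run @cmd: $!";
    my @lines = <$fh>;
    close($fh) or die "command @cmd failed, exit status " . ($? >> 8);
    return @lines;
}

# Stand-in for the real search program: the current Perl interpreter
# printing two lines. A real call would pass the ssearch36 arguments.
my @cmd = ($^X, '-e', 'print "hit1\nhit2\n"');

my $total = 0;
for my $run (1 .. 50) {
    my @lines = capture_stdout(@cmd);
    $total += scalar @lines;
}
print "read $total lines over 50 runs\n";
```

The list form of open avoids the shell, and checking the return value of close is how a non-zero exit status of the child is noticed.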
http://groups.google.com/group/bioperl-l/msg/25c17748d1ac6ef4 chris From dimitark at bii.a-star.edu.sg Tue May 11 00:24:13 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Tue, 11 May 2010 12:24:13 +0800 Subject: [Bioperl-l] StandAloneFasta and Too many open files In-Reply-To: <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu> References: <4BE8BB07.3040407@bii.a-star.edu.sg> <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu> Message-ID: <4BE8DBED.2000209@bii.a-star.edu.sg> Hi Chris, thank you for the information. I checked it out. I wrote you and the list about that as well. To you on 16.04.2010 and to the list on 23.04.2010. There i explained that i modified the module. Now i pass it the '0' option but this option is not passed to the actual program executed by system. I just add my desired output with "> $output" to the parameter line passed to system. In the email mentioned above i attached the modified version of the module. I was digging again a bit about the module. I found that - line(359): ----------- unless( $outfile ) { open(FASTARUN, "$para |") || $self->throw($@);#original $object=Bio::SearchIO->new(-fh=>\*FASTARUN, #original -format=>"fasta");#original } else { ------------ And here another one when the 'O' is used - line(371): --------- $object = Bio::SearchIO->new(-file=>$self->O, -format=>"fasta"); ---------- May be the problem is here. Because i didnt see anywhere a 'close' for these filehandles. I can test and tell if i was right. Cheers Dimitar On 05/11/2010 11:57 AM, Chris Fields wrote: > Addendum to that last post. 
> > http://groups.google.com/group/bioperl-l/msg/25c17748d1ac6ef4 > > chris > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From heikki.lehvaslaiho at gmail.com Tue May 11 01:40:14 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Tue, 11 May 2010 08:40:14 +0300 Subject: [Bioperl-l] Github possibilities Message-ID: FYI http://chem-bla-ics.blogspot.com/2010/05/github-simplifies-code-review-and.html -Heikki From heikki.lehvaslaiho at gmail.com Tue May 11 01:43:42 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Tue, 11 May 2010 08:43:42 +0300 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu> Message-ID: Thanks Razi and Chris, Blast parsing works again beautifully. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 10 May 2010 03:39, Chris Fields wrote: > Ok, that's fine. It may be something off with revision numbers when using > svn with github (git doesn't have incremental revisions, but a SHA). > Committed the patch to dev svn, in r16970. > > chris > > On May 9, 2010, at 6:48 PM, Razi Khaja wrote: > > > I checked out bioperl-live from github: > > svn checkout http://svn.github.com/bioperl/bioperl-live.git > > > > I just checked it out again, a few seconds ago and by default I got > revision > > 11326. > > Razi > > > > > > On Sun, May 9, 2010 at 5:30 PM, Chris Fields > wrote: > > > >> Then something is wrong, as current trunk is at r16969. 
Where are you > >> pulling your code from? Our only working anon. server is the sync'ed > github > >> one. > >> > >> chris > >> > >> On May 9, 2010, at 4:15 PM, Razi Khaja wrote: > >> > >>> Hi Chris, > >>> The patch is against the main trunk. I checked out version 11326 of > the > >>> repository today. > >>> Razi > >>> > >>> > >>> On Sun, May 9, 2010 at 4:43 PM, Chris Fields > >> wrote: > >>> > >>>> If the patch is against main trunk it isn't a problem, otherwise the > >> diff > >>>> should be vs. that code. > >>>> > >>>> chris > >>>> > >>>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > >>>> > >>>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > >>>>> Can someone advise an appropriate way to have this patch applied, > given > >>>> that > >>>>> it is an amendment to a previous patch? > >>>>> Thanks > >>>>> Razi > >>>>> > >>>>> > >>>>> ---------- Forwarded message ---------- > >>>>> From: Heikki Lehvaslaiho > >>>>> Date: Wed, May 5, 2010 at 2:11 AM > >>>>> Subject: Re: [Bioperl-l] BLAST parsing broken > >>>>> To: Razi Khaja > >>>>> > >>>>> > >>>>> Hi Raja, > >>>>> > >>>>> Thanks for trying to fix this. > >>>>> > >>>>> I am attaching an example output file to this message. I just tested > >>>> again > >>>>> that master from git repository fails to get query ID, but the > previous > >>>>> version works. > >>>>> > >>>>> bala ~/src/bioperl-live> git checkout master > >>>>> Previous HEAD position was 5e278f5... Robson's patch for buggy > blastpgp > >>>>> output > >>>>> Switched to branch 'master' > >>>>> > >>>>> When I started using the latest mpiBLAST code a few months ago I did > >>>> compare > >>>>> the 0 output from it to standard NCBI blast and they were identical. > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> Also, I've noticed a discrepancy between within bioperl blast > parsing > >>>> that > >>>>> I have not had time to work on. Would you be interested in having a > >> look? 
> >>>>> > >>>>> I am creating output from mpiBLAST in 0 format and then converting it > >>>> into > >>>>> tab-delimited 8 format. I am unable to get 100% similarity for all > >> cases > >>>>> when I compare the conversion to the output straight from mpiBLAST in > >>>> format > >>>>> 8. Sometimes the mismatch and gap values are off by one. > >>>>> > >>>>> I am attaching a script that does the conversion. It is the same one > I > >>>> was > >>>>> using when I noticed the problem above. I was going to put the code > >> into > >>>>> bioperl but that got delayed when I noticed the discrepancies. > >>>>> > >>>>> > >>>>> Cheers, > >>>>> > >>>>> > >>>>> -Heikki > >>>>> > >>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>>> > >>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office > >>>> #4216 > >>>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>>> > >>>>> > >>>>> > >>>>> On 4 May 2010 20:55, Razi Khaja wrote: > >>>>> > >>>>>> That is odd. Heikki, do you have a blast output file that produces > >> this > >>>>>> error? > >>>>>> Could you attach the file and either send to the list or myself (if > >> the > >>>>>> list > >>>>>> does not accept attachments). > >>>>>> Thanks, > >>>>>> Razi > >>>>>> > >>>>>> > >>>>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields > > >>>>>> wrote: > >>>>>> > >>>>>>> Odd, I ran tests on that prior to commit. I'll work on fixing that > >> (in > >>>>>> svn > >>>>>>> of course, until the migration is complete). > >>>>>>> > >>>>>>> chris > >>>>>>> > >>>>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > >>>>>>> > >>>>>>>> Chris, > >>>>>>>> > >>>>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of > >>>>>> normal > >>>>>>>> blast output. $result->query_name returns now undef. > >>>>>>>> > >>>>>>>> (Using the anonymous git now). 
This change still works: > >>>>>>>> > >>>>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>>>> Author: cjfields > >>>>>>>> Date: Sun Dec 20 04:39:58 2009 +0000 > >>>>>>>> > >>>>>>>> Robson's patch for buggy blastpgp output > >>>>>>>> > >>>>>>>> But this does not: > >>>>>>>> > >>>>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544 > >>>>>>>> Author: cjfields > >>>>>>>> Date: Thu Apr 15 04:21:17 2010 +0000 > >>>>>>>> > >>>>>>>> [bug 3031] > >>>>>>>> > >>>>>>>> patches for catching algorithm ref, courtesy Razi Khaja. > >>>>>>>> > >>>>>>>> That makes it easy to find the diffs: > >>>>>>>> > >>>>>>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > >>>>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > >>>>>>>> index 378023a..6f7eeeb 100644 > >>>>>>>> --- a/Bio/SearchIO/blast.pm > >>>>>>>> +++ b/Bio/SearchIO/blast.pm > >>>>>>>> @@ -209,6 +209,7 @@ BEGIN { > >>>>>>>> > >>>>>>>> 'BlastOutput_program' => 'RESULT-algorithm_name', > >>>>>>>> 'BlastOutput_version' => > >>>>>> 'RESULT-algorithm_version', > >>>>>>>> + 'BlastOutput_algorithm-reference' => > >>>>>>> 'RESULT-algorithm_reference', > >>>>>>>> 'BlastOutput_query-def' => 'RESULT-query_name', > >>>>>>>> 'BlastOutput_query-len' => 'RESULT-query_length', > >>>>>>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', > >>>>>>>> @@ -504,6 +505,26 @@ sub next_result { > >>>>>>>> } > >>>>>>>> ); > >>>>>>>> } > >>>>>>>> + # parse the BLAST algorithm reference > >>>>>>>> + elsif(/^Reference:\s+(.*)$/) { > >>>>>>>> + # want to preserve newlines for the BLAST algorithm > >>>>>>> reference > >>>>>>>> + my $algorithm_reference = "$1\n"; > >>>>>>>> + $_ = $self->_readline; > >>>>>>>> + # while the current line, does not match an empty > line, > >> a > >>>>>>> RID:, > >>>>>>>> or a Database:, we are still looking at the > >>>>>>>> + # algorithm_reference, append it to what we parsed so > >> far > >>>>>>>> + while($_ !~ /^$/ && 
$_ !~ /^RID:/ && $_ !~ > >> /^Database:/) > >>>> { > >>>>>>>> + $algorithm_reference .= "$_"; > >>>>>>>> + $_ = $self->_readline; > >>>>>>>> + } > >>>>>>>> + # if we exited the while loop, we saw an empty line, > a > >>>>>> RID:, > >>>>>>> or > >>>>>>>> a Database:, so push it back > >>>>>>>> + $self->_pushback($_); > >>>>>>>> + $self->element( > >>>>>>>> + { > >>>>>>>> + 'Name' => 'BlastOutput_algorithm-reference', > >>>>>>>> + 'Data' => $algorithm_reference > >>>>>>>> + } > >>>>>>>> + ); > >>>>>>>> + } > >>>>>>>> # added Windows workaround for bug 1985 > >>>>>>>> elsif (/^(Searching|Results from round)/) { > >>>>>>>> next unless $1 =~ /Results from round/; > >>>>>>>> > >>>>>>>> > >>>>>>>> I am not sure why reference parsing messes things up. Maybe it > eats > >>>> too > >>>>>>> many > >>>>>>>> lines from the result file. > >>>>>>>> > >>>>>>>> Yours, > >>>>>>>> > >>>>>>>> -Heikki > >>>>>>>> > >>>>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>>>>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>>>>>> > >>>>>>>> Computational Bioscience Research Centre (CBRC), Building #2, > Office > >>>>>>> #4216 > >>>>>>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>>>>>> _______________________________________________ > >>>>>>>> Bioperl-l mailing list > >>>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>> >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at 
lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cmb433 at nyu.edu Sun May 9 19:40:48 2010
From: cmb433 at nyu.edu (bergeycm)
Date: Sun, 9 May 2010 16:40:48 -0700 (PDT)
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
Message-ID: <28506482.post@talk.nabble.com>

Hi all,

I'm attempting to query GenBank for the lengths of all sequences in a given taxon. I'm using get_Stream_by_query(), but only to grab the species, length, and accession. The genus of interest has almost 500,000 GenBank entries, though, and my code hangs at odd points in the info-gathering loop (often after only 300 or 400 iterations). The problem is that $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back undefined. I've tried wrapping the next_seq portion of the code in an eval block, but to no avail.

Is there a way to split a query into a bunch of small streams that aren't too much to ask? Or is there a way to pick up a dropped SeqIO stream? I think the connection is timing out and the stream is being lost.

Any advice is greatly appreciated, as I'm fairly new to BioPerl.
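One possible way to split the query into smaller streams, as asked above, is to fetch the ID list once and then request the records in batches. The sketch below only shows the batching step in plain Perl; the helper name batches and the batch size of 500 are hypothetical choices, and the BioPerl calls in the comments (ids on the query object, get_Stream_by_id on the database object) are an assumption about how the batches would be used, not tested here because they need network access.

```perl
use strict;
use warnings;

# Split a list of IDs into consecutive batches of at most $size elements.
# (batches is a hypothetical helper, not part of BioPerl.)
sub batches {
    my ($ids, $size) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @copy = @$ids;    # work on a copy so the caller's list is untouched
    my @out;
    push @out, [ splice(@copy, 0, $size) ] while @copy;
    return @out;
}

# Placeholder IDs; in the real script they might come from
#   my @ids = $query_obj->ids;
my @ids = (1 .. 1050);

for my $batch (batches(\@ids, 500)) {
    printf "batch of %d ids: %s..%s\n",
        scalar(@$batch), $batch->[0], $batch->[-1];
    # Assumed BioPerl usage for each batch (not run here):
    #   my $stream = $gb_obj->get_Stream_by_id($batch);
    #   while (my $seq = $stream->next_seq) { ... }
}
```

With small batches, a dropped connection costs at most one batch, which can be retried without restarting the whole query.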
- bergeycm

    use strict;
    use warnings;
    use Bio::DB::GenBank;
    use Bio::DB::Query::GenBank;

    # Get general things ready to go for querying GenBank
    my %options;
    $options{'-maxids'} = '500000';        # There are presently 460,184 sequences
    $options{'-db'}     = 'nucleotide';
    $options{'-query'}  = "Pongo [ORGN]";  # Orangutans

    my $query_obj  = Bio::DB::Query::GenBank->new(%options);
    my $total      = $query_obj->count;
    my $gb_obj     = Bio::DB::GenBank->new();
    my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

    # Restrict info to just what I'll be using. No sequence necessary.
    my $builder = $stream_obj->sequence_builder();
    $builder->want_none();
    $builder->add_wanted_slot('species', 'length', 'accession');

    my $c = 0;
    for (1 .. $total) {
        eval {
            my $seq_obj = $stream_obj->next_seq;
            my $flavor  = $seq_obj->species;
            print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t",
                  $seq_obj->length, "\t", $seq_obj->accession, "\n";
        };
        # $@ holds the error caught by eval ($! is only the OS errno);
        # double quotes are needed for the newline to be interpreted.
        if ($@) { print "Error at record $c: $@\n"; }

        # Pause for a little over a third of a second
        select(undef, undef, undef, 0.35);
        $c++;
    }

From sudeep.mehrotra at mail.mcgill.ca Tue May 11 09:40:07 2010
From: sudeep.mehrotra at mail.mcgill.ca (Sudeep Mehrotra)
Date: Tue, 11 May 2010 09:40:07 -0400
Subject: [Bioperl-l] [Fwd: Re: Modules in Bio:Tree]
Message-ID: <4BE95E37.3060702@mail.mcgill.ca>

Hello Jason,

Your suggestion worked. Thanks.

I have the same tree in two formats (NEXUS and Newick). I want to obtain a "clade list"; in other words, is there a way to obtain the leaves which are members of a clade? For example, part of the NEXUS file has the following entries:

other entries .......
655 Deinococcus_geothermalis,
656 Deinococcus_radiodurans,
657 Thermus_thermophilus,
658 Thermus_sp.
;
other entries........
(((((655,656)[])[])[],(((657,658)[])[])[])[])[])[])[]);

From the tree I can see that 657 and 658 are members of one subclade, 655 and 656 are members of another subclade, and both subclades belong to one clade. I want to get this membership information. I tried looking for a module in Bio::Tree but was not able to find any. In the Bio::NEXUS package there is a module "walk" which I thought would work for me, but it does not. Also, the Bio::NEXUS package is just not working for me. Judging from the documentation, the input file they are using is different from what I have.

Is there any way I can get the membership information as shown earlier?

Cheers
--
Sudeep Mehrotra (Ph.D. Candidate)
McGill University and Genome Quebec Innovation Center

-------------- next part --------------
An embedded message was scrubbed...
From: Jason Stajich
Subject: Re: Modules in Bio:Tree
Date: Wed, 5 May 2010 18:45:41 -0400
Size: 5420
URL:

From amackey at virginia.edu Tue May 11 17:26:50 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Tue, 11 May 2010 17:26:50 -0400
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
Message-ID:

Hi Zerui (and others),

I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, specifically in this code:

lines:
1170: (-start => int ($loc->start / 3 ) +1,
1171: -end => int ($loc->end / 3 ) +1,

Both of those lines should look like: int (($loc->start - 1) / 3) + 1 (with $loc->end in place of $loc->start on the second line).

Otherwise, for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2).

There is also a problem when mapping exon coordinates that are outside/after the CDS region (instead of getting undefined locations, you continue to get peptide coordinates, but they are invalid, larger than the protein length).

Dennis and fringy -- this may affect the SNPtab.pl script I wrote for you, as it uses this module to calculate codons for SNPs.

-Aaron

P.S.
a script that demonstrates the problem:

    use Bio::Coordinate::GeneMapper;
    use Bio::Location::Simple;   # used directly below, so load it explicitly

    my $mapper = Bio::Coordinate::GeneMapper->new(
        -in    => "chr",
        -out   => "propeptide",
        -exons => [
            Bio::Location::Simple->new( -start => 101, -end => 109 ),
            Bio::Location::Simple->new( -start => 201, -end => 221 ),
        ],
        -cds   => Bio::Location::Simple->new( -start => 101, -end => 209 ),
    );

    print join("\t", "chr", "aa"), "\n";
    for my $pos (99..111, 199..211) {
        my $res = $mapper->map(
            Bio::Location::Simple->new( -start => $pos, -end => $pos, -seq_id => 1 ));
        my $start = $res->start; $start = "NA" unless defined $start;
        my $end   = $res->end;   $end   = "NA" unless defined $end;
        print join("\t", $pos, $start), "\n";
    }

From cjfields at illinois.edu Tue May 11 18:31:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 11 May 2010 17:31:17 -0500
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
In-Reply-To:
References:
Message-ID: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>

Aaron,

Do we want to write this up as a set of tests to add to the bioperl test suite? We can probably add it after the github migration tomorrow.

chris

On May 11, 2010, at 4:26 PM, Aaron Mackey wrote:

> Hi Zerui (and others),
>
> I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper,
> specifically in this code:
>
> lines:
> 1170: (-start => int ($loc->start / 3 ) +1,
> 1171: -end => int ($loc->end / 3 ) +1,
>
> both of those lines should look like: int (($loc->start - 1) / 3) + 1
>
> otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide
> positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2)
>
> There is also a problem when mapping exon coordinates that are outside/after
> the CDS region (instead of getting undefined locations, you continue to get
> peptide coordinates, but they are invalid, larger than the protein length).
>
> Dennis and fringy -- this may affect the SNPtab.pl script I wrote for you,
> as it uses this module to calculate codons for SNPs.
> > -Aaron > > P.S. a script the demonstrates the problem: > > use Bio::Coordinate::GeneMapper; > > my $mapper = > Bio::Coordinate::GeneMapper > ->new( -in => "chr", > -out => "propeptide", > -exons => [ Bio::Location::Simple > ->new( -start => 101, > -end => 109 ), > Bio::Location::Simple > ->new( -start => 201, > -end => 221 ), > ], > -cds => Bio::Location::Simple > ->new(-start => 101, -end => 209), > ); > > > print join("\t", "chr", "aa"), "\n"; > for my $pos (99..111,199..211) { > my $res = $mapper->map( > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => 1)); > my $start = $res->start; $start = "NA" unless defined $start; > my $end = $res->end; $end = "NA" unless defined $end; > print join("\t", $pos, $start), "\n"; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From amackey at virginia.edu Tue May 11 18:40:11 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Tue, 11 May 2010 18:40:11 -0400 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: Hi Chris, I was hoping Heikki might take up the cause and investigate further -- let's give him a chance to respond. But it's a frightening bug if it's really been that way for all this time ... -Aaron On Tue, May 11, 2010 at 6:31 PM, Chris Fields wrote: > Aaron, > > Do we want to write this up as a set of tests to add to the bioperl test > suite? We can probably add it after the github migration tomorrow. 
> > chris > > On May 11, 2010, at 4:26 PM, Aaron Mackey wrote: > > > Hi Zerui (and others), > > > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, > > specifically in this code: > > > > lines: > > 1170: (-start => int ($loc->start / 3 ) +1, > > 1171: -end => int ($loc->end / 3 ) +1, > > > > both of those lines should look like: int (($loc->start - 1) / 3) + 1 > > > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide > > positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2) > > > > There is also a problem when mapping exon coordinates that are > outside/after > > the CDS region (instead of getting undefined locations, you continue to > get > > peptide coordinates, but they are invalid, larger than the protein > length). > > > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for > you, > > as it uses this module to calculate codons for SNPs. > > > > -Aaron > > > > P.S. a script the demonstrates the problem: > > > > use Bio::Coordinate::GeneMapper; > > > > my $mapper = > > Bio::Coordinate::GeneMapper > > ->new( -in => "chr", > > -out => "propeptide", > > -exons => [ Bio::Location::Simple > > ->new( -start => 101, > > -end => 109 ), > > Bio::Location::Simple > > ->new( -start => 201, > > -end => 221 ), > > ], > > -cds => Bio::Location::Simple > > ->new(-start => 101, -end => 209), > > ); > > > > > > print join("\t", "chr", "aa"), "\n"; > > for my $pos (99..111,199..211) { > > my $res = $mapper->map( > > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => > 1)); > > my $start = $res->start; $start = "NA" unless defined $start; > > my $end = $res->end; $end = "NA" unless defined $end; > > print join("\t", $pos, $start), "\n"; > > } > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed May 12 00:15:54 2010 From: cjfields at 
illinois.edu (Chris Fields)
Date: Tue, 11 May 2010 23:15:54 -0500
Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow
Message-ID: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu>

Just a friendly reminder that we'll freeze the dev subversion repository tomorrow prior to migration to github. The migration will take about an hour, during which all bioperl github repos will be replaced with the full repos, and devs added. The test repos will be removed around that time (Heikki, will that be a problem?).

chris

From heikki.lehvaslaiho at gmail.com Wed May 12 00:23:07 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Wed, 12 May 2010 07:23:07 +0300
Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow
In-Reply-To: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu>
References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu>
Message-ID:

No problem at all. Go ahead.

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429
Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 12 May 2010 07:15, Chris Fields wrote:

> Just a friendly reminder that we'll freeze the dev subversion repository
> tomorrow prior to migration to github. The migration will take about an
> hour, during which all bioperl github repos will be replaced with the full
> repos, and devs added. The test repos will be removed around that time
> (Heikki, will that be a problem?).
>
> chris

From heikki.lehvaslaiho at gmail.com Wed May 12 06:23:03 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Wed, 12 May 2010 13:23:03 +0300
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
In-Reply-To:
References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>
Message-ID:

Ouch. I'll definitely have a look. Strange that none of the tests have picked this up...
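The arithmetic behind the reported off-by-one can be checked in isolation. The sketch below compares the formula quoted from the module in Aaron's report with the one he proposes, for CDS positions 1 to 6; the two function names are hypothetical and nothing here calls Bio::Coordinate::GeneMapper itself, so it illustrates the formulas only, not the module's full behaviour.

```perl
use strict;
use warnings;

# Formula as quoted in the bug report (module lines 1170-1171).
sub cds_to_peptide_reported {
    my ($pos) = @_;
    return int($pos / 3) + 1;
}

# Formula proposed in the bug report: shift to 0-based before dividing.
sub cds_to_peptide_proposed {
    my ($pos) = @_;
    return int(($pos - 1) / 3) + 1;
}

my @reported = map { cds_to_peptide_reported($_) } 1 .. 6;
my @proposed = map { cds_to_peptide_proposed($_) } 1 .. 6;

print "cds:      @{[ 1 .. 6 ]}\n";
print "reported: @reported\n";    # 1 1 2 2 2 3
print "proposed: @proposed\n";    # 1 1 1 2 2 2
```

A regression test along these lines, run against the real mapper rather than the bare formulas, would presumably have caught the problem.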
-Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 12 May 2010 01:40, Aaron Mackey wrote: > Hi Chris, > > I was hoping Heikki might take up the cause and investigate further -- > let's > give him a chance to respond. But it's a frightening bug if it's really > been that way for all this time ... > > -Aaron > > On Tue, May 11, 2010 at 6:31 PM, Chris Fields > wrote: > > > Aaron, > > > > Do we want to write this up as a set of tests to add to the bioperl test > > suite? We can probably add it after the github migration tomorrow. > > > > chris > > > > On May 11, 2010, at 4:26 PM, Aaron Mackey wrote: > > > > > Hi Zerui (and others), > > > > > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, > > > specifically in this code: > > > > > > lines: > > > 1170: (-start => int ($loc->start / 3 ) +1, > > > 1171: -end => int ($loc->end / 3 ) +1, > > > > > > both of those lines should look like: int (($loc->start - 1) / 3) + 1 > > > > > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect > peptide > > > positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2) > > > > > > There is also a problem when mapping exon coordinates that are > > outside/after > > > the CDS region (instead of getting undefined locations, you continue to > > get > > > peptide coordinates, but they are invalid, larger than the protein > > length). > > > > > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for > > you, > > > as it uses this module to calculate codons for SNPs. > > > > > > -Aaron > > > > > > P.S. 
a script that demonstrates the problem:
> > >
> > > use Bio::Coordinate::GeneMapper;
> > >
> > > my $mapper =
> > >   Bio::Coordinate::GeneMapper
> > >   ->new( -in => "chr",
> > >          -out => "propeptide",
> > >          -exons => [ Bio::Location::Simple
> > >                      ->new( -start => 101,
> > >                             -end => 109 ),
> > >                      Bio::Location::Simple
> > >                      ->new( -start => 201,
> > >                             -end => 221 ),
> > >                    ],
> > >          -cds => Bio::Location::Simple
> > >                  ->new(-start => 101, -end => 209),
> > >        );
> > >
> > >
> > > print join("\t", "chr", "aa"), "\n";
> > > for my $pos (99..111,199..211) {
> > >   my $res = $mapper->map(
> > >     Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => 1));
> > >   my $start = $res->start; $start = "NA" unless defined $start;
> > >   my $end = $res->end; $end = "NA" unless defined $end;
> > >   print join("\t", $pos, $start), "\n";
> > > }
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Wed May 12 12:24:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 12 May 2010 11:24:49 -0500
Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow
In-Reply-To: <4BEAD562.1010702@cornell.edu>
References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu>
Message-ID: <97B3DF77-C657-4E7C-8298-529F474E1FA5@illinois.edu>

Yup, haven't started the migration yet (I'm taking down some crontab scripts used for prior github updates, nightly builds). Then I'll announce before freezing the repo.

chris

On May 12, 2010, at 11:20 AM, Robert Buels wrote:

> The SVN repository is not frozen yet, driveby_bot just say 16984 go into svn from Thomas Sharpton.
>
> R
>
> Heikki Lehvaslaiho wrote:
>> No problem at all. Go ahead.
>> -Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +966 545 595 849 office: +966 2 808 2429 >> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 >> 4700 King Abdullah University of Science and Technology (KAUST) >> Thuwal 23955-6900, Kingdom of Saudi Arabia >> On 12 May 2010 07:15, Chris Fields wrote: >>> Just a friendly reminder that we'll freeze the dev subversion repository >>> tomorrow prior to migration to github. The migration will take about an >>> hour, during which all bioperl github repos will be replaced with the full >>> repos, and devs added. The test repos will be removed around that time >>> (Heikki, will that be a problem?). >>> >>> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Wed May 12 12:20:50 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 12 May 2010 09:20:50 -0700 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> Message-ID: <4BEAD562.1010702@cornell.edu> The SVN repository is not frozen yet, driveby_bot just say 16984 go into svn from Thomas Sharpton. R Heikki Lehvaslaiho wrote: > No problem at all. Go ahead. > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > On 12 May 2010 07:15, Chris Fields wrote: > >> Just a friendly reminder that we'll freeze the dev subversion repository >> tomorrow prior to migration to github. The migration will take about an >> hour, during which all bioperl github repos will be replaced with the full >> repos, and devs added. 
The test repos will be removed around that time >> (Heikki, will that be a problem?). >> >> chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Wed May 12 12:43:42 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 11:43:42 -0500 Subject: [Bioperl-l] dev.open-bio.org SVN is now read-only Message-ID: Just like the subject says, switched the repo to a read only status. I'm starting the github migration now. chris From thomas.sharpton at gmail.com Wed May 12 12:45:22 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 12 May 2010 09:45:22 -0700 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: <4BEAD562.1010702@cornell.edu> References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu> Message-ID: Sorry if I screwed things up - updated before checking this email tread. -T On May 12, 2010, at 9:20 AM, Robert Buels wrote: > The SVN repository is not frozen yet, driveby_bot just say 16984 go > into svn from Thomas Sharpton. > > R > > Heikki Lehvaslaiho wrote: >> No problem at all. Go ahead. >> -Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +966 545 595 849 office: +966 2 808 2429 >> Computational Bioscience Research Centre (CBRC), Building #2, >> Office #4216 >> 4700 King Abdullah University of Science and Technology (KAUST) >> Thuwal 23955-6900, Kingdom of Saudi Arabia >> On 12 May 2010 07:15, Chris Fields wrote: >>> Just a friendly reminder that we'll freeze the dev subversion >>> repository >>> tomorrow prior to migration to github. The migration will take >>> about an >>> hour, during which all bioperl github repos will be replaced with >>> the full >>> repos, and devs added. The test repos will be removed around that >>> time >>> (Heikki, will that be a problem?). 
>>> >>> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed May 12 12:47:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 11:47:36 -0500 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu> Message-ID: <08E7C628-D914-43C0-AB3D-E8FC41A144DC@illinois.edu> No problem, just froze the repo and rsynced to my local machine, so your commit made it just under the wire. chris On May 12, 2010, at 11:45 AM, Thomas Sharpton wrote: > Sorry if I screwed things up - updated before checking this email tread. > > -T > > On May 12, 2010, at 9:20 AM, Robert Buels wrote: > >> The SVN repository is not frozen yet, driveby_bot just say 16984 go into svn from Thomas Sharpton. >> >> R >> >> Heikki Lehvaslaiho wrote: >>> No problem at all. Go ahead. >>> -Heikki >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +966 545 595 849 office: +966 2 808 2429 >>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 >>> 4700 King Abdullah University of Science and Technology (KAUST) >>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>> On 12 May 2010 07:15, Chris Fields wrote: >>>> Just a friendly reminder that we'll freeze the dev subversion repository >>>> tomorrow prior to migration to github. The migration will take about an >>>> hour, during which all bioperl github repos will be replaced with the full >>>> repos, and devs added. The test repos will be removed around that time >>>> (Heikki, will that be a problem?). 
>>>> >>>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maizemu at gmail.com Wed May 12 13:12:28 2010 From: maizemu at gmail.com (Christopher Bottoms) Date: Wed, 12 May 2010 12:12:28 -0500 Subject: [Bioperl-l] Citing CPAN modules in scientific publications Message-ID: Dear BioPerlers, I am working on a publication which would be impossible without the use of several CPAN modules. I appreciate the work authors and maintainers have put into these modules and would like to acknowledge them by citing their work. I was thinking of a format such as Author(s), Maintainer(s) *Module::Name* [ http://search.cpan.org/dist/Module-Name] A reference for File::Slurp would appear thus: Uri Guttman, Dave Rolsky *File::Slurp* [ http://search.cpan.org/dist/File-Slurp] I guess that I could instead mention authors in an acknowledgment section. I noticed a large acknowledgment section in the BioPerl paper ( http://genome.cshlp.org/content/12/10/1611.full). Thanks for your time, Christopher Bottoms (molecules) From greg at ebi.ac.uk Wed May 12 14:16:53 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Wed, 12 May 2010 19:16:53 +0100 Subject: [Bioperl-l] BioPerl for indexing quality score files Message-ID: Hi all, I'm wondering if anyone has tried using BioPerl to index sequence quality score files? The files I'm looking at tend to look like Fasta files, but with numbers (between 0 and 99) and spaces instead of sequence strings. 
Something like:
---
>chr1
0 20 20 20 50 99 99 99 99 30 30 20 20 10 10 0 0 0 0
---
(An example for Chimpanzee can be found here, as the file 'panTro2.quals.fa.gz': http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ )

I'm currently using a home-brewed file indexing system to access subsets of these quality scores, but it's kind of slow and (probably) buggy. I'd much rather use something like Bio::DB::Fasta, but (without having actually tried it) I expect it wouldn't be too happy with these not-quite-fasta format quality files.

Has anyone run into a similar situation and found a solution using Bioperl (or something else)?

I'd be happy to hack around a bit to get this to work, if necessary; if anyone could provide pointers on where to start, I'd be much obliged.

Cheers,
Greg

PS - it's great to see the GitHub migration moving along so swiftly! I'll be *much* more likely to start bug-hunting and patch-submitting with the code living there now. :)

From greg at ebi.ac.uk Wed May 12 14:26:26 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Wed, 12 May 2010 19:26:26 +0100
Subject: [Bioperl-l] BioPerl for indexing quality score files
In-Reply-To:
References:
Message-ID:

Ok, I need to shame myself with a huge "RTFM" for this one -- http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/DB/Qual.pm

Sorry for the spam. Still happy about the GitHub, though!

greg

On 12 May 2010 19:16, Gregory Jordan wrote:

> Hi all,
>
> I'm wondering if anyone has tried using BioPerl to index sequence quality
> score files? The files I'm looking at tend to look like Fasta files, but
> with numbers (between 0 and 99) and spaces instead of sequence strings.
> Something like: > --- > >chr1 > 0 20 20 20 50 99 99 99 99 30 30 20 20 10 10 0 0 0 0 > --- > (An example for Chimpanzee can be found here, as the file > 'panTro2.quals.fa.gz': > http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ ) > > I'm currently using a home-brewed file indexing system to access subsets of > these quality scores, but it's kind of slow and (probably) buggy. I'd much > rather use something like Bio::DB::Fasta, but (without having actually tried > it) I expect it wouldn't be too happy with these not-quite-fasta format > quality files. > > Has anyone run into a similar situation and found a solution using Bioperl > (or something else)? > > I'd be happy to hack around a bit to get this to work, if necessary; if > anyone could provide pointers on where to start, I'd be much obliged. > > Cheers, > Greg > > PS - it's great to see the GitHub migration moving along so swiftly! I'll > be *much* more likely to start bug-hunting and patch-submitting with the > code living there now. :) > From cjfields at illinois.edu Wed May 12 14:48:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 13:48:53 -0500 Subject: [Bioperl-l] GitHub migration complete Message-ID: All, The migration to github is now essentially complete, minus a few small house-keeping details. Please let me know if there are problems with checkouts. I've added collaborators to almost all repositories; unfortunately, GitHub decided to remove 'copy permissions' for adding collaborators just recently, so we'll have to manually add each in to each repo until that is resolved (from what I hear, should be soon). In the meantime, if you are a bioperl developer and aren't listed as a github collaborator please sign up for a github account, add SSH keys, and let me know your github user name. I'll add you to bioperl-live and any other repos you want (please let me know which ones!). 
I'll be doing a few last-minute house-cleaning bits (adding post-receive hooks, set up a mirror, etc), but it shouldn't interfere with checkouts. Let me know how it goes! chris From David.Messina at sbc.su.se Wed May 12 15:59:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 12 May 2010 21:59:14 +0200 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: References: Message-ID: Thanks, Chris! Clone and commit are working here. Dave From Kevin.M.Brown at asu.edu Wed May 12 16:06:38 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 12 May 2010 13:06:38 -0700 Subject: [Bioperl-l] Citing CPAN modules in scientific publications In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu> Wouldn't the format of the citation actually be dictated by the publication the paper was going to be in? E.g. the APA guide sets the format to be: Jones, D. F. (2002). The Mental Measurement Tester (Version 3.2) [Computer software]. Fort Lauderdale, FL: Nova Southeastern University. Retrieved July 22, 2007. Available from http://www.buros.com/ Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Christopher Bottoms > Sent: Wednesday, May 12, 2010 10:12 AM > To: bioperl-l List > Subject: [Bioperl-l] Citing CPAN modules in scientific publications > > Dear BioPerlers, > > I am working on a publication which would be impossible > without the use of > several CPAN modules. I appreciate the work authors and > maintainers have put > into these modules and would like to acknowledge them by > citing their work. 
> > I was thinking of a format such as > Author(s), Maintainer(s) *Module::Name* [ > http://search.cpan.org/dist/Module-Name] > > > A reference for File::Slurp would appear thus: > > Uri Guttman, Dave Rolsky *File::Slurp* [ > http://search.cpan.org/dist/File-Slurp] > > > I guess that I could instead mention authors in an > acknowledgment section. I > noticed a large acknowledgment section in the BioPerl paper ( > http://genome.cshlp.org/content/12/10/1611.full). > > Thanks for your time, > Christopher Bottoms (molecules) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jay at jays.net Wed May 12 16:35:27 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 12 May 2010 15:35:27 -0500 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: References: Message-ID: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net> On May 12, 2010, at 1:48 PM, Chris Fields wrote: > The migration to github is now essentially complete, minus a few small house-keeping details. Please let me know if there are problems with checkouts. You mean clones? ;) Thanks Chris!! This is *awesome*. I'm really glad we're in git now and very much appreciate all your work on this. Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Wed May 12 17:34:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 16:34:39 -0500 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net> References: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net> Message-ID: On May 12, 2010, at 3:35 PM, Jay Hannah wrote: > On May 12, 2010, at 1:48 PM, Chris Fields wrote: >> The migration to github is now essentially complete, minus a few small house-keeping details. Please let me know if there are problems with checkouts. > > You mean clones? ;) > > Thanks Chris!! This is *awesome*. 
I'm really glad we're in git now and very much appreciate all your work on this. > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah Yes, that was svn slipping in there... chris From maj at fortinbras.us Wed May 12 21:44:09 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 12 May 2010 21:44:09 -0400 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: References: Message-ID: <77C82E975CC24860AA16EE537E270FBD@NewLife> awesome job, Chris- MAJ (what's git again? Oh never mind...) ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Wednesday, May 12, 2010 2:48 PM Subject: [Bioperl-l] GitHub migration complete > All, > > The migration to github is now essentially complete, minus a few small > house-keeping details. Please let me know if there are problems with > checkouts. > > I've added collaborators to almost all repositories; unfortunately, GitHub > decided to remove 'copy permissions' for adding collaborators just recently, > so we'll have to manually add each in to each repo until that is resolved > (from what I hear, should be soon). In the meantime, if you are a bioperl > developer and aren't listed as a github collaborator please sign up for a > github account, add SSH keys, and let me know your github user name. I'll add > you to bioperl-live and any other repos you want (please let me know which > ones!). > > I'll be doing a few last-minute house-cleaning bits (adding post-receive > hooks, set up a mirror, etc), but it shouldn't interfere with checkouts. Let > me know how it goes! 
> > chris > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maizemu at gmail.com Wed May 12 23:27:47 2010 From: maizemu at gmail.com (Christopher Bottoms) Date: Wed, 12 May 2010 22:27:47 -0500 Subject: [Bioperl-l] Citing CPAN modules in scientific publications In-Reply-To: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu> Message-ID: Thanks. I was also wondering about listing the maintainer. I'm guessing not, since the maintainer can add herself (or himself) to the list of authors if she felt that she had contributed enough to warrant it. On Wed, May 12, 2010 at 3:06 PM, Kevin Brown wrote: > Wouldn't the format of the citation actually be dictated by the > publication the paper was going to be in? E.g. the APA guide sets the > format to be: > > Jones, D. F. (2002). The Mental Measurement Tester (Version 3.2) > [Computer software]. > Fort Lauderdale, FL: Nova Southeastern University. Retrieved > July 22, 2007. > Available from http://www.buros.com/ > > > Kevin Brown > Center for Innovations in Medicine > Biodesign Institute > Arizona State University > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > Christopher Bottoms > > Sent: Wednesday, May 12, 2010 10:12 AM > > To: bioperl-l List > > Subject: [Bioperl-l] Citing CPAN modules in scientific publications > > > > Dear BioPerlers, > > > > I am working on a publication which would be impossible > > without the use of > > several CPAN modules. I appreciate the work authors and > > maintainers have put > > into these modules and would like to acknowledge them by > > citing their work. 
> >
> > I was thinking of a format such as
> > Author(s), Maintainer(s) *Module::Name* [
> > http://search.cpan.org/dist/Module-Name]
> >
> >
> > A reference for File::Slurp would appear thus:
> >
> > Uri Guttman, Dave Rolsky *File::Slurp* [
> > http://search.cpan.org/dist/File-Slurp]
> >
> >
> > I guess that I could instead mention authors in an
> > acknowledgment section. I
> > noticed a large acknowledgment section in the BioPerl paper (
> > http://genome.cshlp.org/content/12/10/1611.full).
> >
> > Thanks for your time,
> > Christopher Bottoms (molecules)
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From heikki.lehvaslaiho at gmail.com Thu May 13 02:11:40 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Thu, 13 May 2010 09:11:40 +0300
Subject: [Bioperl-l] GitHub migration complete
In-Reply-To: <77C82E975CC24860AA16EE537E270FBD@NewLife>
References: <77C82E975CC24860AA16EE537E270FBD@NewLife>
Message-ID:

It works. Bliss.

Worth mentioning now on the list that the latest instructions are at http://www.bioperl.org/wiki/Using_Git

I've recommitted the two changes I made on the experimental repo.

I had a small problem when editing the README text file: git was not showing differences between the original file and my edits. It kept saying that

bala ~/src/bioperl-live> git diff README
diff --git a/README b/README
index 03685a8..8e20592 100644
Binary files a/README and b/README differ

The reason, of course, was that a hard-to-detect binary character had slipped into my edit. Just so that you know when this happens to you...
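
A rough sketch for tracking down that kind of stray byte, assuming git flags a file as binary when it sees a NUL byte near the start (a common explanation, not verified against the git source here). The sub name suspect_bytes is hypothetical, and the set of bytes it flags is just one reasonable choice:

```perl
use strict;
use warnings;

# Return one "line:column:0xHH" entry for every control byte in $text,
# ignoring tab, newline and carriage return.
sub suspect_bytes {
    my ($text) = @_;
    my @hits;
    my $line_no = 0;
    for my $line ( split /\n/, $text, -1 ) {
        $line_no++;
        while ( $line =~ /([\x00-\x08\x0B\x0C\x0E-\x1F\x7F])/g ) {
            # pos() is the offset just past the match, i.e. the
            # 1-based column of the single byte that matched.
            push @hits, sprintf( "%d:%d:0x%02X", $line_no, pos($line), ord($1) );
        }
    }
    return @hits;
}

# Example: a NUL byte hidden in the second line of a small text.
print "$_\n" for suspect_bytes("BioPerl README\nversion\x00 1.6\n");
```

To check a real file, read its contents into a string (with binmode set on the filehandle) and pass that string to the sub.
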
-Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 13 May 2010 04:44, Mark A. Jensen wrote: > awesome job, Chris- MAJ > (what's git again? Oh never mind...) > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Wednesday, May 12, 2010 2:48 PM > Subject: [Bioperl-l] GitHub migration complete > > > > All, >> >> The migration to github is now essentially complete, minus a few small >> house-keeping details. Please let me know if there are problems with >> checkouts. >> >> I've added collaborators to almost all repositories; unfortunately, GitHub >> decided to remove 'copy permissions' for adding collaborators just recently, >> so we'll have to manually add each in to each repo until that is resolved >> (from what I hear, should be soon). In the meantime, if you are a bioperl >> developer and aren't listed as a github collaborator please sign up for a >> github account, add SSH keys, and let me know your github user name. I'll >> add you to bioperl-live and any other repos you want (please let me know >> which ones!). >> >> I'll be doing a few last-minute house-cleaning bits (adding post-receive >> hooks, set up a mirror, etc), but it shouldn't interfere with checkouts. >> Let me know how it goes! 
>>
>> chris
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From heikki.lehvaslaiho at gmail.com Thu May 13 02:20:51 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Thu, 13 May 2010 09:20:51 +0300
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
In-Reply-To:
References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>
Message-ID:

Just a thumbs up. Aaron's fix works. The problem seems to be limited to where he spotted it.

I am working on refreshing my memory of how the code works - it has been quite a few years since I wrote it - and will commit better tests.

As for getting values outside the defined region, that is a feature rather than a bug. The idea was to be able to ask what the new coordinate would be if the feature extended beyond the known limits. This is the capability of Bio::Coordinate::ExtrapolatingPair that is used here. That class also has a method, strict, that can be used to prevent extrapolating, but the code to access it has not been written into GeneMapper. I'll see if I can get it to work.

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 12 May 2010 13:23, Heikki Lehvaslaiho wrote:

> Ouch. I'll definitely have a look.
>
> Strange that none of the tests have picked this up...
> > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > > On 12 May 2010 01:40, Aaron Mackey wrote: > >> Hi Chris, >> >> I was hoping Heikki might take up the cause and investigate further -- >> let's >> give him a chance to respond. But it's a frightening bug if it's really >> been that way for all this time ... >> >> -Aaron >> >> On Tue, May 11, 2010 at 6:31 PM, Chris Fields >> wrote: >> >> > Aaron, >> > >> > Do we want to write this up as a set of tests to add to the bioperl test >> > suite? We can probably add it after the github migration tomorrow. >> > >> > chris >> > >> > On May 11, 2010, at 4:26 PM, Aaron Mackey wrote: >> > >> > > Hi Zerui (and others), >> > > >> > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, >> > > specifically in this code: >> > > >> > > lines: >> > > 1170: (-start => int ($loc->start / 3 ) +1, >> > > 1171: -end => int ($loc->end / 3 ) +1, >> > > >> > > both of those lines should look like: int (($loc->start - 1) / 3) + 1 >> > > >> > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect >> peptide >> > > positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2) >> > > >> > > There is also a problem when mapping exon coordinates that are >> > outside/after >> > > the CDS region (instead of getting undefined locations, you continue >> to >> > get >> > > peptide coordinates, but they are invalid, larger than the protein >> > length). >> > > >> > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for >> > you, >> > > as it uses this module to calculate codons for SNPs. >> > > >> > > -Aaron >> > > >> > > P.S. 
a script that demonstrates the problem:
>> > >
>> > > use Bio::Coordinate::GeneMapper;
>> > >
>> > > my $mapper =
>> > >   Bio::Coordinate::GeneMapper
>> > >   ->new( -in => "chr",
>> > >          -out => "propeptide",
>> > >          -exons => [ Bio::Location::Simple
>> > >                      ->new( -start => 101,
>> > >                             -end => 109 ),
>> > >                      Bio::Location::Simple
>> > >                      ->new( -start => 201,
>> > >                             -end => 221 ),
>> > >                    ],
>> > >          -cds => Bio::Location::Simple
>> > >                  ->new(-start => 101, -end => 209),
>> > >        );
>> > >
>> > >
>> > > print join("\t", "chr", "aa"), "\n";
>> > > for my $pos (99..111,199..211) {
>> > >   my $res = $mapper->map(
>> > >     Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => 1));
>> > >   my $start = $res->start; $start = "NA" unless defined $start;
>> > >   my $end = $res->end; $end = "NA" unless defined $end;
>> > >   print join("\t", $pos, $start), "\n";
>> > > }
>> > > _______________________________________________
>> > > Bioperl-l mailing list
>> > > Bioperl-l at lists.open-bio.org
>> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>> >
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

From remi.planel at free.fr Thu May 13 05:08:58 2010
From: remi.planel at free.fr (Remi)
Date: Thu, 13 May 2010 11:08:58 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast
Message-ID: <4BEBC1AA.2020908@free.fr>

Hi all,

I'm using Bio::Tools::Run::StandAloneBlastPlus and trying to run a remote blast using this code:

    my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'nr',
        -remote => '1',
    );
    my $result = $fac->blastp(
        -query => 'P12996.fasta',
        -outfile => 'out.bls',
    );

but I got an error message: "BLAST Database error: Protein BLAST database './nr' does not exist in the NCBI servers".

But if I directly modify the value of $fac->{'_db_path'}, like:

    $fac->{'_db_path'} = 'nr';

it works.
Is that a bug, or am I missing something?

Thanks,
Rémi

From maj at fortinbras.us Thu May 13 07:17:55 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 13 May 2010 07:17:55 -0400
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast
In-Reply-To: <4BEBC1AA.2020908@free.fr>
References: <4BEBC1AA.2020908@free.fr>
Message-ID: <1A1631149DEF4B9080E5D4D5851F4587@NewLife>

Hi Rémi,
Looks like a bug -- can you report it via http://bugzilla.bioperl.org? Just enter what you've written here -- I appreciate it.
Mark

----- Original Message -----
From: "Remi"
To: "BioPerl List"
Sent: Thursday, May 13, 2010 5:08 AM
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast

Hi all,

I'm using Bio::Tools::Run::StandAloneBlastPlus and trying to run a remote blast using this code:

    my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'nr',
        -remote => '1',
    );
    my $result = $fac->blastp(
        -query => 'P12996.fasta',
        -outfile => 'out.bls',
    );

but I got an error message: "BLAST Database error: Protein BLAST database './nr' does not exist in the NCBI servers".

But if I directly modify the value of $fac->{'_db_path'}, like:

    $fac->{'_db_path'} = 'nr';

it works.

Is that a bug, or am I missing something?

Thanks,
Rémi
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From David.Messina at sbc.su.se Wed May 12 16:10:36 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 12 May 2010 22:10:36 +0200
Subject: [Bioperl-l] Ohloh update
Message-ID: <32ED5B44-061D-4634-9E5C-72E313E1A58C@sbc.su.se>

Hi everyone,

The Ohloh account probably needs to be changed to point to our GitHub repo. I'd be happy to do it if someone adds me on there. Otherwise, could one of the admins check into that when they get a chance?

Also, I notice it hasn't registered any commits since March 15th --
hopefully the repo change will wake it up or we may need to contact one of their admins again. Can anyone think of other external sites pointing to BioPerl which need updating, too? Dave From jay at jays.net Thu May 13 08:42:41 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 07:42:41 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <201005130328.o4D3S8Fs011865@portal.open-bio.org> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> Message-ID: ------- Comment #3 from cjfields at bioperl.org 2010-05-12 23:28 EST ------- > Ouch, that's a bit nasty. Taking advantage of git move and doing this on a > topic branch (topic/bug_3077) on github. I plan on cleaning up the 'jhannah' branch (renaming it 'topic/bug_2515', asking people for their input, merging to master). I plan on cleaning up the 'yapc10hackathon' branch. I can't remember what Robert and I left in there after YAPC last year. Should most of the other branches be deleted? If a branch hasn't been changed in more than a year and no one intends to jump into it in the coming year what purpose does it serve? Old tags can hang out forever, but shouldn't our branch list be tidy? (Specifically I would argue that old release number tags should hang out forever, but I don't see the point in any other ancient tags continuing to exist if their purpose isn't documented anywhere.) Are we serious about emulating this branching model? http://nvie.com/git-model If so then we need to create a 'develop' branch and only the release manager should touch 'master' and yahoos like me should be branching off of 'develop' instead, right? Counter argument: Since 'master' is the default branch and we want to encourage doc patches and typo corrections from the world making trivial contributions as easy as possible for everyone, I would think that using 'master' as the daily headstream would be better. 
So 'topic/bug_####' for each non-trivial Bugzilla ticket, and release managers can work their magic in 'release-#-#' branches. (Release branches old enough that there's no way we're going to patch them any more are deleted, and only the tag remains).

Thoughts?

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

P.S. My "up to date" OS X 10.6.3 machines both had git 1.5.3.1 on them. Upgrading to git 1.7.1 makes branch checkouts simpler.

jhannah at minijaysnet~/src/bioperl-live$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/TRY_featureio_refactor
  remotes/origin/TRY_gff_refactor
  remotes/origin/TRY_locatableseq_refactor
  remotes/origin/anydbm-branch
  remotes/origin/bioperl
  remotes/origin/bioperl-branch-1-5-1
  remotes/origin/bioperl-live
  remotes/origin/branch-06
  remotes/origin/branch-07
  remotes/origin/branch-07-ensembl-120
  remotes/origin/branch-1-0-0
  remotes/origin/branch-1-2
  remotes/origin/branch-1-2-collection
  remotes/origin/branch-1-4
  remotes/origin/branch-1-5-2
  remotes/origin/branch-1-6
  remotes/origin/branch-ensembl-m1
  remotes/origin/branch-experimental
  remotes/origin/featann_rollback
  remotes/origin/internal-branch-pre-delete-06-tag
  remotes/origin/jhannah
  remotes/origin/lightweight_feature_branch
  remotes/origin/master
  remotes/origin/ontology-cache
  remotes/origin/release-0-04-bug
  remotes/origin/restriction-refactor
  remotes/origin/stable-0-05
  remotes/origin/stable-0-05-new
  remotes/origin/steve_chervitz
  remotes/origin/topic/bug_3077
  remotes/origin/yapc10hackathon

jhannah at minijaysnet~/src/bioperl-live$ git tag
after-05-06-merge
after-05-06-merge-2
after004
before-05-to-06-merge
before-05-to-06-trunk
bioperl-06-1
bioperl-061-pre1
bioperl-1-0-0
bioperl-1-0-alpha
bioperl-1-0-alpha2-rc
bioperl-1-2-1-rc1
bioperl-1-6-0_001
bioperl-1-6-0_002
bioperl-1-6-0_003
bioperl-1-6-0_004
bioperl-1-6-0_005
bioperl-1-6-0_006
bioperl-1-6-RC1
bioperl-1-6-RC2
bioperl-1-6-RC2_15306
bioperl-1-6-RC3
bioperl-1-6-RC3_15392
bioperl-1-6-RC4
bioperl-devel-1-1-1
bioperl-devel-1-3-01
bioperl-devel-1-3-02
bioperl-devel-1-3-03
bioperl-devel-1-3-04
bioperl-release-1-0-0
bioperl-release-1-0-1
bioperl-release-1-0-2
bioperl-release-1-1-0
bioperl-release-1-2-0
bioperl-release-1-2-1
bioperl-release-1-2-2
bioperl-release-1-2-3
bioperl-release-1-4-0
bioperl-release-1-5-0
bioperl-release-1-5-0-rc1
bioperl-release-1-5-0-rc2
bioperl-release-1-5-1
bioperl-release-1-5-1-rc4
bioperl-release-1-5-2
bioperl-release-1-5-2-patch1
bioperl-release-1-5-2-patch2
bioperl-release-1-6
bioperl-release-1-6-1
bioperl-run-release-1-2-0
for_gmod_0_003
gbrowse_1_65
join-0-04-to-0-05
lightweight_feature
ontology-fix1
ontology-overhaul-end
ontology-overhaul-start
prerelease-06
release-0-04-1
release-0-04-2
release-0-04-3
release-0-04-4
release-0-05
release-0-05-1
release-0-7-0
release-0-7-1
release-0-7-2
release-0-9-0
release-0-9-2
release-0-9-3
release-06
release-06-2
release-1_01
release-ensembl-06
snapshot-at-head-of-07-branch
start
tag-ensembl-stable-061

From cjfields at illinois.edu Thu May 13 09:49:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 13 May 2010 08:49:19 -0500
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To:
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org>
Message-ID: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu>

On May 13, 2010, at 7:42 AM, Jay Hannah wrote:

> ------- Comment #3 from cjfields at bioperl.org 2010-05-12 23:28 EST -------
>> Ouch, that's a bit nasty. Taking advantage of git move and doing this on a
>> topic branch (topic/bug_3077) on github.
>
> I plan on cleaning up the 'jhannah' branch (renaming it 'topic/bug_2515', asking people for their input, merging to master).
>
> I plan on cleaning up the 'yapc10hackathon' branch. I can't remember what Robert and I left in there after YAPC last year.
>
> Should most of the other branches be deleted?
If a branch hasn't been changed in more than a year and no one intends to jump into it in the coming year what purpose does it serve? Old tags can hang out forever, but shouldn't our branch list be tidy? (Specifically I would argue that old release number tags should hang out forever, but I don't see the point in any other ancient tags continuing to exist if their purpose isn't documented anywhere.) I would say err on the safe side and keep the ones we're unsure of, but a cleanup would be nice. We could adopt what Moose has done and move branches we're unsure of to something like 'attic'. > Are we serious about emulating this branching model? > > http://nvie.com/git-model > > If so then we need to create a 'develop' branch and only the release manager should touch 'master' and yahoos like me should be branching off of 'develop' instead, right? > > Counter argument: Since 'master' is the default branch and we want to encourage doc patches and typo corrections from the world making trivial contributions as easy as possible for everyone, I would think that using 'master' as the daily headstream would be better. So 'topic/bug_####' for each non-trivial Bugzilla ticket, and release managers can work their magic in 'release-#-#' branches. (Release branches old enough that there's no way we're going to patch them any more are deleted, and only the tag remains). ... > Thoughts? > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > P.S. My "up to date" OS X 10.6.3 machines both had git 1.5.3.1 on them. Upgrading to git 1.7.1 makes branch checkouts simpler. Moose has a 'stable' branch that release managers (the cabal) pull into from 'master' for releases. It's just a matter of semantics, what name we use for active development branches and what to use for stable releases; for us, the 'develop' and 'master' from that link could be (respectively) 'master' and 'stable'. 
'hotfixes' would be bug fixes, and 'feature branches' would be just that, new features to be added. As for bug fixes, it would be much nicer to have most changes beyond very simple ones (including all bug fixes) relegated to branches that can be merged in. This sequesters any changes to the branch, where they can be tested prior to a merge. Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. chris From jay at jays.net Thu May 13 10:38:20 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 09:38:20 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: So, like this? Flow diagram: http://biodoc.ist.unomaha.edu/~jhannah/tmp/branches.png master (git and github default) Trivial changes committed directly here. 
topic/bug_#### One branch per non-trivial Bugzilla ticket topic/jhannah_crazy_idea Branches for unstable/unfinished work stable Release manager pulls from master to stable periodically (all tests are passing, etc.) release-#-#-# Pulled from stable, pushed to CPAN attic/* Any branch with no activity for 1 year I like it. > Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? I'm fine with attic/ and just leaving stuff in there until 2050. Then we should probably delete them. :) My understanding is that by default commits that have no pointers to them (branches or tags or subsequent commits) are subject to cleanup/prune. I think this means that if someone, 10 years ago, committed 3 times to the branch "jhannah_crazy_idea" and that branch is deleted, then those 3 commits may be removed (gone forever) by git cleanup/prune. This is a feature or a crime against humanity depending on who you ask. It can be disabled in a normal repo, I don't know about github. > Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. As I collect clues I'll be brain dumping everything I think I know onto the wiki. This is a crazy busy week for me though. 
:(

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

From jay at jays.net Thu May 13 11:00:05 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 13 May 2010 10:00:05 -0500
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu>
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu>
Message-ID:

On May 13, 2010, at 8:49 AM, Chris Fields wrote:
> Saying that, we could adopt a workflow policy that allows deletion of any merged branch.

Right. Except for release-* branches, which are never merged anywhere. A release is a branch while it's being prepared and tweaked. Once perfect, it is tagged and pushed to CPAN. At that point the branch can be deleted since we can never push that release number to CPAN again (even if we wanted to). The tag remains forever.

Or am I mistaken?

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

From shalabh.sharma7 at gmail.com Thu May 13 11:07:26 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 13 May 2010 11:07:26 -0400
Subject: [Bioperl-l] parsing blast report with long description
Message-ID:

Hi All,

I need some help in parsing blast output. I have an in-house database that contains sequences with really long descriptions.

>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV

So my blast report looks like this:

.....
.....
>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821
6887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
Length = 213

Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix adjust.
Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%)
.....
.....

(note that the tag "TI_1000008216887" is split across two lines).

I am using SearchIO to parse this report. What I am doing is parsing the description field again to get all the tags, like:

    ....
    ....
    my $desc = $hit->description;
    my @f = split('/', $desc);
    for (my $i = 0; $i < scalar @f; $i++) {
        print OUT "$f[$i]\t";
    }
    .....
    .....

The parsed report is otherwise perfect, but the field with TI_1000008216887 comes out with a space in it: "TI_100000821 6887".

I would really appreciate it if anyone can help me out.

Thanks
Shalabh Sharma

From joshpk105 at gmail.com Thu May 13 10:42:28 2010
From: joshpk105 at gmail.com (Katz)
Date: Thu, 13 May 2010 07:42:28 -0700 (PDT)
Subject: [Bioperl-l] RemoteBlast
Message-ID: <54674635-db43-413c-8c96-0d214f1b978d@l31g2000yqm.googlegroups.com>

Is there any way to differentiate between the three different NCBI blastn variants? Right now I'm using RemoteBlast as follows:

    Bio::Tools::Run::RemoteBlast->new(-prog => 'blastn',
                                      -data => 'nr',
                                      -expect => '1e-5',
                                      -readmethod => 'SearchIO');

then blasting my files. However, this automatically uses megablast and I need to use regular blastn.
Thx, Josh From hlapp at drycafe.net Thu May 13 11:43:47 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 11:43:47 -0400 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> On May 13, 2010, at 9:49 AM, Chris Fields wrote: > Re: deletion of branches, I'm only really in support of deleting > feature branches that have been merged back to 'master' or another > branch (e.g. only removed using 'git branch -d foo'). I agree. > Older subversion release branches don't tend to fall into that > category, in that we had merged or cherry-picked changes from svn > trunk to them, not vice versa; they were never merged back to > trunk. Deletion in this case would be somewhat history-revising, > correct? I wouldn't call it history-revising. I also think it's OK to delete release branches that are no longer supported, iff we have a tag for the release itself. That's different from counting inactivity. A branch may lie dormant for a year or longer until someone has time to pick it back up again - I don't see the harm in keeping those around. > Saying that, we could adopt a workflow policy that allows deletion > of any merged branch. All this suggests coming up with a good > 'Contributing' document. That would be highly useful. I'll also voice a word of caution here though - I find it kind of ironic that the switch to git, which is supposed to make contribution *easier*, very often leads subsequently to complex commit/pull/push/branching workflows being instituted for projects that take pages and pages to document, a lot of time to ingest, and discipline to follow - it seems to be very easy and tempting to go overboard with this. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Thu May 13 12:01:05 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 11:01:05 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: On May 13, 2010, at 10:43 AM, Hilmar Lapp wrote: > On May 13, 2010, at 9:49 AM, Chris Fields wrote: >> Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. > > That would be highly useful. I'll also voice a word of caution here though - I find it kind of ironic that the switch to git, which is supposed to make contribution *easier*, very often leads subsequently to complex commit/pull/push/branching workflows being instituted for projects that take pages and pages to document, a lot of time to ingest, and discipline to follow - it seems to be very easy and tempting to go overboard with this. I'm happy to comply with whatever the policy is. If that policy is "everything trivial in master, non-trivial in topic/FOO, release manager will figure out everything else" that's fine with me. A branch cleanup would be nice. Or I'll just close my eyes. :) I'm embarrassed that I left unfinished business in branches in 2009. I'm fishing for a consensus on a contribution policy. 
Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

From heikki.lehvaslaiho at gmail.com Thu May 13 12:48:14 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Thu, 13 May 2010 19:48:14 +0300
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net>
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net>
Message-ID:

I second Hilmar. Let's try to keep this simple.

While for most people just beginning to use git this discussion seems confusing and the structures complex, things really are pretty simple.

I expect most of the branches to live only in developers' copies of the repo. They are created when work starts on a new bug or a feature, merged to master when work is done, and removed immediately or soon after that. Most of the work is done in master and only the release managers touch the stable and release branches. See Jay's flow diagram.

Work flow for this is (while calling 'git status' all the time):

    git branch $new
    git checkout $new
    # work
    git commit
    git commit
    ...
    git checkout master
    git merge $new
    git push
    ...
    git branch -d $new

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 13 May 2010 18:43, Hilmar Lapp wrote:

>
> On May 13, 2010, at 9:49 AM, Chris Fields wrote:
>
> Re: deletion of branches, I'm only really in support of deleting feature
>> branches that have been merged back to 'master' or another branch (e.g. only
>> removed using 'git branch -d foo').
>>
>
> I agree.
> > > Older subversion release branches don't tend to fall into that category, >> in that we had merged or cherry-picked changes from svn trunk to them, not >> vice versa; they were never merged back to trunk. Deletion in this case >> would be somewhat history-revising, correct? >> > > I wouldn't call it history-revising. I also think it's OK to delete release > branches that are no longer supported, iff we have a tag for the release > itself. > > That's different from counting inactivity. A branch may lie dormant for a > year or longer until someone has time to pick it back up again - I don't see > the harm in keeping those around. > > > Saying that, we could adopt a workflow policy that allows deletion of any >> merged branch. All this suggests coming up with a good 'Contributing' >> document. >> > > That would be highly useful. I'll also voice a word of caution here though > - I find it kind of ironic that the switch to git, which is supposed to make > contribution *easier*, very often leads subsequently to complex > commit/pull/push/branching workflows being instituted for projects that take > pages and pages to document, a lot of time to ingest, and discipline to > follow - it seems to be very easy and tempting to go overboard with this. 
> > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu May 13 17:41:35 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 16:41:35 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: On May 13, 2010, at 11:48 AM, Heikki Lehvaslaiho wrote: > I second Hilmar. Let's try to keep this simple. > > While for most people just beginning to use git this discussion seems > confusing and the structures complex, things really are pretty simple. > > I expect most of the branches to live only in developers copies of the repo. > They are created when work starts on the new bug or a feature, merged to > master when work is done, and removed immediately or soon after that. Most > of the work is done in the master and only the release managers touch the > stable and release branches. See Jay's flow diagram. Right, many branches will occur locally. And I'm not suggesting that we strictly follow a particular pattern; I would rather not enforce that upon devs who already have a productive pattern set. I think this would act more as a suggested method of development, something that has been demonstrated to work well for other large projects (and something I'll be following). What I would really like to promote is using branches for making code changes, even ones that are only a few commits or so (and even if they are only local ones not pushed to github). Branches are cheap. 
> Work flow for this is (while calling 'git status' all the time): > > git branch $new > git checkout $new > # work > git commit > git commit > ... > git checkout master > git merge $new > git push > ... > git branch -d $new > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia Yes, that's essentially the basic workflow, maybe with a preliminary 'git pull' to sync to the latest. chris > On 13 May 2010 18:43, Hilmar Lapp wrote: > >> >> On May 13, 2010, at 9:49 AM, Chris Fields wrote: >> >> Re: deletion of branches, I'm only really in support of deleting feature >>> branches that have been merged back to 'master' or another branch (e.g. only >>> removed using 'git branch -d foo'). >>> >> >> I agree. >> >> >> Older subversion release branches don't tend to fall into that category, >>> in that we had merged or cherry-picked changes from svn trunk to them, not >>> vice versa; they were never merged back to trunk. Deletion in this case >>> would be somewhat history-revising, correct? >>> >> >> I wouldn't call it history-revising. I also think it's OK to delete release >> branches that are no longer supported, iff we have a tag for the release >> itself. >> >> That's different from counting inactivity. A branch may lie dormant for a >> year or longer until someone has time to pick it back up again - I don't see >> the harm in keeping those around. >> >> >> Saying that, we could adopt a workflow policy that allows deletion of any >>> merged branch. All this suggests coming up with a good 'Contributing' >>> document. >>> >> >> That would be highly useful. 
I'll also voice a word of caution here though >> - I find it kind of ironic that the switch to git, which is supposed to make >> contribution *easier*, very often leads subsequently to complex >> commit/pull/push/branching workflows being instituted for projects that take >> pages and pages to document, a lot of time to ingest, and discipline to >> follow - it seems to be very easy and tempting to go overboard with this. >> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu May 13 17:56:11 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 14:56:11 -0700 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: <4BEC757B.5030407@cornell.edu> OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes. I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that. 
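The "releases constructed in branches off of master" arrangement could look roughly like the following. This is only a sketch in a throwaway repository: the release number 1-6-2, the file names, and the committer identity are made up for illustration and are not an agreed BioPerl convention.

```shell
#!/bin/sh
# Sketch only: scratch repository; branch, tag and file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the main branch is called 'master'
git config user.name "Example Dev"
git config user.email "dev@example.org"

# A small bugfix goes straight onto master
echo "fix" > Changes
git add Changes
git commit -q -m "Small bugfix committed on master"

# The release manager cuts a release branch from master and polishes it there
git checkout -q -b release-1-6-2 master
echo "1.6.2" > VERSION
git add VERSION
git commit -q -m "Prepare release"

# Once the release is final, tag it; the tag outlives the branch
git tag bioperl-release-1-6-2
git checkout -q master
git branch -D release-1-6-2   # -D because the release branch was never merged to master

git tag -l                    # lists the tag created above
```

The tag is what keeps the release commits reachable, so the release branch can be dropped once the tag exists.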
Rob

From jay at jays.net Thu May 13 18:00:21 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 13 May 2010 17:00:21 -0500
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To: <4BEC757B.5030407@cornell.edu>
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu>
Message-ID: <7BA7535D-AE97-4827-8B86-91C24842BAED@jays.net>

On May 13, 2010, at 4:56 PM, Robert Buels wrote:
> OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes.
>
> I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that.

master++

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

From rmb32 at cornell.edu Thu May 13 18:13:52 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 13 May 2010 15:13:52 -0700
Subject: [Bioperl-l] move ancient branches to attic
Message-ID: <4BEC79A0.5000505@cornell.edu>

To clean up branches, I propose deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches whose head is older than Jan 1, 2009 to attic/.

Note that there are still tags for all the old releases, so those won't be lost.

Thoughts?

Rob

From jay at jays.net Thu May 13 18:22:30 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 13 May 2010 17:22:30 -0500
Subject: [Bioperl-l] move ancient branches to attic
In-Reply-To: <4BEC79A0.5000505@cornell.edu>
References: <4BEC79A0.5000505@cornell.edu>
Message-ID:

On May 13, 2010, at 5:13 PM, Robert Buels wrote:
> To clean up branches, I propose deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches whose head is older than Jan 1, 2009 to attic/.
>
> Note that there are still tags for all the old releases, so those won't be lost.

Sounds generous to me.
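One way to see which branches fall on either side of those two cutoff dates is to list every branch head with its last commit date and sort the result into buckets. A minimal sketch, assuming a scratch repository; the branch names old-2005 and old-2008 and their commit dates are invented for the example, and nothing here touches bioperl-live:

```shell
#!/bin/sh
# Sketch only: a scratch repo with invented branch names and commit dates.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.org"

# Helper: make an empty commit whose author/committer date is $1
commit_at() {
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
        git commit -q --allow-empty -m "$2"
}

commit_at "2010-05-01T12:00:00" "recent work on master"
git checkout -q -b old-2005
commit_at "2005-06-01T12:00:00" "last commit in 2005"
git checkout -q -b old-2008 master
commit_at "2008-06-01T12:00:00" "last commit in 2008"
git checkout -q master

# Print "<date of branch head> <branch>", oldest first, then bucket the branches:
# head before 2006 -> delete, head before 2009 -> attic, anything newer is left alone.
plan=$(git for-each-ref --sort=committerdate \
           --format='%(committerdate:short) %(refname:short)' refs/heads/ |
       awk '$1 < "2006-01-01" { print "delete", $2; next }
            $1 < "2009-01-01" { print "attic", $2 }')
echo "$plan"
```

Against a real clone the same `git for-each-ref` call can be pointed at `refs/remotes/origin/` instead of `refs/heads/` to inspect the remote branches.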
proceed++ Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From hlapp at drycafe.net Thu May 13 18:46:00 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 18:46:00 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC79A0.5000505@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> Message-ID: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Why? What is the gain from deleting branches that you don't know whether they are dead or not? -hilmar On May 13, 2010, at 6:13 PM, Robert Buels wrote: > To clean up branches, I propose to deleting branches (merged or not) > whose head is older than Jan 1, 2006, and moving branches to attic/ > whose head is older than Jan 1, 2009. > > Note that there are still tags for all the old releases, so those > won't be lost. > > Thoughts? > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From rmb32 at cornell.edu Thu May 13 19:05:06 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 16:05:06 -0700 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <4BEC85A2.50401@cornell.edu> The gain is to avoid having useless things hanging around. Every time somebody has to read through a list of 50 branches to find the maybe 5 that are useful, it's time lost. In other word, it's the same gain that you get from cleaning off your desk, so that you can see where you put things. Rob Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether > they are dead or not? 
>
> -hilmar
>
> On May 13, 2010, at 6:13 PM, Robert Buels wrote:
>
>> To clean up branches, I propose to deleting branches (merged or not)
>> whose head is older than Jan 1, 2006, and moving branches to attic/
>> whose head is older than Jan 1, 2009.
>>
>> Note that there are still tags for all the old releases, so those
>> won't be lost.
>>
>> Thoughts?
>>
>> Rob
>

From cjfields at illinois.edu Thu May 13 19:07:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 13 May 2010 18:07:31 -0500
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To: <4BEC757B.5030407@cornell.edu>
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu>
Message-ID:

'master'. That's more in line with other repos.

chris

On May 13, 2010, at 4:56 PM, Robert Buels wrote:

> OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes.
>
> I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that.
> > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu May 13 20:27:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:27:22 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC85A2.50401@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> Message-ID: <77C06787-B381-43AA-8F5A-74331866C495@illinois.edu> Let's go through and check which branches are specifically merged back to trunk and delete those first, then list the ones that aren't or we're unsure of. If needed we can move those to an 'attic', like Moose. chris On May 13, 2010, at 6:05 PM, Robert Buels wrote: > The gain is to avoid having useless things hanging around. Every time somebody has to read through a list of 50 branches to find the maybe 5 that are useful, it's time lost. > > In other word, it's the same gain that you get from cleaning off your desk, so that you can see where you put things. > > Rob > > > Hilmar Lapp wrote: >> Why? What is the gain from deleting branches that you don't know whether they are dead or not? >> -hilmar >> On May 13, 2010, at 6:13 PM, Robert Buels wrote: >>> To clean up branches, I propose to deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. >>> >>> Note that there are still tags for all the old releases, so those won't be lost. >>> >>> Thoughts? 
>>> >>> Rob >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu May 13 20:28:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:28:30 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: <6757E1DD-5712-4894-8EAF-52F5F902D348@illinois.edu> On May 13, 2010, at 9:38 AM, Jay Hannah wrote: > So, like this? > > Flow diagram: > http://biodoc.ist.unomaha.edu/~jhannah/tmp/branches.png > > master > (git and github default) Trivial changes committed directly here. > topic/bug_#### > One branch per non-trivial Bugzilla ticket > topic/jhannah_crazy_idea > Branches for unstable/unfinished work > stable > Release manager pulls from master to stable periodically (all tests are passing, etc.) > release-#-#-# > Pulled from stable, pushed to CPAN > attic/* > Any branch with no activity for 1 year > > I like it. Yes, something along those lines. >> Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? > > I'm fine with attic/ and just leaving stuff in there until 2050. Then we should probably delete them. 
:) > > My understanding is that by default commits that have no pointers to them (branches or tags or subsequent commits) are subject to cleanup/prune. I think this means that if someone, 10 years ago, committed 3 times to the branch "jhannah_crazy_idea" and that branch is deleted, then those 3 commits may be removed (gone forever) by git cleanup/prune. > > This is a feature or a crime against humanity depending on who you ask. It can be disabled in a normal repo, I don't know about github. I don't think this is disabled in github (e.g. one can still delete branches). Duke Leto suggested the only real way to prevent history revising commits would be to do a pre-commit hook, which is not supported right now in github. >> Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. > > As I collect clues I'll be brain dumping everything I think I know onto the wiki. This is a crazy busy week for me though. :( > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah No problem. chris From cjfields at illinois.edu Thu May 13 20:41:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:41:57 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> It would be nice to at least designate them as outdated in some respect, and organize them along those lines. 
chris On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether they are dead or not? > > -hilmar > > On May 13, 2010, at 6:13 PM, Robert Buels wrote: > >> To clean up branches, I propose to deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. >> >> Note that there are still tags for all the old releases, so those won't be lost. >> >> Thoughts? >> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu May 13 20:55:01 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 20:55:01 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> Message-ID: On May 13, 2010, at 8:41 PM, Chris Fields wrote: > It would be nice to at least designate them as outdated in some > respect, and organize them along those lines. I agree. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu May 13 21:04:02 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 21:04:02 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC85A2.50401@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> Message-ID: <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net> On May 13, 2010, at 7:05 PM, Robert Buels wrote: > The gain is to avoid having useless things hanging around. Every > time somebody has to read through a list of 50 branches to find the > maybe 5 that are useful, it's time lost. > > In other word, it's the same gain that you get from cleaning off > your desk, so that you can see where you put things. Hold on - that's not a good comparison is it? First off, this being git, the "main" repo is not your desk. You can have your desk and wipe it clean of all branches and tags that have ever existed, without affecting, or imposing this on, anyone else. Second, why would you *want* to look through all those branches? This being git, you create branches all the time and merge them back, on your own repo, right? Where in this workflow are you browsing through the 50 branches of the "main" repo all the time? Third, and maybe I'm just too old, but moving to git because branching and having your own clone exactly the way you want it is so easy, only to subsequently delete most of the branches on the "main" repo for primarily aesthetic reasons just doesn't make much sense to me, honestly. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From heikki.lehvaslaiho at gmail.com Fri May 14 06:41:22 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Fri, 14 May 2010 13:41:22 +0300 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu> Message-ID: Yep. master. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 14 May 2010 02:07, Chris Fields wrote: > 'master'. That's more in lone with other repos. > > chris > > On May 13, 2010, at 4:56 PM, Robert Buels wrote: > > > OK then, decision time, which is the main devel branch, 'master' or > 'develop'? I need to merge in a few small bugfixes. > > > > I vote for 'master', since it's slightly simpler for new devs, with > releases being constructed in branches off of that. 
> > > > Rob > > From heikki.lehvaslaiho at gmail.com Fri May 14 06:45:50 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Fri, 14 May 2010 13:45:50 +0300 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net> Message-ID: Rob, If you think this is important, do a survey and create a nice wiki page explaining these branches to everyone. Then we can discuss whether some of them are best deleted. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 14 May 2010 04:04, Hilmar Lapp wrote: > > On May 13, 2010, at 7:05 PM, Robert Buels wrote: > > The gain is to avoid having useless things hanging around. Every time >> somebody has to read through a list of 50 branches to find the maybe 5 that >> are useful, it's time lost. >> >> In other word, it's the same gain that you get from cleaning off your >> desk, so that you can see where you put things. >> > > > Hold on - that's not a good comparison is it? First off, this being git, > the "main" repo is not your desk. You can have your desk and wipe it clean > of all branches and tags that have ever existed, without affecting, or > imposing this on, anyone else. > > Second, why would you *want* to look through all those branches? This being > git, you create branches all the time and merge them back, on your own repo, > right?
Where in this workflow are you browsing through the 50 branches of > the "main" repo all the time? > > Third, and maybe I'm just too old, but moving to git because branching and > having your own clone exactly the way you want it is so easy, only to > subsequently delete most of the branches on the "main" repo for primarily > aesthetic reasons just doesn't make much sense to me, honestly. > > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jay at jays.net Fri May 14 09:32:04 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 14 May 2010 08:32:04 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether they are dead or not? If our branch list was clean they wouldn't dupe up when I go to merge in other people's contributions. You don't find large lists of probably dead things annoying? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah jhannah at cplreynoldslpt:~/src/bioperl-live$ git remote add vinanna git://github.com/vinanna/bioperl-live.gitjhannah at cplreynoldslpt:~/src/bioperl-live$ git fetch vinanna remote: Counting objects: 18, done. remote: Compressing objects: 100% (9/9), done. remote: Total 10 (delta 8), reused 0 (delta 0) Unpacking objects: 100% (10/10), done. 
>From git://github.com/vinanna/bioperl-live * [new branch] TRY_featureio_refactor -> vinanna/TRY_featureio_refactor * [new branch] TRY_gff_refactor -> vinanna/TRY_gff_refactor * [new branch] TRY_locatableseq_refactor -> vinanna/TRY_locatableseq_refactor * [new branch] anydbm-branch -> vinanna/anydbm-branch * [new branch] bioperl -> vinanna/bioperl * [new branch] bioperl-branch-1-5-1 -> vinanna/bioperl-branch-1-5-1 * [new branch] bioperl-live -> vinanna/bioperl-live * [new branch] branch-06 -> vinanna/branch-06 * [new branch] branch-07 -> vinanna/branch-07 * [new branch] branch-07-ensembl-120 -> vinanna/branch-07-ensembl-120 * [new branch] branch-1-0-0 -> vinanna/branch-1-0-0 * [new branch] branch-1-2 -> vinanna/branch-1-2 * [new branch] branch-1-2-collection -> vinanna/branch-1-2-collection * [new branch] branch-1-4 -> vinanna/branch-1-4 * [new branch] branch-1-5-2 -> vinanna/branch-1-5-2 * [new branch] branch-1-6 -> vinanna/branch-1-6 * [new branch] branch-ensembl-m1 -> vinanna/branch-ensembl-m1 * [new branch] branch-experimental -> vinanna/branch-experimental * [new branch] featann_rollback -> vinanna/featann_rollback * [new branch] internal-branch-pre-delete-06-tag -> vinanna/internal-branch-pre-delete-06-tag * [new branch] lightweight_feature_branch -> vinanna/lightweight_feature_branch * [new branch] master -> vinanna/master * [new branch] ontology-cache -> vinanna/ontology-cache * [new branch] release-0-04-bug -> vinanna/release-0-04-bug * [new branch] restriction-refactor -> vinanna/restriction-refactor * [new branch] stable-0-05 -> vinanna/stable-0-05 * [new branch] stable-0-05-new -> vinanna/stable-0-05-new * [new branch] steve_chervitz -> vinanna/steve_chervitz * [new branch] topic/bug_2515 -> vinanna/topic/bug_2515 jhannah at cplreynoldslpt:~/src/bioperl-live$ git branch -a * master remotes/origin/HEAD -> origin/master remotes/origin/TRY_featureio_refactor remotes/origin/TRY_gff_refactor remotes/origin/TRY_locatableseq_refactor 
remotes/origin/anydbm-branch remotes/origin/bioperl remotes/origin/bioperl-branch-1-5-1 remotes/origin/bioperl-live remotes/origin/branch-06 remotes/origin/branch-07 remotes/origin/branch-07-ensembl-120 remotes/origin/branch-1-0-0 remotes/origin/branch-1-2 remotes/origin/branch-1-2-collection remotes/origin/branch-1-4 remotes/origin/branch-1-5-2 remotes/origin/branch-1-6 remotes/origin/branch-ensembl-m1 remotes/origin/branch-experimental remotes/origin/featann_rollback remotes/origin/internal-branch-pre-delete-06-tag remotes/origin/jhannah remotes/origin/lightweight_feature_branch remotes/origin/master remotes/origin/ontology-cache remotes/origin/release-0-04-bug remotes/origin/restriction-refactor remotes/origin/stable-0-05 remotes/origin/stable-0-05-new remotes/origin/steve_chervitz remotes/origin/topic/bug_2515 remotes/origin/yapc10hackathon remotes/vinanna/TRY_featureio_refactor remotes/vinanna/TRY_gff_refactor remotes/vinanna/TRY_locatableseq_refactor remotes/vinanna/anydbm-branch remotes/vinanna/bioperl remotes/vinanna/bioperl-branch-1-5-1 remotes/vinanna/bioperl-live remotes/vinanna/branch-06 remotes/vinanna/branch-07 remotes/vinanna/branch-07-ensembl-120 remotes/vinanna/branch-1-0-0 remotes/vinanna/branch-1-2 remotes/vinanna/branch-1-2-collection remotes/vinanna/branch-1-4 remotes/vinanna/branch-1-5-2 remotes/vinanna/branch-1-6 remotes/vinanna/branch-ensembl-m1 remotes/vinanna/branch-experimental remotes/vinanna/featann_rollback remotes/vinanna/internal-branch-pre-delete-06-tag remotes/vinanna/lightweight_feature_branch remotes/vinanna/master remotes/vinanna/ontology-cache remotes/vinanna/release-0-04-bug remotes/vinanna/restriction-refactor remotes/vinanna/stable-0-05 remotes/vinanna/stable-0-05-new remotes/vinanna/steve_chervitz remotes/vinanna/topic/bug_2515 From cjfields at illinois.edu Fri May 14 09:47:05 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 08:47:05 -0500 Subject: [Bioperl-l] move ancient branches to attic 
In-Reply-To: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> Message-ID: <2309AD4D-9FEA-4463-A4FD-519F0FCA2639@illinois.edu> To me, this is more a problem with the way forks currently work in github, via automatically dup-ing all branches vs allowing a single branch ('master', for instance). In fairness, that makes sense if they're implementing this the way I think, in order to conserve space. There are other small issues on github that should be worked out, for instance the automatic addition of all collabs with pull requests, since these go to bioperl-guts now. At least, I got a dup email from the last pull request. Some fixes are supposedly being planned for group-like accounts, just don't know when they'll appear. But I think the overall benefits of github outweigh some of the bumps in the road we're seeing. chris On May 14, 2010, at 8:32 AM, Jay Hannah wrote: > On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: >> Why? What is the gain from deleting branches that you don't know whether they are dead or not? > > If our branch list was clean they wouldn't dupe up when I go to merge in other people's contributions. > > You don't find large lists of probably dead things annoying? > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > > > jhannah at cplreynoldslpt:~/src/bioperl-live$ git remote add vinanna git://github.com/vinanna/bioperl-live.gitjhannah at cplreynoldslpt:~/src/bioperl-live$ git fetch vinanna > remote: Counting objects: 18, done. > remote: Compressing objects: 100% (9/9), done. > remote: Total 10 (delta 8), reused 0 (delta 0) > Unpacking objects: 100% (10/10), done. 
>> From git://github.com/vinanna/bioperl-live > * [new branch] TRY_featureio_refactor -> vinanna/TRY_featureio_refactor > * [new branch] TRY_gff_refactor -> vinanna/TRY_gff_refactor > * [new branch] TRY_locatableseq_refactor -> vinanna/TRY_locatableseq_refactor > * [new branch] anydbm-branch -> vinanna/anydbm-branch > * [new branch] bioperl -> vinanna/bioperl > * [new branch] bioperl-branch-1-5-1 -> vinanna/bioperl-branch-1-5-1 > * [new branch] bioperl-live -> vinanna/bioperl-live > * [new branch] branch-06 -> vinanna/branch-06 > * [new branch] branch-07 -> vinanna/branch-07 > * [new branch] branch-07-ensembl-120 -> vinanna/branch-07-ensembl-120 > * [new branch] branch-1-0-0 -> vinanna/branch-1-0-0 > * [new branch] branch-1-2 -> vinanna/branch-1-2 > * [new branch] branch-1-2-collection -> vinanna/branch-1-2-collection > * [new branch] branch-1-4 -> vinanna/branch-1-4 > * [new branch] branch-1-5-2 -> vinanna/branch-1-5-2 > * [new branch] branch-1-6 -> vinanna/branch-1-6 > * [new branch] branch-ensembl-m1 -> vinanna/branch-ensembl-m1 > * [new branch] branch-experimental -> vinanna/branch-experimental > * [new branch] featann_rollback -> vinanna/featann_rollback > * [new branch] internal-branch-pre-delete-06-tag -> vinanna/internal-branch-pre-delete-06-tag > * [new branch] lightweight_feature_branch -> vinanna/lightweight_feature_branch > * [new branch] master -> vinanna/master > * [new branch] ontology-cache -> vinanna/ontology-cache > * [new branch] release-0-04-bug -> vinanna/release-0-04-bug > * [new branch] restriction-refactor -> vinanna/restriction-refactor > * [new branch] stable-0-05 -> vinanna/stable-0-05 > * [new branch] stable-0-05-new -> vinanna/stable-0-05-new > * [new branch] steve_chervitz -> vinanna/steve_chervitz > * [new branch] topic/bug_2515 -> vinanna/topic/bug_2515 > jhannah at cplreynoldslpt:~/src/bioperl-live$ git branch -a > * master > remotes/origin/HEAD -> origin/master > remotes/origin/TRY_featureio_refactor > 
remotes/origin/TRY_gff_refactor > remotes/origin/TRY_locatableseq_refactor > remotes/origin/anydbm-branch > remotes/origin/bioperl > remotes/origin/bioperl-branch-1-5-1 > remotes/origin/bioperl-live > remotes/origin/branch-06 > remotes/origin/branch-07 > remotes/origin/branch-07-ensembl-120 > remotes/origin/branch-1-0-0 > remotes/origin/branch-1-2 > remotes/origin/branch-1-2-collection > remotes/origin/branch-1-4 > remotes/origin/branch-1-5-2 > remotes/origin/branch-1-6 > remotes/origin/branch-ensembl-m1 > remotes/origin/branch-experimental > remotes/origin/featann_rollback > remotes/origin/internal-branch-pre-delete-06-tag > remotes/origin/jhannah > remotes/origin/lightweight_feature_branch > remotes/origin/master > remotes/origin/ontology-cache > remotes/origin/release-0-04-bug > remotes/origin/restriction-refactor > remotes/origin/stable-0-05 > remotes/origin/stable-0-05-new > remotes/origin/steve_chervitz > remotes/origin/topic/bug_2515 > remotes/origin/yapc10hackathon > remotes/vinanna/TRY_featureio_refactor > remotes/vinanna/TRY_gff_refactor > remotes/vinanna/TRY_locatableseq_refactor > remotes/vinanna/anydbm-branch > remotes/vinanna/bioperl > remotes/vinanna/bioperl-branch-1-5-1 > remotes/vinanna/bioperl-live > remotes/vinanna/branch-06 > remotes/vinanna/branch-07 > remotes/vinanna/branch-07-ensembl-120 > remotes/vinanna/branch-1-0-0 > remotes/vinanna/branch-1-2 > remotes/vinanna/branch-1-2-collection > remotes/vinanna/branch-1-4 > remotes/vinanna/branch-1-5-2 > remotes/vinanna/branch-1-6 > remotes/vinanna/branch-ensembl-m1 > remotes/vinanna/branch-experimental > remotes/vinanna/featann_rollback > remotes/vinanna/internal-branch-pre-delete-06-tag > remotes/vinanna/lightweight_feature_branch > remotes/vinanna/master > remotes/vinanna/ontology-cache > remotes/vinanna/release-0-04-bug > remotes/vinanna/restriction-refactor > remotes/vinanna/stable-0-05 > remotes/vinanna/stable-0-05-new > remotes/vinanna/steve_chervitz > remotes/vinanna/topic/bug_2515 > > > > 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Fri May 14 09:56:48 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 14 May 2010 09:56:48 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> Message-ID: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> On May 14, 2010, at 9:32 AM, Jay Hannah wrote: > You don't find large lists of probably dead things annoying? Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. As an analogy, Google Mail keeps all your dead email (email you delete). Forever. Not because they think most of what you delete you shouldn't have deleted, but because it costs so little, and can be so efficiently managed for the few things that you do decide to recover a year later that it's not worth for you as a user to spend any brain cycles on which emails you should physically delete and which you should only "archive". Likewise, I don't see the gain that outweighs the brain cycles and careful consideration that would have to go into deciding which branches to delete, which ones to move into an "attic", and which ones to keep around. If you don't want to see them, simply clone and wipe them away. 
Life can be so easy :-) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Fri May 14 10:20:22 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 14 May 2010 09:20:22 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> Message-ID: <0C1AE8D4-70F5-427E-9429-B59156587E19@jays.net> On May 14, 2010, at 8:56 AM, Hilmar Lapp wrote: > On May 14, 2010, at 9:32 AM, Jay Hannah wrote: >> You don't find large lists of probably dead things annoying? > > Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. > > As an analogy, Google Mail keeps all your dead email (email you delete). Forever. OK. So our policy is that our branch list is an ever-growing pile of probably-dead things that we all ignore. A couple of them might be alive and useful at any given moment in time, but only if whoever created them is still around and cares and happens to remember what the point was. Understood. 
Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Fri May 14 11:34:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 10:34:41 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> Message-ID: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> On May 14, 2010, at 8:56 AM, Hilmar Lapp wrote: > > On May 14, 2010, at 9:32 AM, Jay Hannah wrote: > >> You don't find large lists of probably dead things annoying? > > > Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. > > As an analogy, Google Mail keeps all your dead email (email you delete). Forever. Not because they think most of what you delete you shouldn't have deleted, but because it costs so little, and can be so efficiently managed for the few things that you do decide to recover a year later that it's not worth for you as a user to spend any brain cycles on which emails you should physically delete and which you should only "archive". > > Likewise, I don't see the gain that outweighs the brain cycles and careful consideration that would have to go into deciding which branches to delete, which ones to move into an "attic", and which ones to keep around. If you don't want to see them, simply clone and wipe them away. Life can be so easy :-) > > -hilmar I tend to fall in the middle here, in that it would be nice to clean out feature branches that have been merged back in and relegate all older branches to an attic. Moving branches is as easy as 'git branch -m foo attic/foo'. 
I'm not in favor of removing branches that haven't been merged back, unless they're deemed unnecessary by the core devs. re: removing feature branches, this is something we have talked about doing in the past on svn, but is a bit trickier at the moment as the git repo doesn't currently indicate if/when specific svn branches were merged to HEAD. We still have read-only access to our svn repo to determine that if needed. So far, though, I haven't seen much in the way of indicating what some regard as 'feature' (removable) vs 'attic' (old but retained). That discussion needs to happen on list. chris From hlapp at drycafe.net Fri May 14 12:56:54 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 14 May 2010 12:56:54 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> Message-ID: <69D4619C-F21E-4FAE-B56F-C2F3B323EFD6@drycafe.net> On May 14, 2010, at 11:34 AM, Chris Fields wrote: > it would be nice to clean out feature branches that have been merged > back in Agreed, if the case is clear. > and relegate all older branches to an attic. Moving branches is as > easy as 'git branch -m foo attic/foo'. That's easy enough too and doesn't lose anything, hence no need to spend time on making sure it might not be a mistake. > I'm not in favor of removing branches that haven't been merged > back, unless they're deemed unnecessary by the core devs. Agreed, except I would remove the conditional. I'd rather spend that time on coding ... 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From subodhs at iastate.edu Fri May 14 12:24:21 2010 From: subodhs at iastate.edu (Srivastava, Subodh K [AGRON]) Date: Fri, 14 May 2010 11:24:21 -0500 Subject: [Bioperl-l] running perl script Message-ID: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> hi, I am running a perl script and getting error like: Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. How to set the path for this? the other related scripts are working in same directory. I am running; perl, v5.8.8 built for x86_64-linux-thread-multi thank you subodh ************************************* G-302 Agronomy Hall Iowa State University Ames, IA -50010 From rmb32 at cornell.edu Fri May 14 14:38:10 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 11:38:10 -0700 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> Message-ID: <4BED9892.5070408@cornell.edu> At the PDX hackathon last night, I was talking about this problem with a git expert, and he gave me a little tutorial on how git thinks about and keep branches and tags. 
Each of these things is just a special case of a 'ref', which is just a reference to the end of some piece of the commit graph. If you run

    git ls-remote http://github.com/bioperl/bioperl-live.git

you can see all the refs we currently have in our bioperl-live repo, which are all in either /refs/heads (which are our branches), or /refs/tags (our tags).

Now, it turns out you can have arbitrary things in here in addition to heads and tags. I copied one of the old branches to /refs/archives/branch-ensembl-m1 to demonstrate this. Now, it doesn't show up in normal workflow listings, but it's not deleted. If somebody wanted to resurrect it, they could move or copy it into /refs/heads (where it would show up as an active branch again).

To copy a branch into archives/,

    git push origin origin/<branch>:refs/archives/<branch>

To *move* a branch into archives/

    git push origin origin/<branch>:refs/archives/<branch> \
        :refs/heads/<branch>

The second refspec in that push has nothing on the left side of the colon, which pushes a 'null' to refs/heads/<branch>, which deletes it. You can have an arbitrary number of these refspecs in each push invocation.

So, there's a good mechanism for archiving our old branches.
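To try the move without touching the real repository, something like the following sketch should do. It assumes nothing about bioperl-live: the scratch directory, the bare "origin.git" and the branch name "old-branch" are all made-up stand-ins, and only the final push mirrors the commands described above.

```shell
#!/bin/sh
# Sketch: archive a branch under refs/archives/ in a throwaway repository.
# "origin.git", "work" and "old-branch" are hypothetical stand-ins.
set -e

scratch=$(mktemp -d)
cd "$scratch"

# A bare repository plays the role of the shared "origin".
git init -q --bare origin.git
git clone -q origin.git work 2>/dev/null
cd work
git config user.email "dev@example.org"
git config user.name "Example Dev"

# One commit, published on the remote as a branch named old-branch.
git commit -q --allow-empty -m "initial commit"
git push -q origin HEAD:refs/heads/old-branch
git fetch -q origin

# Move it: the first refspec copies the ref to refs/archives/, the second
# pushes "nothing" to refs/heads/old-branch, which deletes the branch.
git push -q origin origin/old-branch:refs/archives/old-branch :refs/heads/old-branch

# The commit is still on the remote, just no longer listed as a branch.
git ls-remote origin
```

The last command should list the commit under refs/archives/old-branch only. Resurrecting it would be the reverse push, from refs/archives/ back into refs/heads/.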
 Rob From pat.boutet at gmail.com Fri May 14 15:14:36 2010 From: pat.boutet at gmail.com (Patrick Boutet) Date: Fri, 14 May 2010 13:14:36 -0600 Subject: [Bioperl-l] running perl script In-Reply-To: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> Message-ID: On Fri, May 14, 2010 at 10:24 AM, Srivastava, Subodh K [AGRON] < subodhs at iastate.edu> wrote: > hi, > I am running a perl script and getting error like: > > Can't locate Bio/Perl.pm in @INC (@INC contains: > /home/subodhs/SHORE_map/SHOREmap_release_1.1 > /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl > /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl > /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at > /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. > > How to set the path for this? > the other related scripts are working in same directory. > > I am running; perl, v5.8.8 built for x86_64-linux-thread-multi > > thank you > subodh > ************************************* > G-302 > Agronomy Hall > Iowa State University > Ames, IA -50010 > I'm still new at this, but I'll try to be helpful. First, where is BioPerl installed: system-wide or local to your home directory? Do you have root access? What type of shell are you using? It seems you might have to set your shell's PERL5LIB variable to include the directory where BioPerl is installed.
 Patrick Boutet From cjfields at illinois.edu Fri May 14 15:23:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 14:23:31 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BED9892.5070408@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> Message-ID: On May 14, 2010, at 1:38 PM, Robert Buels wrote: > At the PDX hackathon last night, I was talking about this problem with a git expert, and he gave me a little tutorial on how git thinks about and keep branches and tags. > > Each of these things is just a special case of a 'ref', which is just a reference to the end of some piece of the commit graph. If you run > > git ls-remote http://github.com/bioperl/bioperl-live.git > > you can see all the refs we currently have in our bioperl-live repo, which are all in either /refs/heads (which are our branches), or /refs/tags (our tags). > > Now, it turns out you can have arbitrary things in here in addition to heads and tags. I copied one of the old branches to /refs/archives/branch-ensembl-m1 to demonstrate this. Now, it doesn't show up in normal workflow listings, but it's not deleted. If somebody wanted to resurrect it, they could move or copy it into /refs/heads (where it would show up as an active branch again). > > To copy a branch into archives/, > > git push origin origin/<branch>:refs/archives/<branch> > > To *move* a branch into archives/ > > git push origin origin/<branch>:refs/archives/<branch> \ > :refs/heads/<branch> > > The first part of that second part of that push has nothing on the left side of the colon, which pushes a 'null' to refs/heads/, which deletes it. You can have an arbitrary number of these kinds of commands in each push invocation. > > So, there's a good mechanism for archiving our old branches.
> > Rob That's a nice alternative to an attic, and less visible. On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. chris From rmb32 at cornell.edu Fri May 14 18:56:49 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 15:56:49 -0700 Subject: [Bioperl-l] BioPerl for indexing quality score files In-Reply-To: References: Message-ID: <4BEDD531.8050502@cornell.edu> Gregory Jordan wrote: > Ok, I need to shame myself with a huge "RTFM" for this one -- We still like you, Greg. Come hang out in #bioperl, where we can make fun of you properly. ;-) Rob From rmb32 at cornell.edu Fri May 14 19:01:50 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 16:01:50 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> Message-ID: <4BEDD65E.9070702@cornell.edu> Chris Fields wrote: > On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. OK, here are all our current branches, I will go through them in order of last-modified date. 
1998-12-11 bioperl 1999-02-19 release-0-04-bug 1999-04-13 bioperl-live 1999-04-13 stable-0-05 2000-01-27 branch-ensembl-m1 2000-02-07 internal-branch-pre-delete-06-tag 2000-03-22 stable-0-05-new 2001-02-19 branch-06 2001-11-14 branch-07-ensembl-120 2001-12-28 steve_chervitz 2002-01-16 branch-07 2002-10-22 branch-1-0-0 2003-07-07 branch-1-2-collection 2003-10-13 branch-1-2 2004-10-20 ontology-cache 2005-04-14 branch-1-4 2006-01-11 bioperl-branch-1-5-1 2006-08-14 branch-experimental 2007-02-14 branch-1-5-2 2007-08-28 featann_rollback 2007-11-07 lightweight_feature_branch Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. 2009-06-17 restriction-refactor Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f 2009-07-16 topic/bug_2515 proposal: keep, jhannah "working" ;-) 2009-08-13 TRY_gff_refactor proposal: delete, git claims it is merged 2009-08-13 TRY_locatableseq_refactor proposal: delete, git claims it is merged 2009-09-29 branch-1-6 keep, 1.6 maint branch i think. 2009-10-14 anydbm-branch keep, MAJ working. MAJ, maybe you should move this to topic/ ? 2010-01-31 TRY_featureio_refactor keep, but looks dead. cjfields, maybe you want to delete it? 2010-05-12 topic/bug_3077 delete, git claims it is merged. Please review, and I'll do the work if people agree. 
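For anyone who wants to reproduce the "git claims it is merged" check, one way is git branch --merged. The sketch below runs in a scratch repository with invented branch names; against the real repo you would presumably add -r and name the remote trunk instead.

```shell
#!/bin/sh
# Scratch repository with two invented topic branches.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Demo"
git config user.email "demo@example.org"
git commit -q --allow-empty -m "initial commit"
trunk=$(git symbolic-ref --short HEAD)   # master or main, depending on git config

git branch topic/already-merged          # has no commits beyond the trunk
git checkout -q -b topic/still-unmerged
git commit -q --allow-empty -m "work that never reached trunk"
git checkout -q "$trunk"

echo "Fully contained in $trunk (candidates for deletion):"
git branch --merged "$trunk" | grep -v "^\* "
echo "Still carrying commits that are not in $trunk:"
git branch --no-merged "$trunk"
```

Note that this only sees merges git itself knows about, so branches merged back in the svn days may still show up as unmerged.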
Rob From jason at bioperl.org Fri May 14 19:54:30 2010 From: jason at bioperl.org (Jason Stajich) Date: Fri, 14 May 2010 16:54:30 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDD65E.9070702@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> Message-ID: <4BEDE2B6.3010307@bioperl.org> lightweight_feature_branch was my test built with a feature type that is based on arrays instead of hashes got 25+% speedup I believe - have to go back to the archives to see what I claimed was speedup... =) I think that Bio::SeqFeature::Slim might be at least one speedup by Lincoln for Gbrowse that addresses some of the speed problem, though I think it still isn't array-based for data storage. -j Robert Buels wrote, On 5/14/10 4:01 PM: > Chris Fields wrote: >> On a related note, going through, it appears the git conversion >> didn't track merges back to trunk. For instance, I know the >> featann_rollback was merged to trunk but it's not showing up. I know >> svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came >> into play), so it may be hard to actually find true merges w/o that. > > OK, here are all our current branches, I will go through them in order > of last-modified date. 
> > 1998-12-11 bioperl > 1999-02-19 release-0-04-bug > 1999-04-13 bioperl-live > 1999-04-13 stable-0-05 > 2000-01-27 branch-ensembl-m1 > 2000-02-07 internal-branch-pre-delete-06-tag > 2000-03-22 stable-0-05-new > 2001-02-19 branch-06 > 2001-11-14 branch-07-ensembl-120 > 2001-12-28 steve_chervitz > 2002-01-16 branch-07 > 2002-10-22 branch-1-0-0 > 2003-07-07 branch-1-2-collection > 2003-10-13 branch-1-2 > 2004-10-20 ontology-cache > 2005-04-14 branch-1-4 > 2006-01-11 bioperl-branch-1-5-1 > 2006-08-14 branch-experimental > 2007-02-14 branch-1-5-2 > 2007-08-28 featann_rollback > 2007-11-07 lightweight_feature_branch > > Proposal: move the above to refs/archive and not worry any further > about them. Maybe we can throw them out in 2020. > > 2009-06-17 restriction-refactor > > Proposal: delete, looks like it was merged in > a2cb40e6c9c7da4f776dbb72a0266f54320fa37f > > 2009-07-16 topic/bug_2515 > proposal: keep, jhannah "working" ;-) > > 2009-08-13 TRY_gff_refactor > proposal: delete, git claims it is merged > > 2009-08-13 TRY_locatableseq_refactor > proposal: delete, git claims it is merged > > 2009-09-29 branch-1-6 > keep, 1.6 maint branch i think. > > 2009-10-14 anydbm-branch > keep, MAJ working. MAJ, maybe you should move this to topic/ ? > > 2010-01-31 TRY_featureio_refactor > keep, but looks dead. cjfields, maybe you want to delete it? > > 2010-05-12 topic/bug_3077 > delete, git claims it is merged. > > Please review, and I'll do the work if people agree. 
> > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri May 14 23:41:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 22:41:18 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDD65E.9070702@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> Message-ID: <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> On May 14, 2010, at 6:01 PM, Robert Buels wrote: > Chris Fields wrote: >> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. > > OK, here are all our current branches, I will go through them in order of last-modified date. > > 1998-12-11 bioperl > 1999-02-19 release-0-04-bug > 1999-04-13 bioperl-live > 1999-04-13 stable-0-05 > 2000-01-27 branch-ensembl-m1 > 2000-02-07 internal-branch-pre-delete-06-tag > 2000-03-22 stable-0-05-new > 2001-02-19 branch-06 > 2001-11-14 branch-07-ensembl-120 > 2001-12-28 steve_chervitz > 2002-01-16 branch-07 > 2002-10-22 branch-1-0-0 > 2003-07-07 branch-1-2-collection > 2003-10-13 branch-1-2 > 2004-10-20 ontology-cache > 2005-04-14 branch-1-4 > 2006-01-11 bioperl-branch-1-5-1 > 2006-08-14 branch-experimental > 2007-02-14 branch-1-5-2 > 2007-08-28 featann_rollback > 2007-11-07 lightweight_feature_branch > > Proposal: move the above to refs/archive and not worry any further about them. 
Maybe we can throw them out in 2020. Just as long as we know they are there. Rob, can you document the archive set up on the wiki so we don't forget it? I deleted the featann_rollback branch. That was a feature branch (no pun intended) to rollback overloading and a host of other changes introduced to bioperl just before the 1.5 release. It was merged a few years ago in svn. > 2009-06-17 restriction-refactor > > Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f This may have been Mark's refactoring, so yes, delete. > 2009-08-13 TRY_gff_refactor > proposal: delete, git claims it is merged > > 2009-08-13 TRY_locatableseq_refactor > proposal: delete, git claims it is merged I deleted these. The primary goal of TRY_gff_refactor was to work in GFF3 work, but that may rely on FeatureIO so will have to be done in stages. At some point, if we do a larger scale refactoring of GFF for GFF3 compat we can make another branch. TRY_locatableseq_refactor will be obsoleted once GSoC starts. > 2009-09-29 branch-1-6 > keep, 1.6 maint branch i think. Yes. I will probably work on another set of merges from to 1.6 soon to bring it up to speed, maybe for one last 1.6 release. > 2009-10-14 anydbm-branch > keep, MAJ working. MAJ, maybe you should move this to topic/ ? > > 2010-01-31 TRY_featureio_refactor > keep, but looks dead. cjfields, maybe you want to delete it? Yes. I've deleted this, as FeatureIO is on it's own. > 2010-05-12 topic/bug_3077 > delete, git claims it is merged. That's already deleted. Maybe needs to be pruned locally? > Please review, and I'll do the work if people agree. > > Rob Good start! 
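On the "pruned locally" point: a clone keeps its origin/ entries for branches that were deleted on the server until it is told to drop them. A small sketch with a throwaway remote and a made-up branch name, assuming the remote is called origin:

```shell
#!/bin/sh
# Throwaway remote and clone; "topic/demo" is a made-up branch name.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git config user.name "Demo"
git config user.email "demo@example.org"
git commit -q --allow-empty -m "initial commit"
git push -q origin HEAD HEAD:refs/heads/topic/demo

# Someone else deletes the branch on the server ...
git --git-dir="$tmp/origin.git" update-ref -d refs/heads/topic/demo

# ... but this clone still lists it as origin/topic/demo.
git branch -r

# Drop remote-tracking refs whose branch no longer exists on origin.
git remote prune origin
git branch -r
```

After the prune, only the trunk should remain in the second listing.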
chris From cjfields at illinois.edu Fri May 14 23:45:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 22:45:07 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDE2B6.3010307@bioperl.org> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <4BEDE2B6.3010307@bioperl.org> Message-ID: <34DFCB4E-2048-4A62-AE9C-06CBF900D38A@illinois.edu> This was moved into bioperl-dev at some point: http://github.com/bioperl/bioperl-dev/tree/master/Bio/SeqFeature/ Might be obsolete as well. chris On May 14, 2010, at 6:54 PM, Jason Stajich wrote: > lightweight_feature_branch was my test built with a feature type that is based on arrays instead of hashes got 25+% speedup I believe - have to go back to the archives to see what I claimed was speedup... =) > > I think that Bio::SeqFeature::Slim might be at least one speedup by Lincoln for Gbrowse that addresses some of the speed problem, though I think it still isn't array-based for data storage. > > -j > > Robert Buels wrote, On 5/14/10 4:01 PM: >> Chris Fields wrote: >>> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. >> >> OK, here are all our current branches, I will go through them in order of last-modified date. 
>> >> 1998-12-11 bioperl >> 1999-02-19 release-0-04-bug >> 1999-04-13 bioperl-live >> 1999-04-13 stable-0-05 >> 2000-01-27 branch-ensembl-m1 >> 2000-02-07 internal-branch-pre-delete-06-tag >> 2000-03-22 stable-0-05-new >> 2001-02-19 branch-06 >> 2001-11-14 branch-07-ensembl-120 >> 2001-12-28 steve_chervitz >> 2002-01-16 branch-07 >> 2002-10-22 branch-1-0-0 >> 2003-07-07 branch-1-2-collection >> 2003-10-13 branch-1-2 >> 2004-10-20 ontology-cache >> 2005-04-14 branch-1-4 >> 2006-01-11 bioperl-branch-1-5-1 >> 2006-08-14 branch-experimental >> 2007-02-14 branch-1-5-2 >> 2007-08-28 featann_rollback >> 2007-11-07 lightweight_feature_branch >> >> Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. >> >> 2009-06-17 restriction-refactor >> >> Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f >> >> 2009-07-16 topic/bug_2515 >> proposal: keep, jhannah "working" ;-) >> >> 2009-08-13 TRY_gff_refactor >> proposal: delete, git claims it is merged >> >> 2009-08-13 TRY_locatableseq_refactor >> proposal: delete, git claims it is merged >> >> 2009-09-29 branch-1-6 >> keep, 1.6 maint branch i think. >> >> 2009-10-14 anydbm-branch >> keep, MAJ working. MAJ, maybe you should move this to topic/ ? >> >> 2010-01-31 TRY_featureio_refactor >> keep, but looks dead. cjfields, maybe you want to delete it? >> >> 2010-05-12 topic/bug_3077 >> delete, git claims it is merged. >> >> Please review, and I'll do the work if people agree. 
>> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Sat May 15 10:27:48 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 09:27:48 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) Message-ID: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> I wrote some tests and merged and deleted branch topic/bug_2515. Bio::SeqIO::gbxml is now in master. Thanks to Ryan Golhar for the contribution! Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah bioperl-live$ perl -I. t/SeqIO/gbxml.t 1..14 ok 1 - use Bio::SeqIO::gbxml; ok 2 - The object isa Bio::SeqIO ok 3 - molecule ok 4 - alphabet ok 5 - primary_id ok 6 - display_id ok 7 - version ok 8 - is_circular ok 9 - description ok 10 - sequence ok 11 - classification ok 12 - feat - clone_lib ok 13 - feat - db_xref ok 14 - feat - lab_host From jay at jays.net Sat May 15 10:57:54 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 09:57:54 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> Message-ID: On May 15, 2010, at 9:34 AM, Chris Fields wrote: > Can you add something to the Changes file for this? You can make a new section for bug fixes or new features at the top, and we can worry about versions later. > > I'll add in the recent bug fix I made as well. Pushed. Feel free to discard any of that you don't like. 
HTH, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Sat May 15 11:46:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 May 2010 10:46:16 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> Message-ID: <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> Thanks Jay. I'll add a bit in myself for bug 3077. Not sure if we'll pursue another point release yet, but it would be nice to get changes out prior to any major structural reorganization. chris On May 15, 2010, at 9:57 AM, Jay Hannah wrote: > On May 15, 2010, at 9:34 AM, Chris Fields wrote: >> Can you add something to the Changes file for this? You can make a new section for bug fixes or new features at the top, and we can worry about versions later. >> >> I'll add in the recent bug fix I made as well. > > Pushed. Feel free to discard any of that you don't like. HTH, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > From jay at jays.net Sat May 15 14:08:35 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 13:08:35 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> Message-ID: On May 15, 2010, at 10:46 AM, Chris Fields wrote: > Thanks Jay. I'll add a bit in myself for bug 3077. Not sure if we'll pursue another point release yet, but it would be nice to get changes out prior to any major structural reorganization. Is there a list whose completion will mark the push of 1.6.2 to CPAN? 
The Changes file says this now: Bugs to be addressed: http://bugzilla.open-bio.org specific bugs intended for the next CPAN release series highlighted in BUGS But I don't understand what 'highlighted in BUGS' means. I also don't know what a 'point release' is. :) Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From David.Messina at sbc.su.se Sat May 15 15:34:58 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 May 2010 21:34:58 +0200 Subject: [Bioperl-l] parsing blast report with long description In-Reply-To: References: Message-ID: Shalabh, Could you please file a bug report on this at bugzilla.open-bio.org? Please include a description (pasting this email will do) and most importantly a test script and sample blast output file which reproduces the problem. We will need those in order to be able to diagnose and fix the problem. Thanks! Dave On May 13, 2010, at 5:07 PM, shalabh sharma wrote: > Hi All, > I need some help in parsing blast output. > I have a inhouse database that contain sequences with really long > description. > >> SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open > Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - > 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV > > So my blast report looks like this: > > ..... > ..... >> SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821 > 6887/Open Ocean/Galapagos Islands/134 miles NE of > Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 > m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > Length = 213 > > Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix > adjust. > Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%) > ..... > ..... > > (note that the tag "TI_1000008216887" is splitting in two lines). 
> > I am using SeqIO to parse this report. What i am doing is parsing the > description field again to get all the tags. like > .... > .... > my $desc = $hit->description; > my @f = split('/',$desc); > for(my $i = 0;$i < scalar > @f;$i++){ print OUT "$f[$i]\t";} > ..... > ..... > > > *I am getting the perfect parsed report but the field with TI_1000008216887 > has a space **TI_100000821 6887 *. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun May 16 11:14:25 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 10:14:25 -0500 Subject: [Bioperl-l] GenomeeTools Message-ID: Anyone used GenomeTools? I'm thinking of setting up some C bindings to it. It has a C-based GFF3 parser, among other goodies. http://genometools.org/index.html chris From cjfields at illinois.edu Sun May 16 12:16:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 11:16:11 -0500 Subject: [Bioperl-l] Bio-FeatureIO Message-ID: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> All, Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. 
chris From jay at jays.net Sun May 16 13:32:57 2010 From: jay at jays.net (Jay Hannah) Date: Sun, 16 May 2010 12:32:57 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 11:16 AM, Chris Fields wrote: > Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. I'm curious about how this works in terms of git storage. Does this mean that the separate Bio-FeatureIO repo will have the entire history of BioPerl inside it? (Making git clones of Bio-FeatureIO 189MB?) In the recent past I have attempted pulling certain files across git repos before, and ended up with the full history of repo1 inside repo2. I'm unclear if this is just how life is, or if I did it wrong. You could, of course, always just cp text files in, but then you lose the history of those files. Is there some way to get all the history of a handful of files from massive repo1 into tiny repo2 without making repo1 massive? I don't know if any of these considerations are important for the eventual de-monolithification of BioPerl, I was just generally curious. git does that to me. 
:) Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Sun May 16 14:18:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 13:18:24 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 12:32 PM, Jay Hannah wrote: > On May 16, 2010, at 11:16 AM, Chris Fields wrote: >> Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. > > I'm curious about how this works in terms of git storage. > > Does this mean that the separate Bio-FeatureIO repo will have the entire history of BioPerl inside it? (Making git clones of Bio-FeatureIO 189MB?) > > In the recent past I have attempted pulling certain files across git repos before, and ended up with the full history of repo1 inside repo2. I'm unclear if this is just how life is, or if I did it wrong. > > You could, of course, always just cp text files in, but then you lose the history of those files. > > Is there some way to get all the history of a handful of files from massive repo1 into tiny repo2 without making repo1 massive? > > I don't know if any of these considerations are important for the eventual de-monolithification of BioPerl, I was just generally curious. git does that to me. 
:) > > Thanks, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah I'm just planning on having something to the effect of 'Bio-FeatureIO is a set of modules developed by author X that once was part of bioperl-live, but was removed at point XYZ to significantly refactor the code,' then point back to bioperl-live if anyone is interested in software archaeology. Not sure we would need to go beyond that. chris From jay at jays.net Sun May 16 14:47:42 2010 From: jay at jays.net (Jay Hannah) Date: Sun, 16 May 2010 13:47:42 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 1:18 PM, Chris Fields wrote: > I'm just planning on having something to the effect of 'Bio-FeatureIO is a set of modules developed by author X that once was part of bioperl-live, but was removed at point XYZ to significantly refactor the code,' then point back to bioperl-live if anyone is interested in software archaeology. Not sure we would need to go beyond that. Gotcha. That certainly solves the problem. :) So maybe in 2020 we'll be pushing 30 independent github repos to PAUSE all citing the bioperl-live repo for historical digging prior to their emancipation. To jhannah in the year 2020: You are NOT too old for dirt bikes. Keep riding! :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From fs5 at sanger.ac.uk Mon May 17 04:38:18 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 17 May 2010 09:38:18 +0100 Subject: [Bioperl-l] parsing blast report with long description In-Reply-To: References: Message-ID: <1274085498.5288.30.camel@deskpro15336.dynamic.sanger.ac.uk> I think you should try to avoid those long IDs anyway, especially because you have spaces in there too and this may cause problems further down the line as many programs will use a pattern like />(\S+)/ as the identifier. 
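To see what that pattern does to one of these headers, here is a rough shell equivalent using sed. The header is the one from the thread, cut short after a few fields; the variable names are just for illustration.

```shell
#!/bin/sh
# Header line from the thread, shortened after the first few fields.
header='>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open Ocean/Galapagos Islands'

# Roughly what a />(\S+)/ match keeps: everything up to the first whitespace.
id=$(printf '%s\n' "$header" | sed -n 's/^>\([^[:space:]]\{1,\}\).*/\1/p')

# Everything after that first whitespace ends up as free-text description.
desc=$(printf '%s\n' "$header" | sed -n 's/^>[^[:space:]]\{1,\}[[:space:]]\{1,\}//p')

echo "id:   $id"
echo "desc: $desc"
```

Only SMPL_IDI_1105131728043 survives as the identifier; all the sample metadata lands in the description, which is the part BLAST is free to re-wrap.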
I would build a small database for your files and use unique database identifiers in your FASTA files. That will make it easier in the future to collect, for example, all sequences from a certain region etc. If you want to avoid that you could have two file: one FASTA files using numbers as IDs and a file where you map those numbers to sample descriptions, i.e. a simple flat-file database. Frank On Thu, 2010-05-13 at 11:07 -0400, shalabh sharma wrote: > Hi All, > I need some help in parsing blast output. > I have a inhouse database that contain sequences with really long > description. > > >SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open > Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - > 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV > > So my blast report looks like this: > > ..... > ..... > >SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821 > 6887/Open Ocean/Galapagos Islands/134 miles NE of > Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 > m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > Length = 213 > > Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix > adjust. > Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%) > ..... > ..... > > (note that the tag "TI_1000008216887" is splitting in two lines). > > I am using SeqIO to parse this report. What i am doing is parsing the > description field again to get all the tags. like > .... > .... > my $desc = $hit->description; > my @f = split('/',$desc); > for(my $i = 0;$i < scalar > @f;$i++){ print OUT "$f[$i]\t";} > ..... > ..... > > > *I am getting the perfect parsed report but the field with TI_1000008216887 > has a space **TI_100000821 6887 *. > > I would really appreciate if anyone can help me out. 
> > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From fs5 at sanger.ac.uk Mon May 17 04:41:51 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 17 May 2010 09:41:51 +0100 Subject: [Bioperl-l] running perl script In-Reply-To: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> Message-ID: <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> why are you requiring "Bio::Perl"? You would normally use somethink specific in the BioPerl bundle, like Bio::Seq or whatever. Can you show some of your script? Frank On Fri, 2010-05-14 at 11:24 -0500, Srivastava, Subodh K [AGRON] wrote: > hi, > I am running a perl script and getting error like: > > Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. > > How to set the path for this? > the other related scripts are working in same directory. 
> > I am running; perl, v5.8.8 built for x86_64-linux-thread-multi > > thank you > subodh > ************************************* > G-302 > Agronomy Hall > Iowa State University > Ames, IA -50010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Mon May 17 08:26:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 07:26:20 -0500 Subject: [Bioperl-l] running perl script In-Reply-To: <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <63D0BEDA-27F7-48AB-ABE8-1F39B09B349A@illinois.edu> Frank, Bio::Perl is the generic user module for very simple tasks. See here: http://github.com/bioperl/bioperl-live/blob/master/Bio/Perl.pm Subodh, you need to make sure the modules are in your perl library path. See the following link, under 'INSTALLING BIOPERL IN A PERSONAL MODULE AREA': http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix chris On May 17, 2010, at 3:41 AM, Frank Schwach wrote: > why are you requiring "Bio::Perl"? You would normally use somethink > specific in the BioPerl bundle, like Bio::Seq or whatever. Can you show > some of your script? 
> Frank > > > On Fri, 2010-05-14 at 11:24 -0500, Srivastava, Subodh K [AGRON] wrote: >> hi, >> I am running a perl script and getting error like: >> >> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. >> >> How to set the path for this? >> the other related scripts are working in same directory. >> >> I am running; perl, v5.8.8 built for x86_64-linux-thread-multi >> >> thank you >> subodh >> ************************************* >> G-302 >> Agronomy Hall >> Iowa State University >> Ames, IA -50010 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ross at cuhk.edu.hk Mon May 17 08:42:35 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 17 May 2010 20:42:35 +0800 Subject: [Bioperl-l] extracting genbank content Message-ID: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Dear all, When there are more than one genbank records in a file, except by splitting the file into separate records, what can I do to transverse the records? 
$obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank");
$seqobj=$obj->next_seq();

Do I just use another $obj->next_seq() so it will point to another record?

Thanks for your advice.

From amackey at virginia.edu Mon May 17 09:51:31 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Mon, 17 May 2010 09:51:31 -0400
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
In-Reply-To:
References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>
Message-ID:

On Thu, May 13, 2010 at 2:20 AM, Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com> wrote:
>
> As for getting values outside the defined region, that is a feature rather
> than a bug. The idea was to be able to ask what would the new coordinate be
> if the feature extended beyond the known limits. This is the capability of
> Bio::Coordinate::ExtrapolatingPair that is used here. That class also has a
> method strict that can be used to prevent extrapolating, but the code to
> access that has not been written into GeneMapper. I'll see if I can get it
> to work.
>

I had this same thought/expectation, but that in fact is not what's going on. There is no place in the GeneMapper code where the CDS end coordinate is being used, only the begin coordinate. The implicit assumption is that the CDS ends at the last exon.

From the perspective of the translate/revtranslate methods, an extrapolating pair does not make sense (at least to me) -- just as a CDS coordinate is undefined within an intron, so too would I expect a CDS coordinate to be undefined in a UTR or intragenic region.

Alternatively, it would be nice (in general) to be able to check whether the provided mapping is an extrapolation or not.
-Aaron From David.Messina at sbc.su.se Mon May 17 09:56:35 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 17 May 2010 15:56:35 +0200 Subject: [Bioperl-l] extracting genbank content In-Reply-To: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: Hi Ross, > Do I just use another $obj->next_seq() so it will point to another record? Yes. The common approach is to use a while loop: my $obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank"); while(my $seqobj = $obj->next_seq) { # do stuff with $seqobj } For more details, see the SeqIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SeqIO Dave From cjfields at illinois.edu Mon May 17 12:36:37 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 11:36:37 -0500 Subject: [Bioperl-l] extracting genbank content In-Reply-To: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: <9952EA98-248E-41B8-9816-A3A01EC6ADFE@illinois.edu> Depends on what you need to do. If you are just interested in pulling out certain bits of data from each record, using SeqIO is a good option. But if you want to access the records as a flat database (not iteration, but indexed for fast access), use Bio::Index::GenBank or Bio::DB::Flat to make a simple flat file database and access them by ID. chris On May 17, 2010, at 7:42 AM, Ross KK Leung wrote: > Dear all, > > > > When there are more than one genbank records in a file, except by splitting > the file into separate records, what can I do to transverse the records? > > > > $obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank"); > > > $seqobj=$obj->next_seq(); > > > > Do I just use another $obj->next_seq() so it will point to another record? > > > > Thanks for your advice. 
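A minimal sketch of the loop discussed in this thread, assuming BioPerl is installed and a multi-record GenBank file is given as the first command-line argument. The helper name summarize_records is made up for illustration; next_seq, display_id and length are the usual Bio::SeqIO / Bio::Seq methods. next_seq returns undef once the stream is exhausted, which is what ends the while loop:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect [display_id, length] for every record in a sequence stream.
# $in can be any object with a next_seq() method, e.g. a Bio::SeqIO reader.
# (summarize_records is a hypothetical helper, not part of BioPerl.)
sub summarize_records {
    my ($in) = @_;
    my @rows;
    # next_seq() returns undef when no records are left, ending the loop
    while ( my $seq = $in->next_seq ) {
        push @rows, [ $seq->display_id, $seq->length ];
    }
    return @rows;
}

# Only load BioPerl when a GenBank file is named on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new( -file => $ARGV[0], -format => 'genbank' );
    printf "%s\t%d\n", @$_ for summarize_records($in);
}
```

This only covers sequential iteration. For random access by ID, the indexed route mentioned in the thread (Bio::Index::GenBank or Bio::DB::Flat) is probably the better fit.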
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Mon May 17 12:50:21 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 May 2010 09:50:21 -0700 Subject: [Bioperl-l] GenomeeTools In-Reply-To: References: Message-ID: <4BF173CD.8020600@cornell.edu> I haven't used GenomeTools but I've used GenomeThreader, one of Gordon's other tools. Rob Chris Fields wrote: > Anyone used GenomeTools? I'm thinking of setting up some C bindings to it. It has a C-based GFF3 parser, among other goodies. > > http://genometools.org/index.html > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Mon May 17 20:15:13 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 May 2010 17:15:13 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> Message-ID: <4BF1DC11.6030402@cornell.edu> OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches Rob Chris Fields wrote: > On May 14, 2010, at 6:01 PM, Robert Buels wrote: > >> Chris Fields wrote: >>> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. 
I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. >> OK, here are all our current branches, I will go through them in order of last-modified date. >> >> 1998-12-11 bioperl >> 1999-02-19 release-0-04-bug >> 1999-04-13 bioperl-live >> 1999-04-13 stable-0-05 >> 2000-01-27 branch-ensembl-m1 >> 2000-02-07 internal-branch-pre-delete-06-tag >> 2000-03-22 stable-0-05-new >> 2001-02-19 branch-06 >> 2001-11-14 branch-07-ensembl-120 >> 2001-12-28 steve_chervitz >> 2002-01-16 branch-07 >> 2002-10-22 branch-1-0-0 >> 2003-07-07 branch-1-2-collection >> 2003-10-13 branch-1-2 >> 2004-10-20 ontology-cache >> 2005-04-14 branch-1-4 >> 2006-01-11 bioperl-branch-1-5-1 >> 2006-08-14 branch-experimental >> 2007-02-14 branch-1-5-2 >> 2007-08-28 featann_rollback >> 2007-11-07 lightweight_feature_branch >> >> Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. > > Just as long as we know they are there. Rob, can you document the archive set up on the wiki so we don't forget it? > > I deleted the featann_rollback branch. That was a feature branch (no pun intended) to rollback overloading and a host of other changes introduced to bioperl just before the 1.5 release. It was merged a few years ago in svn. > >> 2009-06-17 restriction-refactor >> >> Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f > > This may have been Mark's refactoring, so yes, delete. > >> 2009-08-13 TRY_gff_refactor >> proposal: delete, git claims it is merged >> >> 2009-08-13 TRY_locatableseq_refactor >> proposal: delete, git claims it is merged > > I deleted these. The primary goal of TRY_gff_refactor was to work in GFF3 work, but that may rely on FeatureIO so will have to be done in stages. At some point, if we do a larger scale refactoring of GFF for GFF3 compat we can make another branch. 
TRY_locatableseq_refactor will be obsoleted once GSoC starts. > >> 2009-09-29 branch-1-6 >> keep, 1.6 maint branch i think. > > Yes. I will probably work on another set of merges from to 1.6 soon to bring it up to speed, maybe for one last 1.6 release. > >> 2009-10-14 anydbm-branch >> keep, MAJ working. MAJ, maybe you should move this to topic/ ? >> >> 2010-01-31 TRY_featureio_refactor >> keep, but looks dead. cjfields, maybe you want to delete it? > > Yes. I've deleted this, as FeatureIO is on it's own. > >> 2010-05-12 topic/bug_3077 >> delete, git claims it is merged. > > That's already deleted. Maybe needs to be pruned locally? > >> Please review, and I'll do the work if people agree. >> >> Rob > > Good start! > > chris > > From jay at jays.net Mon May 17 20:35:33 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 17 May 2010 19:35:33 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BF1DC11.6030402@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> Message-ID: <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> On May 17, 2010, at 7:15 PM, Robert Buels wrote: > OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches Thank you!! git pull --prune and suddenly I feel clean again! :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From amackey at virginia.edu Mon May 17 20:42:17 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Mon, 17 May 2010 20:42:17 -0400 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. 
In-Reply-To: <20100518001029.CD8644229D@smtp1.rs.github.com> References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: I probably missed some prior discussion of this, but any chance that the new commit messages can actually include the (unified, possibly truncated-for-length) diff of the changes? My own 2 cents is that community-wide visual skims of the diffs provide a valuable spot-check for typo's and other think-o's. Plus it gives me an indication of how major the change was. A corollary -- might there be an RSS feed by which I could subscribe to such diffs, rather than get emails about them? Since the emails are sent from "noreply", I already have to step out of the normal email flow to respond to a diff, might as well go whole hog and remove them from my email consciousness entirely, and place them with the other various information streams in my RSS reader. Thanks, -Aaron On Mon, May 17, 2010 at 8:10 PM, wrote: > Branch: refs/archives/heads/branch-1-0-0 > Home: http://github.com/bioperl/bioperl-live > > Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 > > http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 > Author: sac > Date: 2002-10-22 (Tue, 22 Oct 2002) > > Changed paths: > M Bio/SearchIO/Writer/HitTableWriter.pm > > Log Message: > ----------- > Added frame to the column map. > > svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > From jay at jays.net Mon May 17 21:10:56 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 17 May 2010 20:10:56 -0500 Subject: [Bioperl-l] 319a6e: Added frame to the column map. 
In-Reply-To: References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > I probably missed some prior discussion of this, but any chance that the new > commit messages can actually include the (unified, possibly > truncated-for-length) diff of the changes? I'm 5 years behind the cool-kids curve on this stuff. :) I just discovered SVN::Notify for $work[0]. By default it kicks out really pretty color HTML diffs of every change. I assume there's an equivalent for git? You could always click to github. It's color HTML diffs are very pretty. That commit for example: http://github.com/bioperl/bioperl-live/commit/319a6e Plus all the other github shiny -- comment specific lines of the commit, or the commit itself, etc. Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Mon May 17 21:35:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 20:35:21 -0500 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. In-Reply-To: References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu> Aaron, We can do either, though setting up diffs will take a bit more work (will have to set up a post-receive URL to a CGI script to process this). RSS is quite a bit easier: http://github.com/bioperl/bioperl-live/commits/master.atom Replace 'bioperl-live' with any of the other repos for repo-specific RSS commits. The links go to the commits where you can also make in-line notes/comments by clicking in the diff code, or simple comments at the bottom. Those comments are then passed on to bioperl-guts-l for everyone to see. 
Example here: http://github.com/bioperl/bioperl-live/commit/c86c048c96786f8517ae1ad1fc5e5823eecf52c3 and the relevant bioperl-guts-l posts: http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031259.html http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031260.html chris On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > I probably missed some prior discussion of this, but any chance that the new > commit messages can actually include the (unified, possibly > truncated-for-length) diff of the changes? > > My own 2 cents is that community-wide visual skims of the diffs provide a > valuable spot-check for typo's and other think-o's. Plus it gives me an > indication of how major the change was. > > A corollary -- might there be an RSS feed by which I could subscribe to such > diffs, rather than get emails about them? Since the emails are sent from > "noreply", I already have to step out of the normal email flow to respond to > a diff, might as well go whole hog and remove them from my email > consciousness entirely, and place them with the other various information > streams in my RSS reader. > > Thanks, > > -Aaron > > On Mon, May 17, 2010 at 8:10 PM, wrote: > >> Branch: refs/archives/heads/branch-1-0-0 >> Home: http://github.com/bioperl/bioperl-live >> >> Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 >> >> http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 >> Author: sac >> Date: 2002-10-22 (Tue, 22 Oct 2002) >> >> Changed paths: >> M Bio/SearchIO/Writer/HitTableWriter.pm >> >> Log Message: >> ----------- >> Added frame to the column map. 
>> >> svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 >> >> >> _______________________________________________ >> Bioperl-guts-l mailing list >> Bioperl-guts-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Tue May 18 03:16:52 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 00:16:52 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> Message-ID: <4BF23EE4.6020704@cornell.edu> We may want to do the same for our tags as well. Our github download page is fairly disastrous. See: http://github.com/bioperl/bioperl-live/downloads It's not clear that a similar date-cutoff policy would work for tags. Pretty much all of these things were before my time, I don't know what most of them are. Does someone with more history than me have some thoughts as to what should stay on that download page? The rest of the tags could be archived. Rob Jay Hannah wrote: > On May 17, 2010, at 7:15 PM, Robert Buels wrote: >> OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches > > Thank you!! git pull --prune and suddenly I feel clean again! 
:) > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > From bpcwhite at gmail.com Tue May 18 05:49:29 2010 From: bpcwhite at gmail.com (Bryan White) Date: Tue, 18 May 2010 02:49:29 -0700 (PDT) Subject: [Bioperl-l] distance Message-ID: Hello, I am trying to create a simple program to show me the distance between taxa on a given tree. However, I am having trouble getting the bioperl code to work. Here is the code that I am using: -------- #! /usr/bin/perl use strict; use warnings; use Bio::Tree::Draw::Cladogram; use Bio::TreeIO; #use Bio::TreeFunctionsI; my $node1 = 'homo_sapiens'; my $node2 = 'murinae'; my $input = new Bio::TreeIO('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt'); my $tree = $input->next_tree; my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); my $distance = $tree->distance(-nodes => \@nodes); #print $distance; -------- And here is the error message I receive: ------------- EXCEPTION ------------- MSG: Must provide 2 nodes STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/ Bio/Tree/TreeFunctionsI.pm:811 STACK toplevel ./phylo.pl:19 ------------------------------------- It seems that the nodes are not being read into the @nodes variable. Any help in figuring this out would be appreciated. 
Thanks,
Bryan

From biopython at maubp.freeserve.co.uk Tue May 18 06:07:15 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 18 May 2010 11:07:15 +0100
Subject: [Bioperl-l] branch pruning v.2
In-Reply-To: <4BF23EE4.6020704@cornell.edu>
References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu>
Message-ID:

On Tue, May 18, 2010 at 8:16 AM, Robert Buels wrote:
> We may want to do the same for our tags as well. Our github download page
> is fairly disastrous. See:
>
> http://github.com/bioperl/bioperl-live/downloads
>
> It's not clear that a similar date-cutoff policy would work for tags. Pretty
> much all of these things were before my time, I don't know what most of them
> are.
>
> Does someone with more history than me have some thoughts as to what should
> stay on that download page? The rest of the tags could be archived.
>
> Rob

Or just turn off the download feature in github.

When you prepare a BioPerl release does it contain anything else not found in the repository (e.g. compiled documentation)? We have this for Biopython (compiled PDF and HTML docs) so we prefer to direct casual release downloads via the website not via the tag on github to ensure they get these extra files in the archive.

Peter

From adsj at novozymes.com Tue May 18 06:21:25 2010
From: adsj at novozymes.com (Adam Sjøgren)
Date: Tue, 18 May 2010 12:21:25 +0200
Subject: [Bioperl-l] distance
References:
Message-ID: <87k4r11pei.fsf@topper.koldfront.dk>

On Tue, 18 May 2010 02:49:29 -0700 (PDT), Bryan wrote:

> my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');

I think you may have misunderstood the documentation of find_node().
You are supposed to give the fieldname after the dash, so what you want is:

my @nodes = $tree->find_node(-id => 'Homo_sapiens','Murinae');

- if the field you want to match on is 'id'.

Also, I don't think you can get find_node() to do 'OR'-searches, so you'll need to do something like this:

= = =
#!/usr/bin/perl

use strict;
use warnings;

use Bio::TreeIO;

my $input=Bio::TreeIO->new('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree=$input->next_tree;
my ($node1)=$tree->find_node(-id=>'Homo_sapiens'); # this (arbitrarily) picks the first match
my ($node2)=$tree->find_node(-id=>'Murinae'); # -"-
my $distance=$tree->distance(-nodes=>[$node1, $node2]);

print "$distance\n";
= = =

It is much easier to help if you give an example of the input as well as the script. I constructed this stand-in for your newick file to test on:

(Homo_sapiens:1.1,B:2.2,(C:3.3,Murinae:4.4):5.5);

Best regards,

Adam

--
Adam Sjøgren
adsj at novozymes.com

From David.Messina at sbc.su.se Tue May 18 06:50:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 May 2010 12:50:52 +0200
Subject: [Bioperl-l] branch pruning v.2
In-Reply-To:
References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu>
Message-ID: <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se>

On May 18, 2010, at 12:07, Peter wrote:

> Or just turn off the download feature in github.

That might be the best solution, at least for now.

The download page is somewhat unfriendly anyway: the tag names are truncated, there's no way to sort, and the descriptions are, well, not so descriptive (they appear to be just the last commit message).
Probably better to keep

http://www.bioperl.org/wiki/Getting_BioPerl

as our main distribution point for downloads.

Dave

From jun.yin at ucd.ie Tue May 18 07:15:14 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Tue, 18 May 2010 12:15:14 +0100
Subject: [Bioperl-l] distance
In-Reply-To: <87k4r11pei.fsf@topper.koldfront.dk>
References: <87k4r11pei.fsf@topper.koldfront.dk>
Message-ID: <002d01caf67b$637c20d0$2a746270$%yin@ucd.ie>

Hi, Bryan,

Use Adam's code. The last statement of my code was wrong. I made a wrong reference...

Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Adam Sjøgren
Sent: Tuesday, May 18, 2010 11:21 AM
To: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] distance

On Tue, 18 May 2010 02:49:29 -0700 (PDT), Bryan wrote:

> my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');

I think you may have misunderstood the documentation of find_node(). You are supposed to give the fieldname after the dash, so what you want is:

my @nodes = $tree->find_node(-id => 'Homo_sapiens','Murinae');

- if the field you want to match on is 'id'.

Also, I don't think you can get find_node() to do 'OR'-searches, so you'll need to do something like this:

= = =
#!/usr/bin/perl

use strict;
use warnings;

use Bio::TreeIO;

my $input=Bio::TreeIO->new('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree=$input->next_tree;
my ($node1)=$tree->find_node(-id=>'Homo_sapiens'); # this (arbitrarily) picks the first match
my ($node2)=$tree->find_node(-id=>'Murinae'); # -"-
my $distance=$tree->distance(-nodes=>[$node1, $node2]);

print "$distance\n";
= = =

It is much easier to help if you give an example of the input as well as the script. I constructed this stand-in for your newick file to test on:

(Homo_sapiens:1.1,B:2.2,(C:3.3,Murinae:4.4):5.5);

Best regards,

Adam

--
Adam Sjøgren
adsj at novozymes.com

From amackey at virginia.edu Tue May 18 07:26:17 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Tue, 18 May 2010 07:26:17 -0400
Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map.
In-Reply-To: <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu>
References: <20100518001029.CD8644229D@smtp1.rs.github.com> <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu>
Message-ID:

Thanks for the info, and the thoroughness of your explanation!

-Aaron

On Mon, May 17, 2010 at 9:35 PM, Chris Fields wrote: > Aaron, > > We can do either, though setting up diffs will take a bit more work (will > have to set up a post-receive URL to a CGI script to process this). > > RSS is quite a bit easier: > > http://github.com/bioperl/bioperl-live/commits/master.atom > > Replace 'bioperl-live' with any of the other repos for repo-specific RSS > commits. The links go to the commits where you can also make in-line > notes/comments by clicking in the diff code, or simple comments at the > bottom. Those comments are then passed on to bioperl-guts-l for everyone to > see.
Example here: > > > http://github.com/bioperl/bioperl-live/commit/c86c048c96786f8517ae1ad1fc5e5823eecf52c3 > > and the relevant bioperl-guts-l posts: > > http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031259.html > http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031260.html > > chris > > On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > > > I probably missed some prior discussion of this, but any chance that the > new > > commit messages can actually include the (unified, possibly > > truncated-for-length) diff of the changes? > > > > My own 2 cents is that community-wide visual skims of the diffs provide a > > valuable spot-check for typo's and other think-o's. Plus it gives me an > > indication of how major the change was. > > > > A corollary -- might there be an RSS feed by which I could subscribe to > such > > diffs, rather than get emails about them? Since the emails are sent from > > "noreply", I already have to step out of the normal email flow to respond > to > > a diff, might as well go whole hog and remove them from my email > > consciousness entirely, and place them with the other various information > > streams in my RSS reader. > > > > Thanks, > > > > -Aaron > > > > On Mon, May 17, 2010 at 8:10 PM, wrote: > > > >> Branch: refs/archives/heads/branch-1-0-0 > >> Home: http://github.com/bioperl/bioperl-live > >> > >> Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 > >> > >> > http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 > >> Author: sac > >> Date: 2002-10-22 (Tue, 22 Oct 2002) > >> > >> Changed paths: > >> M Bio/SearchIO/Writer/HitTableWriter.pm > >> > >> Log Message: > >> ----------- > >> Added frame to the column map. 
> >> > >> svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944

From jun.yin at ucd.ie Tue May 18 07:07:43 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Tue, 18 May 2010 12:07:43 +0100
Subject: [Bioperl-l] distance
In-Reply-To:
References:
Message-ID: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie>

Hi, Bryan,

In your code:

my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');

First, you should specify the field name. "fieldname" itself does not seem to be a valid key. The default field name is "id".
Second, the find_node method can only search for one specific term at a time.
Third, the distance method can only work on two nodes.

So try this:

my @nodes_human = $tree->find_node(-id => 'Homo_sapiens');
my @nodes_murinae=$tree->find_node(-id=>'Murinae');
my $distance = $tree->distance(-nodes => \($nodes_human[0],$nodes_murinae[0])); #Providing you only have one match for "Homo_sapiens" and "Murinae".

Cheers,

Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bryan White
Sent: Tuesday, May 18, 2010 10:49 AM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] distance

Hello,

I am trying to create a simple program to show me the distance between taxa on a given tree. However, I am having trouble getting the bioperl code to work. Here is the code that I am using:

--------
#! /usr/bin/perl

use strict;
use warnings;

use Bio::Tree::Draw::Cladogram;
use Bio::TreeIO;
#use Bio::TreeFunctionsI;

my $node1 = 'homo_sapiens';
my $node2 = 'murinae';

my $input = new Bio::TreeIO('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree = $input->next_tree;
my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');
my $distance = $tree->distance(-nodes => \@nodes);

#print $distance;
--------

And here is the error message I receive:

------------- EXCEPTION -------------
MSG: Must provide 2 nodes
STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/Bio/Tree/TreeFunctionsI.pm:811
STACK toplevel ./phylo.pl:19
-------------------------------------

It seems that the nodes are not being read into the @nodes variable. Any help in figuring this out would be appreciated.

Thanks,
Bryan

From cjfields at illinois.edu Tue May 18 08:47:10 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 18 May 2010 07:47:10 -0500
Subject: [Bioperl-l] branch pruning v.2
In-Reply-To: <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se>
References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se>
Message-ID: <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu>

On May 18, 2010, at 5:50 AM, Dave Messina wrote:

>
> On May 18, 2010, at 12:07, Peter wrote:
>
>> Or just turn off the download feature in github.
>
> That might be the best solution, at least for now.
>
> The download page is somewhat unfriendly anyway: the tag names are truncated, there's no way to sort, and the descriptions are, well, not so descriptive (they appear to be just the last commit message).
>
> Probably better to keep
>
> http://www.bioperl.org/wiki/Getting_BioPerl
>
> as our main distribution point for downloads.
>
>
> Dave

We can turn that off for now, though it is a nice feature.
If we need a replacement link for downloads we can use the repo.or.cz mirror link, for example: http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip chris From David.Messina at sbc.su.se Tue May 18 08:53:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 14:53:29 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: On May 18, 2010, at 14:47, Chris Fields wrote: > http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz > http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. I'll go ahead and update the nightly build links on http://www.bioperl.org/wiki/Getting_BioPerl to point to those, then, unless there are objections. 
Dave From cjfields at illinois.edu Tue May 18 09:56:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 08:56:45 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> On May 18, 2010, at 7:53 AM, Dave Messina wrote: > > On May 18, 2010, at 14:47, Chris Fields wrote: > >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip > > > Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. > > > I'll go ahead and update the nightly build links on > > http://www.bioperl.org/wiki/Getting_BioPerl > > to point to those, then, unless there are objections. > > > Dave This link also still works, even with the 'Downloads' tab off: http://github.com/bioperl/bioperl-live/archives/master Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. 'build' really never applied either, but oh well... 
chris From biopython at maubp.freeserve.co.uk Tue May 18 09:57:50 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 May 2010 14:57:50 +0100 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: On Tue, May 18, 2010 at 1:53 PM, Dave Messina wrote: > > > On May 18, 2010, at 14:47, Chris Fields wrote: > >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip > > > Oh right, I forgot about the mirror. Silly me. :) So probably > unnecessary to make our own nightly snapshots then. > Just like what you'd get from the big "Download Source" button on github? Equivalent to visiting this page: http://github.com/bioperl/bioperl-live/archives/master Peter From cjfields at illinois.edu Tue May 18 10:03:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 09:03:46 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> Message-ID: On May 18, 2010, at 8:56 AM, Chris Fields wrote: > On May 18, 2010, at 7:53 AM, Dave Messina wrote: > >> >> On May 18, 2010, at 14:47, Chris 
Fields wrote: >> >>> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >>> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip >> >> >> Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. >> >> >> I'll go ahead and update the nightly build links on >> >> http://www.bioperl.org/wiki/Getting_BioPerl >> >> to point to those, then, unless there are objections. >> >> >> Dave > > This link also still works, even with the 'Downloads' tab off: > > http://github.com/bioperl/bioperl-live/archives/master > > Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. > > 'build' really never applied either, but oh well... > > chris Oh, and on the topic of annotated tags for downloads: http://github.com/blog/651-annotated-downloads chris From David.Messina at sbc.su.se Tue May 18 10:23:34 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 16:23:34 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> Message-ID: <075CC735-0573-4E79-975F-23AD61C41C72@sbc.su.se> On May 18, 2010, at 16:03, Chris Fields wrote: > > This link also still works, even with the 'Downloads' tab off: > > http://github.com/bioperl/bioperl-live/archives/master Ah, great, thanks Chris and Peter. > Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. 
>
> 'build' really never applied either, but oh well...

Righto - done. 'Snapshots' it is.

> Oh, and on the topic of annotated tags for downloads:
>
> http://github.com/blog/651-annotated-downloads

Heh, how timely. :) Good, that will solve the description part of it nicely.

Dave

From jay at jays.net  Tue May 18 10:32:47 2010
From: jay at jays.net (Jay Hannah)
Date: Tue, 18 May 2010 09:32:47 -0500
Subject: [Bioperl-l] Please update /Changes as you commit things
In-Reply-To: <20100518030511.59C314202D@smtp1.rs.github.com>
References: <20100518030511.59C314202D@smtp1.rs.github.com>
Message-ID:

Hi Florent,

Can you add a line to the /Changes please? New features are especially great to add to that file. :)

If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent.

You also might want to set your git config so your email is valid in your commits. e.g.:

   $ git config user.name "Jay Hannah"
   $ git config user.email jay at jays.net
   (these end up in ~/.gitconfig)

Thanks!
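A minimal sketch of where those two settings land, assuming a reasonably recent git. Without --global the values are written to the current repository's .git/config; with --global they go to ~/.gitconfig. The throwaway HOME, the temporary repository and the example.org address are placeholders for illustration only, not anything from the BioPerl setup.

```shell
# Sketch only: use a throwaway HOME and repository so no real config is touched.
unset XDG_CONFIG_HOME
export HOME="$(mktemp -d)"
repo="$(mktemp -d)"
cd "$repo" && git init -q

# Without --global, the values are stored in this repository's .git/config.
git config user.name "Jay Hannah"
git config user.email "jay@example.org"   # placeholder address

# With --global, the value is stored in ~/.gitconfig and applies to all repositories.
git config --global user.name "Jay Hannah"

# Read the values back from each level.
git config --local --get user.email
git config --global --get user.name
```

Setting both levels is harmless: the repository-level value takes precedence over the global one inside that repository.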
Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah On May 17, 2010, at 10:05 PM, noreply at github.com wrote: > Branch: refs/heads/master > Home: http://github.com/bioperl/bioperl-live > > Commit: 87c530525da35a981e9f7b06134184f0adfae156 > http://github.com/bioperl/bioperl-live/commit/87c530525da35a981e9f7b06134184f0adfae156 > Author: Florent Angly > Date: 2010-05-17 (Mon, 17 May 2010) > > Changed paths: > M Bio/Assembly/IO.pm > M Bio/Assembly/IO/ace.pm > M t/Assembly/Assembly.t > > Log Message: > ----------- > Implemented the 454 Newbler ACE assembly variant > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l From florent.angly at gmail.com Tue May 18 11:11:40 2010 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 18 May 2010 08:11:40 -0700 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: <4BF2AE2C.209@gmail.com> Good idea Jay! I did as you suggested. Florent On 18/05/10 07:32, Jay Hannah wrote: > Can you add a line to the /Changes please? New features are especially great to add to that file.:) > > If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent. > > You also might want to set your git config so your email is valid in your commits. e.g.: > From bimber at wisc.edu Tue May 18 11:28:06 2010 From: bimber at wisc.edu (Ben Bimber) Date: Tue, 18 May 2010 10:28:06 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? Message-ID: this question is more of a general perl one than bioperl specific, so I hope it is appropriate for this list: I am writing code that has two steps. the first generates a large, complex hash describing mutations. 
it takes a fair amount of time to run this step. the second step uses this data to perform downstream calculations. for the purposes of writing/debugging this downstream code, it would save me a lot of time if i could run the first step once, then store this hash in something like the file system. this way I could quickly load it, when debugging the downstream code without waiting for the hash to be recreated. is there a 'best practice' way to do something like this? I could save a tab-delimited file, which is human readable, but does not represent the structure of the hash, so I would need code to re-parse it. I assume I could probably do something along the lines of dumping a JSON string, then read/decode it. this is easy, but not so human-readable. is there another option i'm not thinking of? what do others do in this sort of situation? thanks in advance. -Ben From cjfields at illinois.edu Tue May 18 11:31:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 10:31:14 -0500 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: On May 18, 2010, at 9:32 AM, Jay Hannah wrote: > Hi Florent, > > Can you add a line to the /Changes please? New features are especially great to add to that file. :) > > If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent. Agreed (or, +1, depending on your taste). Also, I would really like to break the habit of committing everything straight to trunk and promote using branches more. Branches are cheap. 
Something like:

    # on master
    git checkout -b 'topic/feature_foo'
    # switches over to branch 'topic/feature_foo'
    # hack hack hack
    # make commits
    # add tests
    # add to Changes
    # make more commits
    # push to remote branch
    # merge to master
    git checkout master
    git merge 'topic/feature_foo'
    # test test test, etc, push to origin

or similar. Of course, there would be more to it (handling merge conflicts, etc), just need to get a decent workflow document started up. Ah tuits, where are you?

> You also might want to set your git config so your email is valid in your commits. e.g.:
>
> $ git config user.name "Jay Hannah"
> $ git config user.email jay at jays.net
> (these end up in ~/.gitconfig)
>
> Thanks!
>
> Jay Hannah
> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

I think these are only set there if you use --global, correct? Otherwise it's repo-specific, would be in .git/ somewhere.

chris

From s.denaxas at gmail.com  Tue May 18 11:41:01 2010
From: s.denaxas at gmail.com (Spiros Denaxas)
Date: Tue, 18 May 2010 16:41:01 +0100
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To:
References:
Message-ID:

Hello,

it all really depends on your definition of readable. YAML is readable but requires a parser; XML is readable but is bloated and requires code and a parser.

You can directly dump the output from Data::Dumper and then eval() it back in a hash. I would think this is the cleanest way if you specifically want to dump a hash and re-generate it with no additional code. You can set the $Data::Dumper::Indent flag to control how readable the hash is.

hope this helps,
Spiros

On Tue, May 18, 2010 at 4:28 PM, Ben Bimber wrote:
> this question is more of a general perl one than bioperl specific, so
> I hope it is appropriate for this list:
>
> I am writing code that has two steps. the first generates a large,
> complex hash describing mutations. it takes a fair amount of time to
> run this step.
the second step uses this data to perform downstream
> calculations. for the purposes of writing/debugging this downstream
> code, it would save me a lot of time if i could run the first step
> once, then store this hash in something like the file system. this
> way I could quickly load it, when debugging the downstream code
> without waiting for the hash to be recreated.
>
> is there a 'best practice' way to do something like this? I could
> save a tab-delimited file, which is human readable, but does not
> represent the structure of the hash, so I would need code to re-parse
> it. I assume I could probably do something along the lines of dumping
> a JSON string, then read/decode it. this is easy, but not so
> human-readable. is there another option i'm not thinking of? what do
> others do in this sort of situation?
>
> thanks in advance.
>
> -Ben
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From adsj at novozymes.com  Tue May 18 11:57:12 2010
From: adsj at novozymes.com (Adam Sjøgren)
Date: Tue, 18 May 2010 17:57:12 +0200
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
References:
Message-ID: <87zkzxmcdj.fsf@topper.koldfront.dk>

On Tue, 18 May 2010 10:28:06 -0500, Ben wrote:

> is there a 'best practice' way to do something like this?

The only one I can think of is "Don't make up your own format unless you really, really have to".

> I could save a tab-delimited file, which is human readable, but does
> not represent the structure of the hash, so I would need code to
> re-parse it. I assume I could probably do something along the lines of
> dumping a JSON string, then read/decode it. this is easy, but not so
> human-readable. is there another option i'm not thinking of? what do
> others do in this sort of situation?
I would use YAML or JSON if I had to look at it "by hand" or if it had to be somehow portable. I would prefer those over CSV, which hasn't necessarily got well-defined handling of special chars, whitespace etc.

If speed is more important, I think the Storable module is quite a bit quicker, but the format is "binary".


  Best regards,

    Adam

-- 
Adam Sjøgren
adsj at novozymes.com

From sdavis2 at mail.nih.gov  Tue May 18 12:09:38 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 18 May 2010 12:09:38 -0400
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To:
References:
Message-ID:

On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote:

> this question is more of a general perl one than bioperl specific, so
> I hope it is appropriate for this list:
>
> I am writing code that has two steps. the first generates a large,
> complex hash describing mutations. it takes a fair amount of time to
> run this step. the second step uses this data to perform downstream
> calculations. for the purposes of writing/debugging this downstream
> code, it would save me a lot of time if i could run the first step
> once, then store this hash in something like the file system. this
> way I could quickly load it, when debugging the downstream code
> without waiting for the hash to be recreated.
>
> is there a 'best practice' way to do something like this? I could
> save a tab-delimited file, which is human readable, but does not
> represent the structure of the hash, so I would need code to re-parse
> it. I assume I could probably do something along the lines of dumping
> a JSON string, then read/decode it. this is easy, but not so
> human-readable. is there another option i'm not thinking of? what do
> others do in this sort of situation?
>
> thanks in advance.
>
>
There are a number of solutions on CPAN, probably.
This is one maybe off the beaten path, but it is getting a lot of press in the NoSQL database realm: http://1978th.net/tokyocabinet/ Sean From David.Messina at sbc.su.se Tue May 18 12:19:18 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 18:19:18 +0200 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: Hi Ben, Storable should do the trick. http://search.cpan.org/~ams/Storable-2.21/ It allows you to save arbitrary perl data structures to disk and load them back in without needing to dump into another format and then parse it later. Dave From cjfields at illinois.edu Tue May 18 12:22:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 11:22:09 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: On May 18, 2010, at 10:28 AM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step. the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? 
what do
> others do in this sort of situation?
>
> thanks in advance.
>
> -Ben

Would a simple DB_File tied hash work?

chris

From cjfields at illinois.edu  Tue May 18 12:25:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 18 May 2010 11:25:11 -0500
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To: <87zkzxmcdj.fsf@topper.koldfront.dk>
References: <87zkzxmcdj.fsf@topper.koldfront.dk>
Message-ID:

On May 18, 2010, at 10:57 AM, Adam Sjøgren wrote:

> On Tue, 18 May 2010 10:28:06 -0500, Ben wrote:
>
>> is there a 'best practice' way to do something like this?
>
> The only one I can think of is "Don't make up your own format unless you
> really, really have to".
>
>> I could save a tab-delimited file, which is human readable, but does
>> not represent the structure of the hash, so I would need code to
>> re-parse it. I assume I could probably do something along the lines of
>> dumping a JSON string, then read/decode it. this is easy, but not so
>> human-readable. is there another option i'm not thinking of? what do
>> others do in this sort of situation?
>
> I would use YAML or JSON if I had to look at it "by hand" or if it had
> to be somehow portable. I would prefer those over CSV, which hasn't
> necessarily got well-defined handling of special chars, whitespace etc.
>
> If speed is more important, I think the Storable module is quite a bit
> quicker, but the format is "binary".
>
>
> Best regards,
>
> Adam
>
> --
> Adam Sjøgren
> adsj at novozymes.com

Yes, that in combination with an AnyDBM tied hash would work (essentially what Bio::SeqFeature::Collection is under the hood).

chris

From sdavis2 at mail.nih.gov  Tue May 18 12:39:44 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 18 May 2010 12:39:44 -0400
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To: References: Message-ID: On Tue, May 18, 2010 at 12:09 PM, Sean Davis wrote: > > > On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote: > >> this question is more of a general perl one than bioperl specific, so >> I hope it is appropriate for this list: >> >> I am writing code that has two steps. the first generates a large, >> complex hash describing mutations. it takes a fair amount of time to >> run this step. the second step uses this data to perform downstream >> calculations. for the purposes of writing/debugging this downstream >> code, it would save me a lot of time if i could run the first step >> once, then store this hash in something like the file system. this >> way I could quickly load it, when debugging the downstream code >> without waiting for the hash to be recreated. >> >> is there a 'best practice' way to do something like this? I could >> save a tab-delimited file, which is human readable, but does not >> represent the structure of the hash, so I would need code to re-parse >> it. I assume I could probably do something along the lines of dumping >> a JSON string, then read/decode it. this is easy, but not so >> human-readable. is there another option i'm not thinking of? what do >> others do in this sort of situation? >> >> thanks in advance. >> >> > There are a number of solutions on CPAN, probably. This is one maybe off > the beaten path, but it is getting a lot of press in the NoSQL database > realm: > > http://1978th.net/tokyocabinet/ > > Just to be clear, I am assuming that the problem at hand is storing a key/value pair and then retrieving it later. If what you are talking about is a multi-level hash data structure, then Data::Dumper might be the easiest way to go. Sorry for the confusion.... Sean From bimber at wisc.edu Tue May 18 12:47:33 2010 From: bimber at wisc.edu (Ben Bimber) Date: Tue, 18 May 2010 11:47:33 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? 
In-Reply-To:
References:
Message-ID:

Thanks for all the suggestions. Storable seems like the simplest route. This will save me hours of staring at my computer.

-Ben

On Tue, May 18, 2010 at 11:39 AM, Sean Davis wrote:
>
>
> On Tue, May 18, 2010 at 12:09 PM, Sean Davis wrote:
>>
>>
>> On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote:
>>>
>>> this question is more of a general perl one than bioperl specific, so
>>> I hope it is appropriate for this list:
>>>
>>> I am writing code that has two steps. the first generates a large,
>>> complex hash describing mutations. it takes a fair amount of time to
>>> run this step. the second step uses this data to perform downstream
>>> calculations. for the purposes of writing/debugging this downstream
>>> code, it would save me a lot of time if i could run the first step
>>> once, then store this hash in something like the file system. this
>>> way I could quickly load it, when debugging the downstream code
>>> without waiting for the hash to be recreated.
>>>
>>> is there a 'best practice' way to do something like this? I could
>>> save a tab-delimited file, which is human readable, but does not
>>> represent the structure of the hash, so I would need code to re-parse
>>> it. I assume I could probably do something along the lines of dumping
>>> a JSON string, then read/decode it. this is easy, but not so
>>> human-readable. is there another option i'm not thinking of? what do
>>> others do in this sort of situation?
>>>
>>> thanks in advance.
>>>
>>
>> There are a number of solutions on CPAN, probably. This is one maybe off
>> the beaten path, but it is getting a lot of press in the NoSQL database
>> realm:
>>
>> http://1978th.net/tokyocabinet/
>>
>
> Just to be clear, I am assuming that the problem at hand is storing a
> key/value pair and then retrieving it later. If what you are talking about
> is a multi-level hash data structure, then Data::Dumper might be the easiest
> way to go.
>
> Sorry for the confusion....
>
> Sean
>
>
>

From bosborne11 at verizon.net  Tue May 18 12:00:06 2010
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 18 May 2010 12:00:06 -0400
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To:
References:
Message-ID: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net>

Ben,

I've used Storable to do things like this, for example:

  use Getopt::Long;     # provides GetOptions()
  use Bio::DB::Fasta;
  use Storable;         # exports store() and retrieve()

  # usage() and make_my_id() are not shown in this excerpt

  my %species = (
      "Sc" => 4932,     # Saccharomyces cerevisiae
      "Ec" => 83333,    # Escherichia coli K12
      "Hs" => 9606      # H. sapiens
  );

  my ($help, $id, $name);

  GetOptions(
      "s=s" => \$name,
      "i=i" => \$id,
      "h"   => \$help
  );

  usage() if ($help || !$id || !$name);

  my $storedHash = $name . ".dump";

  # create index for a directory of fasta files
  my $db = Bio::DB::Fasta->new($name, -makeid => \&make_my_id);

  # extract species-specific data from gene2accession
  unless (-e $storedHash) {
      my $ref;
      # extract species-specific information from gene2accession
      open MYIN, "gene2accession" or die "No gene2accession file\n";
      while (<MYIN>) {
          my @arr = split "\t", $_;
          if ($arr[0] == $species{$name} && $arr[9] =~ /\d+/ && $arr[10] =~ /\d+/) {
              ($ref->{$arr[1]}->{"start"},
               $ref->{$arr[1]}->{"end"},
               $ref->{$arr[1]}->{"strand"},
               $ref->{$arr[1]}->{"id"}) = ($arr[9], $arr[10], $arr[11], $arr[7]);
          }
      }
      # save species-specific information using Storable
      store $ref, $storedHash;
  }

  # retrieve the species-specific data from a stored hash
  my $ref = retrieve($storedHash);

Take away all the parsing details and you can see that it's simple, and that Storable exports store() and retrieve(). Make up a file name, "store" the hash reference.

Brian O.

On May 18, 2010, at 11:28 AM, Ben Bimber wrote:

> this question is more of a general perl one than bioperl specific, so
> I hope it is appropriate for this list:
>
> I am writing code that has two steps. the first generates a large,
> complex hash describing mutations. it takes a fair amount of time to
> run this step.
the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation? > > thanks in advance. > > -Ben > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Tue May 18 12:06:54 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 18 May 2010 12:06:54 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? Message-ID: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> bioperl-l, Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. We want these to point to github, yes? I'll fix it if the answer is 'yes'. Brian O. From cjfields at illinois.edu Tue May 18 14:04:55 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 13:04:55 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> Message-ID: <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> Yes. 
chris On May 18, 2010, at 11:06 AM, Brian Osborne wrote: > bioperl-l, > > Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. > > We want these to point to github, yes? I'll fix it if the answer is 'yes'. > > Brian O. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Tue May 18 15:39:48 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 12:39:48 -0700 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: <4BF2ED04.2050106@cornell.edu> Chris Fields wrote: > Agreed (or, +1, depending on your taste). Also, I would really like to break the habit of committing everything straight to trunk and promote using branches more. Branches are cheap. I did some work on our git workflow at http://www.bioperl.org/wiki/Using_Git#Developing_BioPerl, but it still needs some more work. So, there's the start of the workflow document I think. Rob From rmb32 at cornell.edu Tue May 18 15:42:44 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 12:42:44 -0700 Subject: [Bioperl-l] storing/retrieving a large hash on file system? 
In-Reply-To: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net> References: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net> Message-ID: <4BF2EDB4.4060907@cornell.edu> Based on your description, you want to use either: Storable - if you want to load the whole hash into memory or AnyDBM - if you want to be able to look things up from the hash without loading the whole thing in memory Rob From David.Messina at sbc.su.se Tue May 18 16:16:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 22:16:14 +0200 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: <4BF2ED04.2050106@cornell.edu> References: <20100518030511.59C314202D@smtp1.rs.github.com> <4BF2ED04.2050106@cornell.edu> Message-ID: <2D6396F7-E478-4544-B26A-F8A5799F2039@sbc.su.se> Nice, Rob! > I did some work on our git workflow at http://www.bioperl.org/wiki/Using_Git#Developing_BioPerl, but it still needs some more work. > > So, there's the start of the workflow document I think. From bpcwhite at gmail.com Tue May 18 17:34:06 2010 From: bpcwhite at gmail.com (Bryan White) Date: Tue, 18 May 2010 14:34:06 -0700 (PDT) Subject: [Bioperl-l] distance In-Reply-To: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie> References: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie> Message-ID: <1a2c786f-07e6-4499-8dc9-19a8d4169653@u3g2000prl.googlegroups.com> Thanks guys, I got it working! Bryan On May 18, 4:07?am, Jun Yin wrote: > Hi, Bryan, > > In your code: > ? ? ? ? my @nodes = $tree->find_node(-fieldname => > 'Homo_sapiens','Murinae'); > > First, You should specify the fieldname. The "fieldname" itself doesnot seem > like a valid key. The default field name is "id". > Second, the find_node method can only search for one specific term at one > time. > Third, distance method can only work on two nodes. 
> > So try this: > > my @nodes_human = $tree->find_node(-id => 'Homo_sapiens'); > my @nodes_murinae=$tree->find_node(-id=>'Murinae'); > > my $distance = $tree->distance(-nodes => > \($nodes_human[0],$nodes_murinae[0])); #Providing you only have one match > for "Homo_sapiens" and " Murinae". > > Cheers, > Jun Yin > Ph.D.?student in U.C.D. > > Bioinformatics Laboratory > Conway Institute > University College Dublin > > -----Original Message----- > From: bioperl-l-boun... at lists.open-bio.org > > [mailto:bioperl-l-boun... at lists.open-bio.org] On Behalf Of Bryan White > Sent: Tuesday, May 18, 2010 10:49 AM > To: bioper... at bioperl.org > Subject: [Bioperl-l] distance > > Hello, > > I am trying to create a simple program to show me the distance between > taxa on a given tree. However, I am having trouble getting the bioperl > code to work. Here is the code that I am using: > -------- > #! /usr/bin/perl > use strict; > use warnings; > use Bio::Tree::Draw::Cladogram; > use Bio::TreeIO; > #use Bio::TreeFunctionsI; > > my $node1 = 'homo_sapiens'; > my $node2 = 'murinae'; > my $input = new Bio::TreeIO('-format' => 'newick', > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '-file' => 'tree_mammalia_newick.txt'); > > my $tree = $input->next_tree; > > my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); > > my $distance = $tree->distance(-nodes => \@nodes); > > #print $distance; > > -------- > > And here is the error message I receive: > > ------------- EXCEPTION ------------- > MSG: Must provide 2 nodes > STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/ > Bio/Tree/TreeFunctionsI.pm:811 > STACK toplevel ./phylo.pl:19 > ------------------------------------- > > It seems that the nodes are not being read into the @nodes variable. > Any help in figuring this out would be appreciated. > > Thanks, > Bryan > _______________________________________________ > Bioperl-l mailing list > Bioper... 
at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> __________ Information from ESET Smart Security, version of virus signature
> database 5099 (20100509) __________
>
> The message was checked by ESET Smart Security.
>
> http://www.eset.com
>
> _______________________________________________
> Bioperl-l mailing list
> Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hartzell at alerce.com Wed May 19 00:17:24 2010
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 18 May 2010 21:17:24 -0700
Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To:
References:
Message-ID: <19443.26196.893455.52821@gargle.gargle.HOWL>

Ben Bimber writes:
> this question is more of a general perl one than bioperl specific, so
> I hope it is appropriate for this list:
>
> I am writing code that has two steps. the first generates a large,
> complex hash describing mutations. it takes a fair amount of time to
> run this step. the second step uses this data to perform downstream
> calculations. for the purposes of writing/debugging this downstream
> code, it would save me a lot of time if i could run the first step
> once, then store this hash in something like the file system. this
> way I could quickly load it, when debugging the downstream code
> without waiting for the hash to be recreated.
>
> is there a 'best practice' way to do something like this? I could
> save a tab-delimited file, which is human readable, but does not
> represent the structure of the hash, so I would need code to re-parse
> it. I assume I could probably do something along the lines of dumping
> a JSON string, then read/decode it. this is easy, but not so
> human-readable. is there another option i'm not thinking of? 
what do
> others do in this sort of situation?

Someone early on in the thread said not to invent another format, and I concur with that wholeheartedly.

Your choice of words, "large complex hash", makes me worry that you have something more than a large single-level hash with sensible keys. Hashes of references to hashes of references to lists of etc... give me hives.

If you'd like to add a nice general-purpose tool to your kit, think about putting it into a simple SQLite database.

Put it into an SQLite db and talk to it via DBI and you get some really cool tricks:

- you can store complex stuff,
- get back just the part you need: a column, several columns, or the result of a join among multiple tables,
- add indexes to make it Go Fast.

and in the cool tricks category

- you can use SQLite's backup interface to build the database in memory (nice and fast) then quickly stream it out to a disk-based file for persistence.
- same trick in reverse: if you know you're going to do a reasonably large number of complex queries you can stream a database into memory and then run your queries quickly.
- rtree indexes are cool.

Going forward you can scale things up to big databases (Pg, Oracle), you can provide safe multiuser access, transactions, etc.... (NFS notwithstanding), etc....

g.

From avilella at gmail.com Wed May 19 04:36:25 2010
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 19 May 2010 09:36:25 +0100
Subject: [Bioperl-l] from SimpleAlign to SAM/BAM
Message-ID:

Hi,

I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects.

Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates?

As far as I can see, there is a Bio::Assembly::IO::sam.pm which loads assemblies.
Should I be using some other tool existing not in bioperl? Cheers, Albert. From jun.yin at ucd.ie Wed May 19 06:40:51 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Wed, 19 May 2010 11:40:51 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: <008101caf73f$c04973c0$40dc5b40$%yin@ucd.ie> Hi, Albert, Check this page for the BioPerl wrapper on next-gen sequencing results http://bioperl.org/wiki/HOWTO:Short-read_assemblies_with_BWA And, I don't think Bio::SimpleAlign works on assembly files. It is targeted at global alignment, e.g. clustalw output file. Cheers, Jun Yin Ph.D.?student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Albert Vilella Sent: Wednesday, May 19, 2010 9:36 AM To: bioperl-l at bioperl.org Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. Should I be using some other tool existing not in bioperl? Cheers, Albert. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5099 (20100509) __________ The message was checked by ESET Smart Security. http://www.eset.com __________ Information from ESET Smart Security, version of virus signature database 5099 (20100509) __________ The message was checked by ESET Smart Security. 
http://www.eset.com From maj at fortinbras.us Wed May 19 09:34:01 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 09:34:01 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > Hi, > > I would like to know what would be the best way to generate a SAM/BAM file > with cDNA alignments against the human reference from a bunch of > Bio::SimpleAlign > cDNA multiple sequence alignment objects. > > Considering I've got a way to map the cDNAs to chromosome coordinates, > how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 > human > coordinates? > > As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads > assemblies. > Should I be using some other tool existing not in bioperl? > > Cheers, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed May 19 09:59:03 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 09:59:03 -0400 Subject: [Bioperl-l] out of memory issue In-Reply-To: References: Message-ID: Hi Shalabh and all, Sorry to comment on an old thread, but Dan Kortschak just pointed me to Tie::File. This may be the right solution to this issue. It turns out that DB_File will read in the entire file to memory anyway, while Tie::File (by MJD of course) works on pieces as it should. 
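A minimal sketch of what the Tie::File route could look like, assuming the IDs sit one per line in a plain text file. The helper name and the file layout are hypothetical and not taken from the original thread. Tie::File presents the file as an array of lines and fetches records on demand, so the full set of IDs is not held in memory at once:

```perl
use strict;
use warnings;
use Tie::File;
use Fcntl qw(O_RDONLY);

# Hypothetical helper: return the ID stored on a given (0-based) line of
# a one-ID-per-line text file without slurping the whole file.
sub id_at_line {
    my ($path, $line_no) = @_;

    # Read-only tie; 'memory' caps Tie::File's read cache (in bytes).
    tie my @ids, 'Tie::File', $path, mode => O_RDONLY, memory => 2_000_000
        or die "Cannot tie $path: $!";

    # Tie::File scans forward to this record rather than loading the
    # whole file; the trailing newline is removed (autochomp default).
    my $id = $ids[$line_no];

    untie @ids;
    return $id;
}
```

Note that this gives access by line number, not by key; if lookups by ID are what is needed, a DBM-style tied hash as suggested earlier in the thread may still be the better fit.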
See Tie::File in CPAN and also this informative post: http://perl.plover.com/TieFile/why-not-DB_File cheers all- (someday, maybe next month, I'll return in force) MAJ ----- Original Message ----- From: "shalabh sharma" To: "bioperl-l" Sent: Wednesday, April 28, 2010 10:13 AM Subject: [Bioperl-l] out of memory issue > Hi All, > I am trying to make a hash of 38 Million ids but every time i get the > following message : > > perl(191) malloc: *** mmap(size=16777216) failed (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > Out of memory! > > I am working on MacOX 10.5.8 with 4GB of memory. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From avilella at gmail.com Wed May 19 11:00:27 2010 From: avilella at gmail.com (Albert Vilella) Date: Wed, 19 May 2010 16:00:27 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Awesome, thanks. I'll give it a try :-) On Wed, May 19, 2010 at 2:34 PM, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use > of Bio::Assembly::IO::sam (I think). I know there is only read capability > for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing > writes (some assembly (so to speak) required...)-- cheers MAJ > ----- Original Message ----- From: "Albert Vilella" > To: > Sent: Wednesday, May 19, 2010 4:36 AM > Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > > >> Hi, >> >> I would like to know what would be the best way to generate a SAM/BAM file >> with cDNA alignments against the human reference from a bunch of >> Bio::SimpleAlign >> cDNA multiple sequence alignment objects. 
>> >> Considering I've got a way to map the cDNAs to chromosome coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >> human >> coordinates? >> >> As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads >> assemblies. >> Should I be using some other tool existing not in bioperl? >> >> Cheers, >> >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > From lincoln.stein at gmail.com Wed May 19 12:40:31 2010 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Wed, 19 May 2010 12:40:31 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the > use of Bio::Assembly::IO::sam (I think). I know there is only read > capability for B:A:I:sam, but Samtools may give you the appropriate wrapper > for doing writes (some assembly (so to speak) required...)-- cheers MAJ > ----- Original Message ----- From: "Albert Vilella" > > To: > Sent: Wednesday, May 19, 2010 4:36 AM > > Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > > > Hi, >> >> I would like to know what would be the best way to generate a SAM/BAM file >> with cDNA alignments against the human reference from a bunch of >> Bio::SimpleAlign >> cDNA multiple sequence alignment objects. >> >> Considering I've got a way to map the cDNAs to chromosome coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >> human >> coordinates? >> >> As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads >> assemblies. >> Should I be using some other tool existing not in bioperl? 
>> >> Cheers, >> >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From john.marshall at sanger.ac.uk Wed May 19 12:22:19 2010 From: john.marshall at sanger.ac.uk (John Marshall) Date: Wed, 19 May 2010 17:22:19 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: On 19 May 2010, at 14:34, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates > the use of Bio::Assembly::IO::sam (I think). I've only briefly skimmed the B:T:R:Samtools documentation, but it would appear that this mostly encapsulates running the various samtools subcommands. These provide various manipulations on SAM and BAM files, but don't give you anything in terms of converting from not- SAM/BAM to SAM/BAM. > ----- Original Message ----- From: "Albert Vilella" > >> Considering I've got a way to map the cDNAs to chromosome >> coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against >> ~23.000 human >> coordinates? Perhaps I misunderstand, but if you already have a bunch of snippets of sequence and their mapped coordinates, then the easy way to generate a SAM file containing them is just to print it out by hand. A SAM file is just a tab-separated text file. For each sequence in your Bio::SimpleAlign objects, print out a line containing appropriate values for each of the 11 main SAM fields. 
(If the snippets are effectively unpaired, then MRNM,MPOS,ISIZE can just be *,0,0, and the only FLAG values you'll be choosing between are 0, 4, 16, and 20.) You should also start the file with an @SQ header for each of the chromosomes you've mapped against. (I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf -- it's a little vague, but should be more than enough to explain how to e.g. print out a basic SAM file with only the main fields.) Once you've printed out a simple SAM file, you can use B:T:R:Samtools or samtools directly or other tools to convert it to the binary BAM format and/or otherwise work with it. Cheers, John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From maj at fortinbras.us Wed May 19 13:26:16 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:26:16 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: <42F365BE46A545CE9DF897BA0B18B8EF@NewLife> CORRECTION: B:T:R:Samtools wraps samtools directly, as John said. Sorry, it's been a while... MAJ ----- Original Message ----- From: Lincoln Stein To: Mark A. Jensen Cc: Albert Vilella ; bioperl-l at bioperl.org Sent: Wednesday, May 19, 2010 12:40 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). 
I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. Should I be using some other tool existing not in bioperl? Cheers, Albert. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From maj at fortinbras.us Wed May 19 13:30:25 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:30:25 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Yes that's right John; B:T:R:Samtools is used within the B:A:.I:sam to do the write out with samtools command line pgms. Interested parties might look at Bio::Asssembly::IO::sam to see how Lincoln's Bio::DB::Sam (which uses the libbam library directly via XS, also not BioPerl proper but we love it anyway) might be employed. 
----- Original Message ----- From: "John Marshall" To: Cc: "Albert Vilella" Sent: Wednesday, May 19, 2010 12:22 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM > On 19 May 2010, at 14:34, Mark A. Jensen wrote: >> Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use >> of Bio::Assembly::IO::sam (I think). > > I've only briefly skimmed the B:T:R:Samtools documentation, but it would > appear that this mostly encapsulates running the various samtools > subcommands. These provide various manipulations on SAM and BAM files, but > don't give you anything in terms of converting from not- SAM/BAM to SAM/BAM. > >> ----- Original Message ----- From: "Albert Vilella" > > >>> Considering I've got a way to map the cDNAs to chromosome coordinates, >>> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >>> human >>> coordinates? > > Perhaps I misunderstand, but if you already have a bunch of snippets of > sequence and their mapped coordinates, then the easy way to generate a SAM > file containing them is just to print it out by hand. > > A SAM file is just a tab-separated text file. For each sequence in your > Bio::SimpleAlign objects, print out a line containing appropriate values for > each of the 11 main SAM fields. (If the snippets are effectively unpaired, > then MRNM,MPOS,ISIZE can just be *,0,0, and the only FLAG values you'll be > choosing between are 0, 4, 16, and 20.) > > You should also start the file with an @SQ header for each of the chromosomes > you've mapped against. > > (I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf -- it's a > little vague, but should be more than enough to explain how to e.g. print out > a basic SAM file with only the main fields.) > > Once you've printed out a simple SAM file, you can use B:T:R:Samtools or > samtools directly or other tools to convert it to the binary BAM format > and/or otherwise work with it. 
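As a rough sketch of the print-it-by-hand approach John describes above, assuming unpaired sequences whose reference name and 1-based start position are already known. The helper names, the default ungapped CIGAR and the choice of 255 for MAPQ are illustrative assumptions, not anything prescribed in the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: one @SQ header line per reference sequence.
sub sq_header {
    my ($name, $length) = @_;
    return join("\t", '@SQ', "SN:$name", "LN:$length");
}

# Hypothetical helper: one alignment line with the 11 main SAM fields.
sub sam_line {
    my (%arg) = @_;
    my $flag  = $arg{reverse} ? 16 : 0;    # 16 = mapped to the reverse strand
    # Assumption: an ungapped match unless the caller supplies a CIGAR.
    my $cigar = $arg{cigar} // length($arg{seq}) . 'M';
    return join("\t",
        $arg{qname}, $flag, $arg{rname}, $arg{pos},   # QNAME FLAG RNAME POS
        255,                                          # MAPQ: 255 = not available
        $cigar,                                       # CIGAR
        '*', 0, 0,                                    # MRNM MPOS ISIZE (unpaired)
        $arg{seq},                                    # SEQ
        '*',                                          # QUAL not stored
    );
}
```

Printing sq_header() for each chromosome followed by one sam_line() per sequence gives a file that samtools can then convert to BAM. Two caveats: spliced cDNA-to-genome alignments would need a real CIGAR with N operations for the introns, and SAM stores SEQ as it reads on the forward reference strand, so reverse-strand entries need reverse-complementing before printing.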
> > Cheers, > > John > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a > charity registered in England with number 1021457 and a company registered in > England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE. _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed May 19 13:21:56 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:21:56 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: B:T:R:Samtools wraps Bio::Samtools ----- Original Message ----- From: Lincoln Stein To: Mark A. Jensen Cc: Albert Vilella ; bioperl-l at bioperl.org Sent: Wednesday, May 19, 2010 12:40 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. 
Should I be using some other tool existing not in bioperl? Cheers, Albert. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Thu May 20 11:37:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 10:37:16 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> Message-ID: <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Yes, if you have time. I have started along that path already, but I'm sure there are lingering spots where links point to the wrong place, or subversion/svn is mentioned. chris On May 20, 2010, at 10:34 AM, Brian Osborne wrote: > Chris, > > Done, easy. Should I remove all references to SVN from the Wiki? > > Brian O. > > On May 18, 2010, at 2:04 PM, Chris Fields wrote: > >> Yes. >> >> chris >> >> On May 18, 2010, at 11:06 AM, Brian Osborne wrote: >> >>> bioperl-l, >>> >>> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >>> >>> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >>> >>> Brian O. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu May 20 12:05:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 11:05:56 -0500 Subject: [Bioperl-l] Regarding git commits... Message-ID: All, Please make sure to update your local git repos prior to doing commits and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. chris From florent.angly at gmail.com Thu May 20 12:22:50 2010 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 20 May 2010 09:22:50 -0700 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: References: Message-ID: <4BF561DA.1070700@gmail.com> On 20/05/10 09:05, Chris Fields wrote: > All, > > Please make sure to update your local git repos prior to doing commits That's done with "git pull", as mentioned on the wiki (http://www.bioperl.org/wiki/Using_Git), right? > and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bosborne11 at verizon.net Thu May 20 11:34:39 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 20 May 2010 11:34:39 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> Message-ID: <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> Chris, Done, easy. Should I remove all references to SVN from the Wiki? Brian O. On May 18, 2010, at 2:04 PM, Chris Fields wrote: > Yes. > > chris > > On May 18, 2010, at 11:06 AM, Brian Osborne wrote: > >> bioperl-l, >> >> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >> >> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >> >> Brian O. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu May 20 12:58:22 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 20 May 2010 09:58:22 -0700 Subject: [Bioperl-l] Regarding git commits... 
In-Reply-To: <4BF561DA.1070700@gmail.com>
References: <4BF561DA.1070700@gmail.com>
Message-ID: <4BF56A2E.8060309@bioperl.org>

I think you want

$ git pull upstream master

http://help.github.com/forking/

Florent Angly wrote, On 5/20/10 9:22 AM:
> On 20/05/10 09:05, Chris Fields wrote:
>> All,
>>
>> Please make sure to update your local git repos prior to doing commits
> That's done with "git pull", as mentioned on the wiki
> (http://www.bioperl.org/wiki/Using_Git), right?
>
>> and pushing to master, and merge commits in properly if they don't
>> match. Please please please don't save over files if they don't
>> merge correctly. I just found out I had a prior commit that fixed
>> the test number and removed old files that was completely clobbered,
>> so I'm having to hand-merge those changes back in now. If it were
>> anything more involved I would revert that prior commit completely.
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Thu May 20 13:35:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 12:35:09 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To: <4BF56A2E.8060309@bioperl.org>
References: <4BF561DA.1070700@gmail.com> <4BF56A2E.8060309@bioperl.org>
Message-ID: <86401472-ECAB-4C21-8BD1-61AB37003F64@illinois.edu>

Yes. The general syntax is:

git pull <remote> <branch>

If you have a read-write checkout directly from bioperl/bioperl-live.git, 'origin' should be set to that, and if you are on a specific branch a simple 'git pull' will work (it implies 'git pull origin <branch>'). All collabs can do this.
In the case of a forked repo (which anyone can do), it's a little trickier as it's essentially a branch from the repository at a specific point; it isn't automatically synced. You can see that here: http://github.com/bioperl/bioperl-live/network In order to sync with the original repo, you need to specify exactly which remote to pull from, likely not 'origin' (which is your forked repo), but 'upstream' or whatever you set the original bioperl read-only repo to via: git remote add upstream git://github.com/bioperl/bioperl-live.git Then, to sync, do: git pull upstream master git push # goes to your forked repo chris PS - Note on the graph linked to I just synced my branch using the above. On May 20, 2010, at 11:58 AM, Jason Stajich wrote: > I think you want > $ git pull upstream master > > http://help.github.com/forking/ > > Florent Angly wrote, On 5/20/10 9:22 AM: >> On 20/05/10 09:05, Chris Fields wrote: >>> All, >>> >>> Please make sure to update your local git repos prior to doing commits >> That's done with "git pull", as mentioned on the wiki (http://www.bioperl.org/wiki/Using_Git), right? >> >>> and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. 
>>> >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Thu May 20 14:06:13 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 20 May 2010 13:06:13 -0500 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: References: Message-ID: On May 20, 2010, at 11:05 AM, Chris Fields wrote: > Please make sure to update your local git repos prior to doing commits and pushing to master I thought git refused to push if your local was out of date? (I thought this was one of the general selling points of git?) It seems to be doing that to me, below. Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah jhannah at jaysnet-MacBook:~/src/sandbox$ git push To git at github.com:jhannah/sandbox.git ! [rejected] master -> master (non-fast-forward) error: failed to push some refs to 'git at github.com:jhannah/sandbox.git' To prevent you from losing history, non-fast-forward updates were rejected Merge the remote changes before pushing again. See the 'Note about fast-forwards' section of 'git push --help' for details. From cjfields at illinois.edu Thu May 20 14:43:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 13:43:12 -0500 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: References: Message-ID: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. 
That's something that can happen in any VCS. chris On May 20, 2010, at 1:06 PM, Jay Hannah wrote: > On May 20, 2010, at 11:05 AM, Chris Fields wrote: >> Please make sure to update your local git repos prior to doing commits and pushing to master > > I thought git refused to push if your local was out of date? (I thought this was one of the general selling points of git?) It seems to be doing that to me, below. > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > jhannah at jaysnet-MacBook:~/src/sandbox$ git push > To git at github.com:jhannah/sandbox.git > ! [rejected] master -> master (non-fast-forward) > error: failed to push some refs to 'git at github.com:jhannah/sandbox.git' > To prevent you from losing history, non-fast-forward updates were rejected > Merge the remote changes before pushing again. See the 'Note about > fast-forwards' section of 'git push --help' for details. > From jay at jays.net Thu May 20 15:09:00 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 20 May 2010 14:09:00 -0500 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> References: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> Message-ID: <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net> On May 20, 2010, at 1:43 PM, Chris Fields wrote: > It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. That's something that can happen in any VCS. So... you're saying don't commit if you don't have any idea what you're committing? 
:)

git pull
git diff
git status
   if local is clean then
-edit-
git diff      if it looks good then git commit
git status    if it looks good then git push

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah
enjoys preaching to the choir ;)

From cjfields at illinois.edu Thu May 20 15:24:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 14:24:17 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To: <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>
References: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>
Message-ID: <95305268-0D84-478C-A380-68E81742F18F@illinois.edu>

Maybe the point is, if someone is having a problem with git either pulling from or pushing to the remote repo, it's very likely because of a merge conflict (git is trying to tell you something). There are lots of ways to resolve those (most easily by hand if the change is small). But saving over the top of someone else's commit in a re-cloned repo is definitely not one of them.

Possibly a section of 'Using git' that needs some work?
chris

From charles.tilford at bms.com Thu May 20 16:27:27 2010
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 20 May 2010 16:27:27 -0400
Subject: [Bioperl-l] Bio::Species irritated with "unclassified sequences"
Message-ID: <4BF59B2F.9000300@bms.com>

Bio::Species::classification() is irritated with me when I provide it with a @class_array that is composed of one node, particularly:

    $obj->classification("unclassified sequences")

AFAICT this is a valid, single node taxa "tree":

http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=12908

Subroutine classification is expecting at least two class members, the problem with the above call crops up as:

Use of uninitialized value $vals[1] in quotemeta at /stf/biocgi/tilfordc/patch_lib/Bio/Species.pm line 179
( $Id: Species.pm 16700 2010-01-15 19:50:11Z dave_messina $)

... and the relevant code is:

sub classification {
    my ($self, @vals) = @_;

    if (@vals) {
        if (ref($vals[0]) eq 'ARRAY') {
            @vals = @{$vals[0]};
        }

        # make sure the lineage contains us as first or second element
        # (lineage may have subspecies, species, genus ...)
        my $name = $self->node_name;
        my ($genus, $species) = (quotemeta($vals[1]), quotemeta($vals[0]));

That is, it's expecting at least (species, genus) in the array.

Am I misusing classification(), or Bio::Species in general? I know it's named "Species", but I've been using it as a generic tree object for arbitrary taxonomy nodes, not just species and subspecies. This block a little lower down:

        unless ($self->rank) {
            # and that we are rank species
            $self->rank('species');
        }

... implies that the module can be used for taxa ranks other than species. However, doing so would not prevent the module being aggravated over a null $vals[1].

The use case here is building Bio::Seq::RichSeq objects pulled from a (very large) sequence database, and then dumped / displayed with SeqIO. Most are well behaved, but there's a non-trivial number of 'artificial' constructs that don't root to an organism.
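
For illustration only, the warning could be avoided by not calling quotemeta() on a missing second element. The sketch below is plain Perl and does not touch Bio::Species; lineage_patterns is a hypothetical helper invented for this example, and whether the module should accept one-node lineages at all is still the open question above.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::Species): build the genus and
# species patterns from a lineage, leaving genus undefined when the
# lineage has only one node instead of calling quotemeta() on undef.
sub lineage_patterns {
    my @vals = @_;

    # Accept either a flat list or a single array reference,
    # mirroring the excerpt from classification() quoted above.
    @vals = @{ $vals[0] } if ref($vals[0]) eq 'ARRAY';

    my $species = quotemeta($vals[0]);
    my $genus   = defined $vals[1] ? quotemeta($vals[1]) : undef;
    return ($genus, $species);
}

my ($genus, $species) = lineage_patterns('unclassified sequences');
print defined $genus ? "genus pattern: $genus\n"
                     : "no genus level in this lineage\n";
print "species pattern: $species\n";
```

With a one-node lineage this reports that there is no genus level rather than emitting the "uninitialized value" warning; any code that later matches against the genus pattern would need the same defined() check.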
-CAT

From dimitark at bii.a-star.edu.sg Thu May 20 22:18:21 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Fri, 21 May 2010 10:18:21 +0800
Subject: [Bioperl-l] a problem with HspI module?
Message-ID: <4BF5ED6D.6030506@bii.a-star.edu.sg>

Hello guys,
I think I found a problem with 'Bio::Search::HSP::HSPI'. Consider the following HSP:
-------------
 Score = 48.9 bits (115),  Expect = 8e-04, Method: Compositional matrix adjust.
 Identities = 27/77 (35%), Positives = 40/77 (51%), Gaps = 14/77 (18%)
 Frame = +1

Query  371      PSGMLLA-----SCSDDMTLKIWSMKQEVCIHDLQAHNKEIYTIKWSPTGPATSNPNSNI  425
                P   LLA     S S D T+++W ++Q VC H L  H + +Y++ +SP G
Sbjct  6955270  PGLQLLAFSHPPSASFDSTVRLWDVEQGVCTHTLMKHQEPVYSVAFSPDGK---------  6955422

Query  426      MLASASFDSTVRLWDIE  442
                 LAS SFD  V +W+ +
Sbjct  6955423  YLASGSFDKYVHIWNTQ  6955473
---------------

The method 'frac_identical' is not functioning right.
-------------
 Title   : frac_identical
 Usage   : my $frac_id = $hsp->frac_identical( ['query'|'hit'|'total'] );
 Function: Returns the fraction of identical positions for this HSP
 Returns : Float in range 0.0 -> 1.0
 Args    : 'query' = num identical / length of query seq (without gaps)
           'hit'   = num identical / length of hit seq (without gaps)
           'total' = num identical / length of alignment (with gaps)
           default = 'total'
---------------
According to the method description, for the HSP above, 'frac_identical' should return '0.42' with 'hit'. But it doesn't. With 'hit' it currently gives '0.13'. With 'total' it gives the expected result, '0.35'.

That's all.
Cheers

Dimitar

--
Dimitar Kenanov
Postdoctoral research fellow
Protein Sequence Analysis Group
Bioinformatics Institute
A*STAR, Singapore
tel: +65 6478 8514

From cjfields at illinois.edu Thu May 20 22:24:46 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 21:24:46 -0500
Subject: [Bioperl-l] a problem with HspI module?
In-Reply-To: <4BF5ED6D.6030506@bii.a-star.edu.sg>
References: <4BF5ED6D.6030506@bii.a-star.edu.sg>
Message-ID:

It would be best to file this in a bug report, along with example data.

chris

From rmb32 at cornell.edu Fri May 21 13:44:26 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 21 May 2010 10:44:26 -0700
Subject: [Bioperl-l] codon tables, finding ORFs
Message-ID: <4BF6C67A.4040202@cornell.edu>

Hi all,

Right now, Bio::Tools::CodonTable uses as its 'standard' table the NCBI one, described at http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG1.

This table recognizes three different start codons: the usual ATG, plus TTG and CTG (which I'd never heard of before looking there, seems they are rare).

The issue is, if you use this codon scheme to find open reading frames in nucleotide sequences, you get some ORFs that I think a lot of biologists would be surprised at, from these two (rare?) start codons.

Seems to me, this might be a problem. I mean, a naive user (which just about everyone is!) would expect the default codon table to only recognize the canonical ATG as a start, right? And would be rather displeased if BioPerl said (by default) that something starting with one of these rare codons was an open reading frame?

So I guess my question is, do we think BioPerl (Bio::Tools::CodonTable) should really recognize these rare start codons by default?

Rob

From scott at scottcain.net Fri May 21 14:15:20 2010
From: scott at scottcain.net (Scott Cain)
Date: Fri, 21 May 2010 14:15:20 -0400
Subject: [Bioperl-l] [Gmod-schema] Trying to load my first database
In-Reply-To:
References:
Message-ID:

Hi Daniel,

I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists.
Of course, the file you sent me would be the same file you sent me yesterday; sorry for my poor memory :-)

This file uncovered a bug in BioPerl in the FeatureIO module. While fixing the bug may be difficult, working around it might not be too bad. Additionally, I'm not sure we should fix it right now, as there is an effort underway to rework this section of BioPerl anyway. The good news is that the workaround is fairly simple.

In the GFF that MAKER created, when parsing prodigal output, it generates GFF lines like this:

Contig125	pred_gff:prodigal_v2.00	match	104	1723	157.5	+	.	ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59;

The tricky part is this tag/value in the ninth column: type=ATG. The tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is in the third column, so when it is parsing this line of GFF, it tries to reassign the feature type to something that isn't valid. The workaround is pretty easy: since "type" is a problematic tag, and it appears that the type tag here is defining the start type, I would suggest doing a global search and replace on the file to replace "type=" with "start_type=". I did that and the file loaded fine. I don't know if it is MAKER that creates this tag or the BioPerl parser for prodigal, but changing this at the source might be nice (of course, it might also break somebody else's code :-/). I'll enter a bug for this in the BioPerl bug tracker.

Scott

On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote:
> Hi Scott,
>
> I used Maker to generate the attached file.
>
> -Daniel
>
> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote:
>> Hi Daniel,
>>
>> Please keep the schema mailing list cc'ed in so the responses can be
>> archived and more eyes than just mine can try to solve the problem.
>>
>> Can you send a sample of the GFF that is causing the problem? Any
>> ontology term that is in Chado should be "legal." If there's
>> something causing a problem, we need to figure out what it is.
>>
>> Scott
>>
>>
>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote:
>>> Hi Scott,
>>>
>>> I am using the same image as we used in class. I was able to load
>>> each of the examples in the GMOD course (Pythium) and on the Chado
>>> website (yeast).
>>>
>>> On another note, is there an easy way to navigate the ontology terms
>>> that are legal and standard in both GFF3 and in Chado. I am having
>>> trouble understanding how to convert from an arbitrary analysis (e.g.
>>> Blasting KEGG) into a format that works.
>>>
>>> Thanks so much!
>>> -Daniel
>>>
>>> On Fri, May 21, 2010 at 9:41 AM, Scott Cain wrote:
>>>> Hi Daniel,
>>>>
>>>> That error message looks like one that would come from an older
>>>> version of BioPerl. What version do you have?
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, May 21, 2010 at 11:51 AM, Daniel Quest wrote:
>>>>> Hi Scott,
>>>>>
>>>>> Thanks for the reply. Sorry, I should have been able to track down
>>>>> that error. Could you tell me what the following error means?
>>>>>
>>>>> gmod at ubuntu:~/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125$ gmod_bulk_load_gff3.pl --organism Cthe -a -g /home/gmod/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125/Contig125.gff --noexon --recreate_cache
>>>>> (Re)creating the uniquename cache in the database...
>>>>> Creating table...
>>>>> Populating table...
>>>>> Creating indexes...
>>>>> Adjusting the primary key sequences (if necessary)...Done.
>>>>> Preparing data for inserting into the chado database
>>>>> (This may take a while ...)
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: Object Bio::Annotation::SimpleValue=HASH(0xa858ac8) was not valid with key type. If you were adding new keys in, perhaps you want to make use
>>>>> of the archetype method to allow registration to a more basic type
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368
>>>>> STACK: Bio::Annotation::Collection::add_Annotation /usr/local/share/perl/5.10.0/Bio/Annotation/Collection.pm:361
>>>>> STACK: Bio::SeqFeature::Annotated::add_Annotation /usr/local/share/perl/5.10.0/Bio/SeqFeature/Annotated.pm:609
>>>>> STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:797
>>>>> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:752
>>>>> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:172
>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:775
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Abnormal termination, trying to clean up...
>>>>>
>>>>> Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)...
>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)...
>>>>> Exiting...
>>>>>
>>>>>
>>>>> Thanks so much!
>>>>> -Daniel
>>>>>
>>>>> On Thu, May 20, 2010 at 6:20 PM, Scott Cain wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> The error message you got said that the GFF file that you are trying
>>>>>> to load couldn't be found; are you sure the path was correct? The
>>>>>> file itself looks OK.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Thu, May 20, 2010 at 5:06 PM, Daniel Quest wrote:
>>>>>>> Hello All,
>>>>>>>
>>>>>>> I am trying to load my first genome from maker. Not sure what the
>>>>>>> problem is... any help is awesome! I am attaching at least part of
>>>>>>> the dataset.
>>>>>>>
>>>>>>> -Daniel
>>>>>>>
>>>>>>>
>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ gmod_bulk_load_gff3.pl --organism Cthe -a -g /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3 --noexon
>>>>>>>
>>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>>> MSG: Could not open /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3: No such file or directory
>>>>>>> STACK: Error::throw
>>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368
>>>>>>> STACK: Bio::Root::IO::_initialize_io /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:341
>>>>>>> STACK: Bio::FeatureIO::_initialize /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:353
>>>>>>> STACK: Bio::FeatureIO::gff::_initialize /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:102
>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:276
>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:296
>>>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:720
>>>>>>> -----------------------------------------------------------
>>>>>>>
>>>>>>> Abnormal termination, trying to clean up...
>>>>>>>
>>>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)...
>>>>>>> Exiting...
>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gmod-schema mailing list
>>>>>>> Gmod-schema at lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From bosborne11 at verizon.net Fri May 21 14:45:01 2010
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 21 May 2010 14:45:01 -0400
Subject: [Bioperl-l] codon tables, finding ORFs
In-Reply-To: <4BF6C67A.4040202@cornell.edu>
References: <4BF6C67A.4040202@cornell.edu>
Message-ID: <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net>

Rob,

The user will use translate(), which can do something like this:

    $prot_obj = $my_seq_object->translate(-orf   => 1,
                                          -start => "atg" );

CodonTable does little more than hold the codon/aa data. All the useful work is done by translate(), and there are lots of options.
Here is part of the documentation:

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0
 Notes   : The -start argument only applies when -orf is set to 1. By default
           all initiation codons found in the given codon table are used
           but when "start" is set to some codon this codon will be used
           exclusively as the initiation codon. Note that the default codon
           table (NCBI "Standard") has 3 initiation codons!

Brian O.

From briano at bioteam.net Fri May 21 14:52:19 2010
From: briano at bioteam.net (Brian Osborne)
Date: Fri, 21 May 2010 14:52:19 -0400
Subject: [Bioperl-l] What is CPAN doing?
Message-ID:

bioperl-l,

Here's the POD for the translate() method:

=head2 translate

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate

           Or if you expect a complete coding sequence (CDS) translation,
           with initiator at the beginning and terminator at the end:

           $protein_seq_obj = $cds_seq_obj->translate(-complete => 1);

           Or if you want translate() to find the first initiation
           codon and return the corresponding protein:

           $protein_seq_obj = $cds_seq_obj->translate(-orf => 1);

 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The complete CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translated protein
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0
 Notes   : The -start argument only applies when -orf is set to 1. By default
           all initiation codons found in the given codon table are used
           but when "start" is set to some codon this codon will be used
           exclusively as the initiation codon.
           Note that the default codon table (NCBI "Standard") has 3
           initiation codons!

           By default translate() translates termination codons to
           the same character (default is *), both internal and trailing
           codons. Setting "-complete" to 1 tells translate() to remove
           the trailing character.

           -offset is used for seqfeatures which contain the
           /codon_start tag and can be set to 1, 2, or 3. This is the
           offset by which the sequence translation starts relative to
           the first base of the feature.

For details on codon tables used by translate() see L<Bio::Tools::CodonTable>.

Deprecated argument set (v. 1.5.1 and prior versions) where each argument is an element in an array:

  1: character for terminator (optional), defaults to '*'.
  2: character for unknown amino acid (optional), defaults to 'X'.
  3: frame (optional), valid values are 0, 1, 2, defaults to 0.
  4: codon table id (optional), defaults to 1.
  5: complete coding sequence expected, defaults to 0 (false).
  6: boolean, throw exception if not complete coding sequence
     (true), defaults to warning (false)
  7: codontable, a custom Bio::Tools::CodonTable object (optional).

=cut

And here's what appears on CPAN:

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate
 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The full CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translation
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.
 Returns : A Bio::PrimarySeqI implementing object
 Args    : character for terminator (optional) defaults to '*'
           character for unknown amino acid (optional) defaults to 'X'
           frame (optional) valid values 0, 1, 2, defaults to 0
           codon table id (optional) defaults to 1
           complete coding sequence expected, defaults to 0 (false)
           boolean, throw exception if not complete CDS (true) or
           defaults to warning (false)

Most of the POD is missing - does anyone know why?

Brian O.

From barani at avesthagen.com Thu May 20 07:27:04 2010
From: barani at avesthagen.com (barani at avesthagen.com)
Date: Thu, 20 May 2010 16:57:04 +0530 (IST)
Subject: [Bioperl-l] Bio::Biblio find method proxy problem
Message-ID: <49660.192.168.1.5.1274354824.squirrel@mail.avesthagen.com>

Hi,

Our lab is behind a firewall. I am using FC10 Linux. I have set the httpproxy in /etc/bash_profile. I am searching for research articles using the Bio::Biblio "find" method, as shown in the Perl code below. The program runs fine from the command line, but when I use the same code in a Perl CGI script it does not work (it says "couldn't retrieve results from http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi").

Is there any way to set the proxy within the code, as an argument, so that it runs under CGI as well? It would be very useful if you guys could help me.
#####################################################
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Biblio;
use Bio::Biblio::IO;

my $search = "ABySS[title] AND (Simpson[Author]) AND 2009[dp]";
my $biblio = Bio::Biblio->new(-access => 'eutils');
$biblio->find($search)->has_next;

while (my $xml = $biblio->get_next) {
    my $io = Bio::Biblio::IO->new( -data   => $xml,
                                   -format => 'medlinexml' );
    my $article = $io->next_bibref();
    # >>>>>>>>>>>>>>> XML Parser >>>>>>>>>>>>
    # <<<<<<<<<<<<<<< XML Parser <<<<<<<<<<<<
}
###############################################################

Best Regards
barani

-----------------------------------
Baranidharan P
Project Head
Bioinformatics - Genomics Group
Avesthagen Ltd
Ground floor, Innovator Building
International Tech Park Bangalore
Whitefield
Bangalore - 560066
Ph. 09900727597
Mail
Off .barani at avesthagen.com
Per. baranidharanp at gmail.com
-------------------------------------

From bbimber at gmail.com Fri May 21 09:58:03 2010
From: bbimber at gmail.com (Ben Bimber)
Date: Fri, 21 May 2010 08:58:03 -0500
Subject: [Bioperl-l] CommandExts and arrays
Message-ID:

I am getting an error when trying to pass an array as a param with command exts. I hope there is something obvious I'm missing, but I can't seem to figure this out.

I am trying to merge two BAM files using Bio::Tools::Run::Samtools, with something like this:

my $new_bam = Bio::Tools::Run::Samtools->new(
    -command     => 'merge',
    -program_dir => '/usr/bin/samtools/',
)->run(
    -obm => 'output_file.bam',
    -ibm => ['file1.bam', 'file2.bam'],
);

When I use an array for the -ibm param, I get an error saying 'cannot use string 'file1' as an arrayref while strict refs in place'. The error comes from this code in CommandExts.pm, around line 989.
adding 'no strict' right before the final line stops the error: # expand arrayrefs my $l = $#files; for (0..$l) { if (ref($files[$_]) eq 'ARRAY') { splice(@files, $_, 1, @{$files[$_]}); #error thrown from this line splice(@switches, $_, 1, ($switches[$_]) x @{$files[$_]}); } Thanks for the help. From daniel.quest at gmail.com Fri May 21 15:34:35 2010 From: daniel.quest at gmail.com (Daniel Quest) Date: Fri, 21 May 2010 12:34:35 -0700 Subject: [Bioperl-l] [Gmod-schema] Trying to load my first database In-Reply-To: References: Message-ID: Hey Scott, Thanks so much for the work on this. I have CC'ed Doug Hyatt, the developer of Prodigal so that he is aware of this problem. I am thinking that Maker just passed the Prodigal tags through and then the conflict happened on the Chado load. From my POV it is probably easiest to make small changes to the Prodigal GFF3 output to sync up with the Chado schema. Thanks so much -Daniel On Fri, May 21, 2010 at 11:15 AM, Scott Cain wrote: > Hi Daniel, > > I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists. > > Of course, the file you sent me would be the same file you sent me > yesterday; sorry for my poor memory :-) > > This file uncovered a bug in BioPerl in the FeatureIO module. ?While > fixing the bug may be difficult, working around it might not be too > bad. ?Additionally, I'm not sure we should fix it right now, as this > is an effort underway to rework this section of BioPerl anyway. ?The > good news is that the work around is fairly simple. > > In the GFF that MAKER created, when parsing prodigal output, it > generates GFF lines like this: > > Contig125 ? ? ? pred_gff:prodigal_v2.00 match ? 104 ? ? 1723 ? ?157.5 > ?+ ? ? ? . 
> ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59; > > The tricky part is this tag/value in the ninth column: type=ATG. ?The > tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is > in the third column, so when it is parsing this line of GFF, it tries > to reassign the feature type to something that isn't valid. ?The work > around is pretty easy: since "type" is a problematic tag, and it > appears that the type tag here is defining the start type, I would > suggest doing a global search and replace on the file to replace > "type=" with "start_type=". ?I did that and the file loaded fine. ?I > don't know if it is MAKER that creates this tag or the BioPerl parser > for prodigal, but changing this at the source might be nice (of > course, it might also break somebody else's code :-/ ?I'll enter a bug > for this in the BioPerl bug tracker. > > Scott > > > On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote: >> Hi Scott, >> >> I used Maker to generate the attached file. >> >> -Daniel >> >> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote: >>> Hi Daniel, >>> >>> Please keep the schema mailing list cc'ed in so the responses can be >>> archived and more eyes than just mine can try to solve the problem. >>> >>> Can you send a sample of the GFF that is causing the problem? ?Any >>> ontology term that is in Chado should be "legal." ?If there's >>> something causing a problem, we need to figure out what it is. >>> >>> Scott >>> >>> >>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote: >>>> Hi Scott, >>>> >>>> I am using the same image as we used in class. ?I was able to load >>>> each of the examples in the GMOD course (Pythium) and on the Chado >>>> website (yeast). 
From rmb32 at cornell.edu Fri May 21 16:11:24 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 21 May 2010 13:11:24 -0700
Subject: [Bioperl-l] codon tables, finding ORFs
In-Reply-To: <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net>
References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net>
Message-ID: <4BF6E8EC.6050001@cornell.edu>

Brian Osborne wrote:
> The user will use translate(), which can do something like this:
>
> $prot_obj = $my_seq_object->translate(-orf => 1,
>                                       -start => "atg" );

Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default.
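
[Archive note, not part of the original message.] For readers following the thread, here is a minimal sketch of the override being discussed. It assumes BioPerl is installed; the 15-base sequence and the variable names are made up for illustration, while -orf and -start are the translate() arguments named in the method's own POD.

```perl
use strict;
use warnings;
use Bio::Seq;

# Hypothetical toy sequence: a CTG codon precedes the first ATG.
my $dna = Bio::Seq->new(
    -id       => 'toy_cds',
    -seq      => 'CTGAAAATGGCCTAA',
    -alphabet => 'dna',
);

# Default: any initiation codon listed in the codon table in use
# (table 1, the NCBI "Standard" table, unless another is requested)
# may open the ORF.
my $prot_default = $dna->translate(-orf => 1);

# Override: accept ATG only as the initiation codon.
my $prot_atg = $dna->translate(-orf => 1, -start => 'atg');

printf "default starts: %s\n", $prot_default->seq;
printf "ATG only      : %s\n", $prot_atg->seq;
```

Comparing the two printed translations on your own sequences is a quick way to see whether the default start-codon set matters for your data.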
Rob From carson.holt at genetics.utah.edu Fri May 21 15:53:35 2010 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 21 May 2010 13:53:35 -0600 Subject: [Bioperl-l] [maker-devel] [Gmod-schema] Trying to load my first database In-Reply-To: Message-ID: That is correct. MAKER will just pass user defined GFF3 tags through rather than trying to make sense of them or trimming them off. Carson On 5/21/10 1:34 PM, "Daniel Quest" wrote: Hey Scott, Thanks so much for the work on this. I have CC'ed Doug Hyatt, the developer of Prodigal so that he is aware of this problem. I am thinking that Maker just passed the Prodigal tags through and then the conflict happened on the Chado load. From my POV it is probably easiest to make small changes to the Prodigal GFF3 output to sync up with the Chado schema. Thanks so much -Daniel On Fri, May 21, 2010 at 11:15 AM, Scott Cain wrote: > Hi Daniel, > > I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists. > > Of course, the file you sent me would be the same file you sent me > yesterday; sorry for my poor memory :-) > > This file uncovered a bug in BioPerl in the FeatureIO module. While > fixing the bug may be difficult, working around it might not be too > bad. Additionally, I'm not sure we should fix it right now, as this > is an effort underway to rework this section of BioPerl anyway. The > good news is that the work around is fairly simple. > > In the GFF that MAKER created, when parsing prodigal output, it > generates GFF lines like this: > > Contig125 pred_gff:prodigal_v2.00 match 104 1723 157.5 > + . > ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59; > > The tricky part is this tag/value in the ninth column: type=ATG. 
The > tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is > in the third column, so when it is parsing this line of GFF, it tries > to reassign the feature type to something that isn't valid. The work > around is pretty easy: since "type" is a problematic tag, and it > appears that the type tag here is defining the start type, I would > suggest doing a global search and replace on the file to replace > "type=" with "start_type=". I did that and the file loaded fine. I > don't know if it is MAKER that creates this tag or the BioPerl parser > for prodigal, but changing this at the source might be nice (of > course, it might also break somebody else's code :-/ I'll enter a bug > for this in the BioPerl bug tracker. > > Scott > > > On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote: >> Hi Scott, >> >> I used Maker to generate the attached file. >> >> -Daniel >> >> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote: >>> Hi Daniel, >>> >>> Please keep the schema mailing list cc'ed in so the responses can be >>> archived and more eyes than just mine can try to solve the problem. >>> >>> Can you send a sample of the GFF that is causing the problem? Any >>> ontology term that is in Chado should be "legal." If there's >>> something causing a problem, we need to figure out what it is. >>> >>> Scott >>> >>> >>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote: >>>> Hi Scott, >>>> >>>> I am using the same image as we used in class. I was able to load >>>> each of the examples in the GMOD course (Pythium) and on the Chado >>>> website (yeast). >>>> >>>> On another note, is there an easy way to navigate the ontology terms >>>> that are legal and standard in both GFF3 and in Chado. I am having >>>> trouble understanding how to convert from an arbitrary analysis (e.g. >>>> Blasting KEGG) into a format that works. >>>> >>>> Thanks so much! 
>>>> -Daniel >>>> >>>> On Fri, May 21, 2010 at 9:41 AM, Scott Cain wrote: >>>>> Hi Daniel, >>>>> >>>>> That error message looks like one that would come from an older >>>>> version of BioPerl. What version do you have? >>>>> >>>>> Scott >>>>> >>>>> >>>>> On Fri, May 21, 2010 at 11:51 AM, Daniel Quest wrote: >>>>>> Hi Scott, >>>>>> >>>>>> Thanks for the reply. Sorry, I should have been able to track down >>>>>> that error. Could you tell me what the following error means? >>>>>> >>>>>> gmod at ubuntu:~/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125$ >>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>> /home/gmod/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125/Contig125.gff >>>>>> --noexon --recreate_cache >>>>>> (Re)creating the uniquename cache in the database... >>>>>> Creating table... >>>>>> Populating table... >>>>>> Creating indexes... >>>>>> Adjusting the primary key sequences (if necessary)...Done. >>>>>> Preparing data for inserting into the chado database >>>>>> (This may take a while ...) >>>>>> >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: Object Bio::Annotation::SimpleValue=HASH(0xa858ac8) was not valid >>>>>> with key type. 
If you were adding new keys in, perhaps you want to >>>>>> make use >>>>>> of the archetype method to allow registration to a more basic type >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>> STACK: Bio::Annotation::Collection::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/Annotation/Collection.pm:361 >>>>>> STACK: Bio::SeqFeature::Annotated::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/SeqFeature/Annotated.pm:609 >>>>>> STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:797 >>>>>> STACK: Bio::FeatureIO::gff::_handle_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:752 >>>>>> STACK: Bio::FeatureIO::gff::next_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:172 >>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:775 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> Abnormal termination, trying to clean up... >>>>>> >>>>>> Attempting to clean up the loader temp table (so that --recreate_cache >>>>>> won't be needed)... >>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>> Exiting... >>>>>> >>>>>> >>>>>> Thanks so much! >>>>>> -Daniel >>>>>> >>>>>> On Thu, May 20, 2010 at 6:20 PM, Scott Cain wrote: >>>>>>> Hi Daniel, >>>>>>> >>>>>>> The error message you got said that the GFF file that you are trying >>>>>>> to load couldn't be found; are you sure the path was correct? The >>>>>>> file itself looks OK. >>>>>>> >>>>>>> Scott >>>>>>> >>>>>>> >>>>>>> On Thu, May 20, 2010 at 5:06 PM, Daniel Quest wrote: >>>>>>>> Hello All, >>>>>>>> >>>>>>>> I am trying to load my first genome from maker. Not sure what the >>>>>>>> problem is... any help is awesome! I am attaching at least part of >>>>>>>> the dataset. 
>>>>>>>> >>>>>>>> -Daniel >>>>>>>> >>>>>>>> >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3 >>>>>>>> --noexon >>>>>>>> >>>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>>> MSG: Could not open >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3: No >>>>>>>> such file or directory >>>>>>>> STACK: Error::throw >>>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>>>> STACK: Bio::Root::IO::_initialize_io >>>>>>>> /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:341 >>>>>>>> STACK: Bio::FeatureIO::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:353 >>>>>>>> STACK: Bio::FeatureIO::gff::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:102 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:276 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:296 >>>>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:720 >>>>>>>> ----------------------------------------------------------- >>>>>>>> >>>>>>>> Abnormal termination, trying to clean up... >>>>>>>> >>>>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>>>> Exiting... >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gmod-schema mailing list >>>>>>>> Gmod-schema at lists.sourceforge.net >>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ------------------------------------------------------------------------ >>>>>>> Scott Cain, Ph. D. 
scott at scottcain dot net >>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>>> Ontario Institute for Cancer Research >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ------------------------------------------------------------------------ >>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>> Ontario Institute for Cancer Research >>>>> >>>> >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From cjfields at illinois.edu Fri May 21 16:44:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 15:44:18 -0500 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <4BF6E8EC.6050001@cornell.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> Message-ID: <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> On May 21, 2010, at 3:11 PM, Robert Buels wrote: > Brian Osborne wrote: >> The user will use translate(), which can do something like this: >> $prot_obj = $my_seq_object->translate(-orf => 1, >> -start => "atg" ); > > Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default. 
> > Rob Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. chris From rmb32 at cornell.edu Fri May 21 16:48:20 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 13:48:20 -0700 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> Message-ID: <4BF6F194.3080209@cornell.edu> Chris Fields wrote: > Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. > > chris Oh they're available, CodonTable has a number of tables in it that you make translate() use optionally, and there are bacterial tables in there (but they are not well documented). The default behavior is the 'NCBI standard' (eukaryotic) table that I linked to in the original post on this thread. What I am looking for is a discussion of what the best default behavior of $seq->translate( -orf => 1 ) with no arguments should be. But also, there should be better documentation about the codon tables that are available, I can add that in my topic/longest_orf branch. 
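
[Archive note, not part of the original message.] The effect of the default can be sketched without BioPerl. The toy scanner below is an illustration under stated assumptions, not BioPerl's implementation: `first_orf` is a hypothetical helper, the sequence is made up, and the only fact taken from the thread is that the NCBI Standard table has three initiation codons (TTG, CTG, ATG).

```perl
use strict;
use warnings;

# Toy illustration only (NOT BioPerl's algorithm): walk the sequence from
# the 5' end, try each accepted start codon in turn, and return the first
# stretch that runs from a start codon to an in-frame stop codon.
sub first_orf {
    my ($dna, $starts) = @_;
    my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);
    $dna = uc $dna;
    for my $i (0 .. length($dna) - 3) {
        next unless $starts->{ substr($dna, $i, 3) };
        for (my $j = $i; $j + 3 <= length($dna); $j += 3) {
            my $codon = substr($dna, $j, 3);
            return substr($dna, $i, $j + 3 - $i) if $is_stop{$codon};
        }
    }
    return;    # no start codon followed by an in-frame stop
}

my %standard_starts = map { $_ => 1 } qw(TTG CTG ATG);   # NCBI Standard table
my %atg_only        = (ATG => 1);

my $dna = 'CCTTGAAAATGGCCTAAGG';    # made-up sequence

my $orf_standard = first_orf($dna, \%standard_starts);
my $orf_atg      = first_orf($dna, \%atg_only);

print "standard starts: $orf_standard\n";
print "ATG only       : $orf_atg\n";
```

With the three-codon set the reported ORF opens at the TTG; with ATG only it opens six bases further downstream, which is the kind of difference a user who has not read the docs may find surprising.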
Rob From cjfields at illinois.edu Fri May 21 16:52:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 15:52:15 -0500 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <4BF6F194.3080209@cornell.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> <4BF6F194.3080209@cornell.edu> Message-ID: <06B1B1F1-979F-461C-BC9B-57A79C26CCE7@illinois.edu> On May 21, 2010, at 3:48 PM, Robert Buels wrote: > Chris Fields wrote: > > Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. > > > > chris > > Oh they're available, CodonTable has a number of tables in it that you make translate() use optionally, and there are bacterial tables in there (but they are not well documented). The default behavior is the 'NCBI standard' (eukaryotic) table that I linked to in the original post on this thread. > > What I am looking for is a discussion of what the best default behavior of $seq->translate( -orf => 1 ) with no arguments should be. Probably the simplest, with documentation on how to change it when needed. > But also, there should be better documentation about the codon tables that are available, I can add that in my topic/longest_orf branch. > > Rob Agreed. More docs never hurt. chris From bosborne11 at verizon.net Fri May 21 16:32:30 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 21 May 2010 16:32:30 -0400 Subject: [Bioperl-l] codon tables, finding ORFs Message-ID: Rob, translate() is one of these methods where reading the documentation is required. 
Or to put it another way, if you tried to use it without reading the docs most of the time you'd get a result that differs from what you wanted, given the variety of ways to use it, quite apart from the issue of the 3 initiation codons. So really, you have to read the docs, and they say: By default all initiation codons found in the given codon table are used but when "start" is set to some codon this codon will be used exclusively as the initiation codon. Note that the default codon table (NCBI "Standard") has 3 initiation codons! My concern right now is that CPAN has removed this text and more! If you wanted to add an additional codon table and make it a default I have no problem with that. But, the "naive user" who doesn't read the documentation is probably still going to get "surprising" results. I don't think there's any way around RTFM for this method, changing the default table does not change this. Brian O. On May 21, 2010, at 4:11 PM, Robert Buels wrote: > Brian Osborne wrote: >> The user will use translate(), which can do something like this: >> $prot_obj = $my_seq_object->translate(-orf => 1, >> -start => "atg" ); > > Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default. > > Rob From rmb32 at cornell.edu Fri May 21 17:53:34 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 14:53:34 -0700 Subject: [Bioperl-l] POD rendering question/problem (was [Fwd: What is CPAN doing?]) Message-ID: <4BF700DE.8040804@cornell.edu> Hi search.cpan.org maintainers, For one of the methods in BioPerl, a good portion of the POD that's in the source [1] isn't being rendered into HTML on its search.cpan.org page [2]. We'd like to get this POD displaying properly, either by us (BioPerl) tweaking the POD on our end, or by you guys tweaking whatever process is making the HTML. So: do we need to tweak our POD to get it displaying properly? 
If so, what needs to change in that POD?

Rob

[1] The source and POD in question:
http://search.cpan.org/src/CJFIELDS/BioPerl-1.6.1/Bio/PrimarySeqI.pm

[2] The HTML in question:
http://search.cpan.org/~cjfields/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm#translate

-------- Original Message --------
Subject: [Bioperl-l] What is CPAN doing?
Date: Fri, 21 May 2010 14:52:19 -0400
From: Brian Osborne
To: BioPerl List

bioperl-l,

Here's the POD for the translate() method:

=head2 translate

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate

           Or if you expect a complete coding sequence (CDS) translation,
           with initiator at the beginning and terminator at the end:

           $protein_seq_obj = $cds_seq_obj->translate(-complete => 1);

           Or if you want translate() to find the first initiation
           codon and return the corresponding protein:

           $protein_seq_obj = $cds_seq_obj->translate(-orf => 1);

 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The complete CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translated protein
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0

 Notes   : The -start argument only applies when -orf is set to 1. By
           default all initiation codons found in the given codon table
           are used but when "start" is set to some codon this codon will
           be used exclusively as the initiation codon.

           Note that the default codon table (NCBI "Standard") has 3
           initiation codons!

           By default translate() translates termination codons to
           the same character (default is *), both internal and trailing
           codons. Setting "-complete" to 1 tells translate() to remove
           the trailing character.

           -offset is used for seqfeatures which contain the \codon_start
           tag and can be set to 1, 2, or 3. This is the offset by which
           the sequence translation starts relative to the first base of
           the feature

           For details on codon tables used by translate() see
           L<Bio::Tools::CodonTable>.

           Deprecated argument set (v. 1.5.1 and prior versions)
           where each argument is an element in an array:

           1: character for terminator (optional), defaults to '*'.
           2: character for unknown amino acid (optional), defaults to 'X'.
           3: frame (optional), valid values are 0, 1, 2, defaults to 0.
           4: codon table id (optional), defaults to 1.
           5: complete coding sequence expected, defaults to 0 (false).
           6: boolean, throw exception if not complete coding sequence
              (true), defaults to warning (false)
           7: codontable, a custom Bio::Tools::CodonTable object (optional).

=cut

And here's what appears on CPAN:

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate
 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The full CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translation
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : character for terminator (optional) defaults to '*'
           character for unknown amino acid (optional) defaults to 'X'
           frame (optional) valid values 0, 1, 2, defaults to 0
           codon table id (optional) defaults to 1
           complete coding sequence expected, defaults to 0 (false)
           boolean, throw exception if not complete CDS (true) or
           defaults to warning (false)

Most of the POD is missing - does anyone know why?

Brian O.

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From kai.blin at biotech.uni-tuebingen.de Fri May 21 17:56:37 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 21 May 2010 23:56:37 +0200
Subject: [Bioperl-l] hmmer3/hmmscan parser
Message-ID: <1274478997.1997.4.camel@gonzo.home.kblin.org>

Hi list, hi Thomas,

I've just bumped into the fact that bioperl-live still doesn't seem to support the hmmer3 hmmscan output format (thanks for the help at #bioperl). The nice folks on IRC pointed me at an email from Thomas Sharpton, noting that he was already working on a parser for this. So I thought I'd ask about the status of that before I run off writing my own. Is there anything I can help with?

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin                         kai.blin at biotech.uni-tuebingen.de
Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
Abteilung Mikrobiologie/Biotechnologie
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28          Phone : ++49 7071 29-78841
D-72076 Tübingen                 Fax   : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From rmb32 at cornell.edu Fri May 21 18:32:20 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 21 May 2010 15:32:20 -0700
Subject: [Bioperl-l] [perl #75252] AutoReply: POD rendering question/problem (was [Fwd: What is CPAN doing?])
In-Reply-To:
References: <4BF700DE.8040804@cornell.edu>
Message-ID: <4BF709F4.4030705@cornell.edu>

Doing a little more investigation, the culprit seems to actually be a stray old (non-installed) version of the module in our uploaded dist. No action required on your part, unless there is a tweak to the indexing that would have not made this module be the top hit.

Status: resolved

Rob

From cjfields at illinois.edu Fri May 21 19:22:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 21 May 2010 18:22:41 -0500
Subject: [Bioperl-l] hmmer3/hmmscan parser
In-Reply-To: <1274478997.1997.4.camel@gonzo.home.kblin.org>
References: <1274478997.1997.4.camel@gonzo.home.kblin.org>
Message-ID: <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu>

To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over.

Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready.

Relevant commit msg here:

http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html

perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl
===========================================
dev.open-bio.org - Authorized Access Only
===========================================
...
bioperl-hmmer3/
...
perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3
===========================================
dev.open-bio.org - Authorized Access Only
===========================================
perllib cjfields$

chris

On May 21, 2010, at 4:56 PM, Kai Blin wrote:

> Hi list, hi Thomas,
>
> I've just bumped into the fact that bioperl-live still doesn't seem to
> support the hmmer3 hmmscan output format (thanks for the help at
> #bioperl). The nice folks on IRC pointed me at an email from Thomas
> Sharpton, noting that he was already working on a parser for this. So I
> thought I'd ask about the status of that before I run off writing my
> own. Is there anything I can help with?
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin           kai.blin at biotech.uni-tuebingen.de
> Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
> Abteilung Mikrobiologie/Biotechnologie
> Eberhard-Karls-Universität Tübingen
> Auf der Morgenstelle 28          Phone : ++49 7071 29-78841
> D-72076 Tübingen                 Fax   : ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From David.Messina at sbc.su.se Mon May 24 06:19:55 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 24 May 2010 12:19:55 +0200
Subject: [Bioperl-l] CommandExts and arrays
In-Reply-To:
References:
Message-ID:

Hi Ben,

This looks like it might be a bug. When I ask for the filespec for the 'merge' command:

  my @filespec = $new_bam->filespec;
  print join "\n", @filespec, "\n";

I get:

  obm
  *ibm

(note the leading '*').

Could you please submit this as a bug?
http://www.bioperl.org/wiki/Bugs

Thanks,
Dave

From David.Messina at sbc.su.se Mon May 24 09:00:56 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 24 May 2010 15:00:56 +0200
Subject: [Bioperl-l] CommandExts and arrays
In-Reply-To:
References: <8565_1274696770_ZZg0Z3D5iEeCi.00_C34B77C6-2A3E-4B97-83C2-9BE8679CA331@sbc.su.se>
Message-ID:

> ok, i put in that bug.

Thanks.

> why exactly does having the asterisk indicate
> this is a bug? i thought the asterisk indicated that multiple values
> were allowed for that argument?

Ah okay, my ignorance of this module is showing. :)

> on a related note, are we supposed to be able to pass file names that
> have spaces to command exts? on the few cases where this came up, i
> have never seemed to get this to work right, so i just got rid of the
> spaces.

Sorry, I don't know. Paging Mark Jensen: have you got a moment to look into this?

Dave

From diment at gmail.com Sat May 22 04:25:55 2010
From: diment at gmail.com (Kieren Diment)
Date: Sat, 22 May 2010 18:25:55 +1000
Subject: [Bioperl-l] OT: The Perl Survey
Message-ID: <63B7289C-E218-4BBB-A5A4-33AFECA4C867@gmail.com>

Hi,

Sorry about the off topic posting, but I'm trying to get as large a sample of programmers that use Perl as possible. The Perl Foundation have funded The Perl Survey, 2010 which is ready for people to complete at http://survey.perlfoundation.org. If you could spend a little time to complete the survey, we would be most grateful. It should take around 10-15 minutes to complete.
The official announcement is at:
http://news.perlfoundation.org/2010/05/grant-update-the-perl-survey-1.html

Thanks in advance

Kieren Diment

From parametres-personnels at hotmail.fr Sun May 23 11:57:14 2010
From: parametres-personnels at hotmail.fr (NamNAme)
Date: Sun, 23 May 2010 08:57:14 -0700 (PDT)
Subject: [Bioperl-l] Pfam database
Message-ID: <28650160.post@talk.nabble.com>

Dear all,

A few weeks ago I wrote a program that needs the Pfam database, and I tested it on the first version of Pfam, where each protein family's sequences are in one file. Now I would like to test it on the latest version of Pfam, but the organization has changed. I've found a file called Pfam-A.fasta which contains sequences and the family they belong to, but the sequences inside are not complete. So I have two questions: Why are these sequences not complete? And how can I find a file with complete sequences and the family they belong to?

Thank you for your help.

Bye.

P.S.: There is the file pfamseq. I tried to write a script to read it and then retrieve the database structure I want, but this file is enormous and uses too much memory, so the script crashed.

--
View this message in context: http://old.nabble.com/Pfam-database-tp28650160p28650160.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From staffa at niehs.nih.gov Mon May 24 10:32:26 2010
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Mon, 24 May 2010 10:32:26 -0400
Subject: [Bioperl-l] Restriction Enzymes
Message-ID:

So, back in 2007 I wrote a script using

use Bio::Tools::RestrictionEnzyme;

and generated some useful restriction maps for a client.

This year he comes back to me with some very new enzymes that RestrictionEnzyme did not recognize. I erroneously thought that I needed an update of BioPerl, which I requested of SysAdmin. They did this across the board, there is no going back.
(I did learn about the NEB file that needed to be installed) Now it appears that I must re-write my scripts because RestrictionEnzyme is not known to the latest version of bioperl. Is this true? How hard would it be to keep things backward compatible. Have I missed something here? Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Enterprise-Wide Information Technology Support Contract National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From David.Messina at sbc.su.se Mon May 24 11:55:45 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 24 May 2010 17:55:45 +0200 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <4046E576-2109-45BB-969C-F0B6F5749957@sbc.su.se> Hi Nick, Now it's Bio::Restriction::Enzyme (and friends). Besides the perldoc for that module, see also: http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > How hard would it be to keep things backward compatible. > Have I missed something here? I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones are intended to be at least partially backwards compatible. Dave From cjfields at illinois.edu Mon May 24 11:58:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 10:58:11 -0500 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote: > So, back in 2007 I wrote a script using > > use Bio::Tools::RestrictionEnzyme; > > and generated some useful restriction maps for a client. > > This year he comes back to me with some very new enzymes > that RestrictionEnzyme did not recognize. I erroneously thought that I > needed an update of BioPerl, which I requested of SysAdmin. 
> They did this across the board, there is no going back. > (I did learn about the NEB file that needed to be installed) > > Now it appears that I must re-write my scripts because RestrictionEnzyme is > not known to the latest version of bioperl. Is this true? > How hard would it be to keep things backward compatible. > Have I missed something here? Bio::Tools::RestrictionEnzyme was deprecated quite a while ago, around v. 1.5, with removal at 1.6 (an announcement was made to the list regarding this, with no respondents, prior to the 1.6.0 release). The live version of the DEPRECATED docs is here: http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED If I understand correctly, the main reason was that most development was put into Bio::Restriction modules, with very little change occurring in Bio::Tools::RestrictionEnzyme. We made similar changes with some of the older BLAST parsers (BPLite). You could just download Bio::Tools::RestrictionEnzyme and call it via a 'use lib' directive (or local::lib) or package it with your script; it should still work. However, from my perspective, if the older module wasn't recognizing specific enzyme cut sites, and the supported one did, wouldn't it be easier to modify your script to use the newer supported one instead? If the supported Bio::Restriction modules don't recognize the new sites I would consider that a bug. > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Enterprise-Wide Information Technology Support Contract > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina chris From maj at fortinbras.us Mon May 24 12:21:03 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Mon, 24 May 2010 12:21:03 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <13392E899AB04A0E8F66336CDBE417BE@NewLife> The rewrite this summer of Bio::Restriction made several funky enzyme (non-pal, non-symmetric) types workable. I would think it wouldn't be too onerous to convert code to the new system and have it work rather quickly- MAJ ----- Original Message ----- From: "Chris Fields" To: "Staffa, Nick (NIH/NIEHS) [C]" Cc: "Bioperl-l" Sent: Monday, May 24, 2010 11:58 AM Subject: Re: [Bioperl-l] Restriction Enzymes > On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote: > >> So, back in 2007 I wrote a script using >> >> use Bio::Tools::RestrictionEnzyme; >> >> and generated some useful restriction maps for a client. >> >> This year he comes back to me with some very new enzymes >> that RestrictionEnzyme did not recognize. I erroneously thought that I >> needed an update of BioPerl, which I requested of SysAdmin. >> They did this across the board, there is no going back. >> (I did learn about the NEB file that needed to be installed) >> >> Now it appears that I must re-write my scripts because RestrictionEnzyme is >> not known to the latest version of bioperl. Is this true? >> How hard would it be to keep things backward compatible. >> Have I missed something here? > > Bio::Tools::RestrictionEnyzme was deprecated quite a while ago,around v. 1.5, > with removal at 1.6 (an announcement was made to the list regarding this, with > no respondents, prior to the 1.6.0 release). The live version of the > DEPRECATED docs are here: > > http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED > > If I understand correctly, the main reason was most development was put into > Bio::Restriction modules, with very little change occurring in > Bio::Tools::RestrictionEnzyme. We did similar changes with some of the older > BLAST parsers (BPLite). 
You could just download Bio::Tools::RestrictionEnyzme > and call it via a 'use lib' directive (or local::lib) or package it with your > script, it should still work. > > However, from my perspective, if the older module wasn't recognizing specific > enzyme cut sites, and the supported one did, wouldn't it be easier to modify > your script to use the newer supported one instead? If the supported > Bio::Restriction modules don't recognize the new sites I would consider that a > bug. > >> Nick Staffa >> Telephone: 919-316-4569 (NIEHS: 6-4569) >> Scientific Computing Support Group >> NIEHS Enterprise-Wide Information Technology Support Contract >> National Institute of Environmental Health Sciences >> National Institutes of Health >> Research Triangle Park, North Carolina > > > > chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Mon May 24 12:54:29 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 24 May 2010 09:54:29 -0700 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] Message-ID: <4BFAAF45.4090400@cornell.edu> -------- Original Message -------- Subject: Re: [perl #75252] POD rendering question/problem (was [Fwd: [Bioperl-l] What is CPAN doing?]) Date: Mon, 24 May 2010 08:33:35 -0700 From: Graham Barr via RT Reply-To: search-rt at cpan.org To: rmb32 at cornell.edu References: <4BF700DE.8040804 at cornell.edu> <3F316B7B-DBCC-4668-94E4-45471ED5ACBB at pobox.com> On May 21, 2010, at 4:54 PM, Robert Buels via RT wrote: > > [1] The source and POD in question: > http://search.cpan.org/src/CJFIELDS/BioPerl-1.6.1/Bio/PrimarySeqI.pm > > [2] The HTML in question: > http://search.cpan.org/~cjfields/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm#translate that HTML is not for the above POD, it is located at 
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/PrimarySeqI.pm the issue seems to be that when displaying the POD from the examples directory the source link is linking to the real module the html shown in [2] is representative of http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm IMO it is confusing to include 2 different copies of the same module. I would suggest adding to META.yml no_index: dir: - examples/root/lib Graham. From staffa at niehs.nih.gov Mon May 24 14:32:54 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Mon, 24 May 2010 14:32:54 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: <13392E899AB04A0E8F66336CDBE417BE@NewLife> Message-ID: Thanks, all. On 5/24/10 12:21 PM, "Mark A. Jensen" wrote: The rewrite this summer of Bio::Restriction made several funky enzyme (non-pal, non-symmetric) types workable. I would think it wouldn't be too onerous to convert code to the new system and have it work rather quickly- MAJ ----- Original Message ----- From: "Chris Fields" To: "Staffa, Nick (NIH/NIEHS) [C]" Cc: "Bioperl-l" Sent: Monday, May 24, 2010 11:58 AM Subject: Re: [Bioperl-l] Restriction Enzymes > On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote: > >> So, back in 2007 I wrote a script using >> >> use Bio::Tools::RestrictionEnzyme; >> >> and generated some useful restriction maps for a client. >> >> This year he comes back to me with some very new enzymes >> that RestrictionEnzyme did not recognize. I erroneously thought that I >> needed an update of BioPerl, which I requested of SysAdmin. >> They did this across the board, there is no going back. >> (I did learn about the NEB file that needed to be installed) >> >> Now it appears that I must re-write my scripts because RestrictionEnzyme is >> not known to the latest version of bioperl. Is this true? >> How hard would it be to keep things backward compatible. >> Have I missed something here? 
> > Bio::Tools::RestrictionEnyzme was deprecated quite a while ago,around v. 1.5, > with removal at 1.6 (an announcement was made to the list regarding this, with > no respondents, prior to the 1.6.0 release). The live version of the > DEPRECATED docs are here: > > http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED > > If I understand correctly, the main reason was most development was put into > Bio::Restriction modules, with very little change occurring in > Bio::Tools::RestrictionEnzyme. We did similar changes with some of the older > BLAST parsers (BPLite). You could just download Bio::Tools::RestrictionEnyzme > and call it via a 'use lib' directive (or local::lib) or package it with your > script, it should still work. > > However, from my perspective, if the older module wasn't recognizing specific > enzyme cut sites, and the supported one did, wouldn't it be easier to modify > your script to use the newer supported one instead? If the supported > Bio::Restriction modules don't recognize the new sites I would consider that a > bug. > >> Nick Staffa >> Telephone: 919-316-4569 (NIEHS: 6-4569) >> Scientific Computing Support Group >> NIEHS Enterprise-Wide Information Technology Support Contract >> National Institute of Environmental Health Sciences >> National Institutes of Health >> Research Triangle Park, North Carolina > > > > chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bbimber at gmail.com Mon May 24 15:43:07 2010 From: bbimber at gmail.com (Ben Bimber) Date: Mon, 24 May 2010 14:43:07 -0500 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: <1274729912.4373.19.camel@epistle> References: <1274729912.4373.19.camel@epistle> Message-ID: as long as the limitation is known, i dont see it as a big problem. 
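For the spaces-in-filenames limitation discussed in this thread, here is a minimal Perl sketch of one general workaround: pass the command as a list, so that no shell ever splits the arguments on whitespace. This is only an illustration under stated assumptions. run_list is a hypothetical helper, not part of BioPerl or CommandExts, and nothing in the thread shows how CommandExts itself builds its IPC call, so this may or may not map onto it directly.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of BioPerl): run a command given as a list
# of program + arguments and return everything it printed. With the list
# form of a piped open, each element is handed to the program as exactly
# one argument, so a filename containing spaces stays in one piece.
sub run_list {
    my @cmd = @_;
    open(my $pipe, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    my @lines = <$pipe>;
    close($pipe) or die "Command failed: @cmd (exit status $?)";
    return join '', @lines;
}

# Create a scratch file whose name contains spaces.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/my input file.txt";
open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} "ACGT\n";
close($out);

# $^X is the currently running perl; it stands in for an external program.
my $output = run_list($^X, '-ne', 'print', $file);
print $output;
```

The design choice is to avoid quoting altogether rather than to get the quoting right: since no shell parses the command, there is nothing to escape. The list form of a piped open is not available on every platform with older perls, so treat this as a sketch to adapt.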
On Mon, May 24, 2010 at 2:38 PM, Dan Kortschak wrote: > Hi Dave, > > You are right, spaces are not allowed - they are actively stripped from > filenames (the other option would be to escape or otherwise quote them - > this is certainly doable, is there enough of a call to do this?). > > You can use last_execution() to see what was attempted to be run, this > should show the filenames (and everything else) that were used in the > IPC call. > > cheers > Dan > > On Mon, 2010-05-24 at 12:00 -0400, Dave Messina wrote: >> Message: 2 >> Date: Mon, 24 May 2010 15:00:56 +0200 >> From: Dave Messina >> Subject: Re: [Bioperl-l] CommandExts and arrays >> To: Ben Bimber >> Message-ID: >> Content-Type: text/plain; charset=windows-1252 >> >> > ok, i put in that bug. >> >> Thanks. >> >> >> > why exactly does having the asterisk indicate >> > this is a bug? i thought the asterisk indicated that multiple >> values >> > were allowed for that argument? >> >> Ah okay, my ignorance of this module is showing. :) >> >> >> > on a related note, are we supposed to be able to pass file names >> that >> > have spaces to command exts? on the few cases where this came up, i >> > have never seemed to get this to work right, so i just got rid of >> the >> > spaces. >> >> Sorry, I don't know. >> >> >> Paging Mark Jensen - have you got a moment to look into this? >> >> >> Dave > > From David.Messina at sbc.su.se Mon May 24 18:03:19 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 00:03:19 +0200 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: <4BFAAF45.4090400@cornell.edu> References: <4BFAAF45.4090400@cornell.edu> Message-ID: From: Graham Barr via RT > IMO it is confusing to include 2 different copies of the same module. I agree. It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict).
In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. Dave From dan.kortschak at adelaide.edu.au Mon May 24 15:38:32 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 25 May 2010 05:08:32 +0930 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: Message-ID: <1274729912.4373.19.camel@epistle> Hi Dave, You are right, spaces are not allowed - they are actively stripped from filenames (the other option would be to escape or otherwise quote them - this is certainly doable, is there enough of a call to do this?). You can use last_execution() to see what was attempted to be run, this should show the filenames (and everything else) that were used in the IPC call. cheers Dan On Mon, 2010-05-24 at 12:00 -0400, Dave Messina wrote: > Message: 2 > Date: Mon, 24 May 2010 15:00:56 +0200 > From: Dave Messina > Subject: Re: [Bioperl-l] CommandExts and arrays > To: Ben Bimber > Message-ID: > Content-Type: text/plain; charset=windows-1252 > > > ok, i put in that bug. > > Thanks. > > > > why exactly does having the asterisk indicate > > this is a bug? i thought the asterisk indicated that multiple > values > > were allowed for that argument? > > Ah okay, my ignorance of this module is showing. :) > > > > on a related note, are we supposed to be able to pass file names > that > > have spaces to command exts? on the few cases where this came up, i > > have never seemed to get this to work right, so i just got rid of > the > > spaces. > > Sorry, I don't know. > > > Paging Mark Jensen -
have you got a moment to look into this? > > > Dave From Russell.Smithies at agresearch.co.nz Mon May 24 18:01:25 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 25 May 2010 10:01:25 +1200 Subject: [Bioperl-l] taxonomy nightmare Message-ID: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> We've upgraded BioPerl recently and now lots of stuff appears broken though I'm sure it's not as bad as it looks. Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm deluged with errors. AFAIK, there were no changes to Perl 5.8.8 Any help greatly appreciated!!! Thanx, Russell Smithies ----------------------------------- #! /usr/local/bin/perl use strict; use warnings; use Bio::DB::Taxonomy; use Data::Dumper; my $idx_dir = '/data/home/smithiesr/taxonomy'; my $TAXDIR = "/data/home/smithiesr/taxdump"; my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp"); my $db = new Bio::DB::Taxonomy(-source => 'flatfile', -nodesfile => $nodefile, -namesfile => $namesfile, -directory => $idx_dir, -force => 1) or die $!; my $human = $db->get_taxon(-name => 'Homo sapiens'); print Dumper $human; ----------------------------------- ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Failed to load module Bio::DB::Taxonomy::flatfile. Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89 BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89. Compilation failed in require at (eval 21) line 3. ...propagated at /usr/lib/perl5/5.8.8/base.pm line 85. BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155. Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. 
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439. STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441 STACK: Bio::DB::Taxonomy::_load_tax_module /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264 STACK: Bio::DB::Taxonomy::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115 STACK: taxonomyTest.pl:15 ----------------------------------------------------------- ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Mon May 24 22:17:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:17:57 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: On May 24, 2010, at 7:46 PM, Thomas Sharpton wrote: > Hi all, > > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and hmmsearch output. It appears to be fully functional and I have had a handful of users test and integrate this module. > > We decided to push this module into a standalone svn repo (bioperl-hmmer3). 
I am a bit confused about why the repo is empty, as I committed the code back in March and have made a few updates since then to correct bugs identified by test users. Perhaps I screwed something up during the last commit. The commit doesn't show any added files. The original code apparently is on a branch of bioperl-dev, though (think this was pointed out on IRC): http://github.com/bioperl/bioperl-dev/tree/bioperl-hmmer3 Maybe that was the mixup? > Chris, should I just add the code to the github repo? I might need a pointer on how to do this without screwing it up. I started up a new github repo for it. You would just need to let me know your github ID so I can add you to it. Then (after you are added) the instructions are here: http://github.com/bioperl/bioperl-hmmer3 > Kai, I can mail an archive of the parser your way if you're in a hurry. With some assistance from Chris et. al., I expect the code to be in the github repo by the day's end. > > Apologies for any confusion and the delayed reply - I've been on the road. > > Best, > Tom No problem. Thanks for letting us know. chris > >> On May 21, 2010 4:24 PM, "Chris Fields" wrote: >> >> To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over. Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready. >> >> Relevant commit msg here: >> >> http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html >> >> perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl >> =========================================== >> dev.open-bio.org - Authorized Access Only >> =========================================== >> ... >> bioperl-hmmer3/ >> ... 
>> perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3 >> =========================================== >> dev.open-bio.org - Authorized Access Only >> =========================================== >> perllib cjfields$ >> >> chris >> >> On May 21, 2010, at 4:56 PM, Kai Blin wrote: >> >> > Hi list, hi Thomas, >> > >> > I've just bumped into the ... >> > From cjfields at illinois.edu Mon May 24 22:20:38 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:20:38 -0500 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: On May 24, 2010, at 5:03 PM, Dave Messina wrote: > From: Graham Barr via RT >> IMO it is confusing to include 2 different copies of the same module. > > I agree. > > It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). > > In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). > > I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. > > So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. > > Dave I agree. We should either prevent indexing or remove it, unless someone can suggest its utility.
chris From thomas.sharpton at gmail.com Mon May 24 20:46:04 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Mon, 24 May 2010 17:46:04 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: Hi all, To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and hmmsearch output. It appears to be fully functional and I have had a handful of users test and integrate this module. We decided to push this module into a standalone svn repo (bioperl-hmmer3). I am a bit confused about why the repo is empty, as I committed the code back in March and have made a few updates since then to correct bugs identified by test users. Perhaps I screwed something up during the last commit. Chris, should I just add the code to the github repo? I might need a pointer on how to do this without screwing it up. Kai, I can mail an archive of the parser your way if you're in a hurry. With some assistance from Chris et. al., I expect the code to be in the github repo by the day's end. Apologies for any confusion and the delayed reply - I've been on the road. Best, Tom On May 21, 2010 4:24 PM, "Chris Fields" wrote: To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over. Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready. Relevant commit msg here: http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html perllib cjfields$ svn ls svn+ssh:// dev.open-bio.org/home/svn-repositories/bioperl =========================================== dev.open-bio.org - Authorized Access Only =========================================== ... bioperl-hmmer3/ ... 
perllib cjfields$ svn ls svn+ssh:// dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3 =========================================== dev.open-bio.org - Authorized Access Only =========================================== perllib cjfields$ chris On May 21, 2010, at 4:56 PM, Kai Blin wrote: > Hi list, hi Thomas, > > I've just bumped into the ... From Russell.Smithies at agresearch.co.nz Mon May 24 22:25:41 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 25 May 2010 14:25:41 +1200 Subject: [Bioperl-l] taxonomy nightmare In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32D88D065AA@exchsth.agresearch.co.nz> Fixed I think, some file permissions got screwed somewhere ;-( --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > Sent: Tuesday, 25 May 2010 10:01 a.m. > To: 'bioperl-l' > Subject: [Bioperl-l] taxonomy nightmare > > We've upgraded BioPerl recently and now lots of stuff appears broken > though I'm sure it's not as bad as it looks. > Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm > deluged with errors. > AFAIK, there were no changes to Perl 5.8.8 > > Any help greatly appreciated!!! > > Thanx, > > Russell Smithies > > ----------------------------------- > #! 
/usr/local/bin/perl > > use strict; > use warnings; > use Bio::DB::Taxonomy; > use Data::Dumper; > > my $idx_dir = '/data/home/smithiesr/taxonomy'; > my $TAXDIR = "/data/home/smithiesr/taxdump"; > > my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp"); > > my $db = new Bio::DB::Taxonomy(-source => 'flatfile', > -nodesfile => $nodefile, > -namesfile => $namesfile, > -directory => $idx_dir, > -force => 1) or die $!; > > my $human = $db->get_taxon(-name => 'Homo sapiens'); > print Dumper $human; > > ----------------------------------- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Failed to load module Bio::DB::Taxonomy::flatfile. Weak references > are not implemented in the version of perl at > /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89 > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89. > Compilation failed in require at (eval 21) line 3. > ...propagated at /usr/lib/perl5/5.8.8/base.pm line 85. > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439. 
> > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::Root::Root::_load_module > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441 > STACK: Bio::DB::Taxonomy::_load_tax_module > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264 > STACK: Bio::DB::Taxonomy::new > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115 > STACK: taxonomyTest.pl:15 > ----------------------------------------------------------- > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dimitark at bii.a-star.edu.sg Mon May 24 22:28:19 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Tue, 25 May 2010 10:28:19 +0800 Subject: [Bioperl-l] about gene names Message-ID: <4BFB35C3.4010808@bii.a-star.edu.sg> Hi guys, i have a question How can I get only the gene names from NCBI Gene when i have the sequence id? For example with this id - NP_005264.2 i can search NCBI Gene online but i want to get only the gene name automatically. I was checking the Bio::DB::EntrezGene module but it didnt became clear to me if i can use it for my purposes. Thank you in advance. 
Greetings Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From David.Messina at sbc.su.se Mon May 24 18:23:32 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 00:23:32 +0200 Subject: [Bioperl-l] Pfam database In-Reply-To: <28650160.post@talk.nabble.com> References: <28650160.post@talk.nabble.com> Message-ID: Hi, The release notes for the latest Pfam (24.0) do mention file format changes, but I could not find documentation describing those changes. Your questions relating to that would best be answered by the people at Pfam. You can contact them here: pfam-help at sanger.ac.uk However, please do report back to us what you learn. It's quite likely our code is not compatible with Pfam 24.0, and we would need that information to fix it. Thanks, Dave On May 23, 2010, at 5:57 PM, NamNAme wrote: > > Dear all, > A few weeks ago I wrote a program that need the pfam database, and I tested > it on the first version of pfam where each protein family sequences are in > one file. > But now I would like to test it on the last version of pfam but the > organization changed. > I've found a file called Pfam-A.fasta which contains sequences and the > family they belong to. But the sequences inside are not complete. > So, I've two questions : Why these sequences are not complete ? > And, How can I find a file with complete sequences and the family they > belong to ? > Thank you for your help. > Bye. > P-S : There is the file pfamseq, I tried to make a script to read it and > then retreive the database structure i want but, this file is enourmous and > use too much memory so it crashed. > -- > View this message in context: http://old.nabble.com/Pfam-database-tp28650160p28650160.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 24 22:54:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:54:03 -0500 Subject: [Bioperl-l] taxonomy nightmare In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> Message-ID: You may have a version of perl that either doesn't include Scalar::Util or includes a broken version. Try installing Scalar::Util from CPAN to see if it fixes the problem. Here's a link on the problem: http://www.perlmonks.org/?node_id=424737 chris On May 24, 2010, at 5:01 PM, Smithies, Russell wrote: > We've upgraded BioPerl recently and now lots of stuff appears broken though I'm sure it's not as bad as it looks. > Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm deluged with errors. > AFAIK, there were no changes to Perl 5.8.8 > > Any help greatly appreciated!!! > > Thanx, > > Russell Smithies > > ----------------------------------- > #! /usr/local/bin/perl > > use strict; > use warnings; > use Bio::DB::Taxonomy; > use Data::Dumper; > > my $idx_dir = '/data/home/smithiesr/taxonomy'; > my $TAXDIR = "/data/home/smithiesr/taxdump"; > > my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp"); > > my $db = new Bio::DB::Taxonomy(-source => 'flatfile', > -nodesfile => $nodefile, > -namesfile => $namesfile, > -directory => $idx_dir, > -force => 1) or die $!; > > my $human = $db->get_taxon(-name => 'Homo sapiens'); > print Dumper $human; > > ----------------------------------- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Failed to load module Bio::DB::Taxonomy::flatfile. 
Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89 > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89. > Compilation failed in require at (eval 21) line 3. > ...propagated at /usr/lib/perl5/5.8.8/base.pm line 85. > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155. > Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439. > > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441 > STACK: Bio::DB::Taxonomy::_load_tax_module /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264 > STACK: Bio::DB::Taxonomy::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115 > STACK: taxonomyTest.pl:15 > ----------------------------------------------------------- > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From kai.blin at biotech.uni-tuebingen.de Tue May 25 01:58:27 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 07:58:27 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: <1274767107.2271.11.camel@gonzo.home.kblin.org> On Mon, 2010-05-24 at 17:46 -0700, Thomas Sharpton wrote: > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and > hmmsearch output. It appears to be fully functional and I have had a handful > of users test and integrate this module. That's pretty much what I need. Thanks to the folks on IRC, I got pointed at the correct repository yesterday evening. > Kai, I can mail an archive of the parser your way if you're in a hurry. With > some assistance from Chris et. al., I expect the code to be in the github > repo by the day's end. No worries, that's fine. I've got a checkout of the standalone repository that I can play with now. Is there any particular reason you decided to create a new parser instead of integrating the code into the existing hmmer.pm module? I haven't looked at how the hmmer2 hmmsearch output looks compared to the hmmer3 version and if there's any conflicts. Cheers, Kai PS: Tom, sorry for the repost, forgot to CC the list. Pre-coffee email sending, it never works. -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From dan.kortschak at adelaide.edu.au Tue May 25 02:12:27 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 25 May 2010 15:42:27 +0930 Subject: [Bioperl-l] Bioperl-l Digest, Vol 85, Issue 34 In-Reply-To: References: Message-ID: <1274767947.32025.49.camel@zoidberg.mbs.adelaide.edu.au> Dimitar, Try having a look through the EUtilities cookbook: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook cheers Dan On Tue, 2010-05-25 at 01:58 -0400, Dimitar Kenanov wrote: > Date: Tue, 25 May 2010 10:28:19 +0800 > From: Dimitar Kenanov > Subject: [Bioperl-l] about gene names > To: "'bioperl-l at bioperl.org'" > Message-ID: <4BFB35C3.4010808 at bii.a-star.edu.sg> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Hi guys, > I have a question: how can I get only the gene names from NCBI Gene > when > I have the sequence ID? For example, with this ID - NP_005264.2 - I can > search NCBI Gene online, but I want to get only the gene name > automatically. I was checking the Bio::DB::EntrezGene module but it > didn't become clear to me whether I can use it for my purposes. > > Thank you in advance.
> > Greetings > Dimitar > From kai.blin at biotech.uni-tuebingen.de Tue May 25 07:41:59 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 13:41:59 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> On Mon, 2010-05-24 at 17:46 -0700, Thomas Sharpton wrote: Hi Tom, > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and > hmmsearch output. It appears to be fully functional and I have had a handful > of users test and integrate this module. I've tried using the hmmer3 parser for my script, but it seems like the hmm_name member of the result object isn't set, and I'm using that. I saw this before when trying to write a test case that integrates into the Bioperl test framework. (Error output is Can't locate object method "hmm_name" via package "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, line 152.) I'm happy to work on this a bit myself if you're not working on this anyway, so we don't duplicate efforts. I just don't get why the hmm_name isn't picked up correctly, and I haven't been able to figure out how to get at the output that $self->debug() when running the tests. Oh well, it's a learning experience in any case. Cheers, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From kai.blin at biotech.uni-tuebingen.de Tue May 25 08:37:47 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 14:37:47 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> Message-ID: <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de> On Tue, 2010-05-25 at 13:41 +0200, Kai Blin wrote: Whined a little too early. > I've tried using the hmmer3 parser for my script, but it seems like the > hmm_name member of the result object isn't set, and I'm using that. > > I saw this before when trying to write a test case that integrates into > the Bioperl test framework. > (Error output is Can't locate object method "hmm_name" via package > "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, > line 152.) I just found the stuff I needed to add to the hmmer3Result.pm file. I'm currently busy adding a comprehensive test case for this module that integrates into the bioperl test harness. What's the best way to publish my additions? Do I create a fork of bioperl-live on Github or how is this handled? Cheers, Kai -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From cjfields at illinois.edu Tue May 25 08:46:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 07:46:48 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de> Message-ID: On May 25, 2010, at 7:37 AM, Kai Blin wrote: > On Tue, 2010-05-25 at 13:41 +0200, Kai Blin wrote: > > Whined a little too early. > >> I've tried using the hmmer3 parser for my script, but it seems like the >> hmm_name member of the result object isn't set, and I'm using that. >> >> I saw this before when trying to write a test case that integrates into >> the Bioperl test framework. >> (Error output is Can't locate object method "hmm_name" via package >> "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, >> line 152.) > > I just found the stuff I needed to add to the hmmer3Result.pm file. I'm > currently busy adding a comprehensive test case for this module that > integrates into the bioperl test harness. > > What's the best way to publish my additions? Do I create a fork of > bioperl-live on Github or how is this handled? Create a fork of the proper repository, which will eventually be bioperl-hmmer3. However, Thomas hasn't added that code in yet; not sure how much has changed since the original deposition into bioperl-dev in March, but it's possible more has been done.
chris > Cheers, > Kai > > -- > Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de > Interfakult?res Institut f?r Mikrobiologie und Infektionsmedizin > Abteilung Mikrobiologie/Biotechnologie > Eberhard-Karls-Universit?t T?bingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 T?bingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben > From dueldor at yahoo.com Tue May 25 08:30:59 2010 From: dueldor at yahoo.com (Dubi Eldor) Date: Tue, 25 May 2010 05:30:59 -0700 (PDT) Subject: [Bioperl-l] How to find secondary structures Message-ID: <766825.32163.qm@web37308.mail.mud.yahoo.com> Hi, I am a new user of BioPerl. I would like to find secondary sturctures in sequences of ~10K nt long. Are there any functions that can help me? Thanks, Dubi From David.Messina at sbc.su.se Tue May 25 09:58:38 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 15:58:38 +0200 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <3065CE83-3E61-4080-B475-F609E74A9FD4@sbc.su.se> On May 25, 2010, at 15:54, Staffa, Nick (NIH/NIEHS) [C] wrote: > The tutorial, I discovered, has an error. > a very bad experience for a trusting newby. > whereas the tutorial has these bold examples in the first box under > Identifying restriction enzyme sites (Bio::Restriction) > > use Bio::Restriction::EnzymeCollection; > my $all_collection = Bio::Restriction::EnzymeCollection; > > This is the form of the statement that seems to work: > my $all_collection = Bio::Restriction::EnzymeCollection->new(); Thanks, fixed. 
From bosborne11 at verizon.net Tue May 25 09:04:01 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 25 May 2010 09:04:01 -0400 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: Dave, I looked at the scripts, and like you I concluded they didn't use that local Bio/ directory. Then I ran then with and without that Bio/ directory, same results. So I removed that local Bio/ directory. Rob, does some additional action need to be taken by Chris, or some other Bioperl maintainer, at CPAN/PAUSE? Brian O. On May 24, 2010, at 6:03 PM, Dave Messina wrote: > From: Graham Barr via RT >> IMO it is confusing to include 2 different copies of the same module. > > I agree. > > It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). > > In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). > > I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. > > So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Tue May 25 09:54:17 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Tue, 25 May 2010 09:54:17 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: <4046E576-2109-45BB-969C-F0B6F5749957@sbc.su.se> Message-ID: The tutorial, I discovered, has an error. 
a very bad experience for a trusting newby. whereas the tutorial has these bold examples in the first box under Identifying restriction enzyme sites (Bio::Restriction) use Bio::Restriction::EnzymeCollection; my $all_collection = Bio::Restriction::EnzymeCollection; This is the form of the statement that seems to work: my $all_collection = Bio::Restriction::EnzymeCollection->new(); All the other stuff necessary for my purpose of getting fragment lengths is there and seems to work if the $enzyme database has the enzyme under the name you enter. Updating the database with the file from NEB seems to be up to the user or his sysadmin. On 5/24/10 11:55 AM, "Dave Messina" wrote: Hi Nick, Now it's Bio::Restriction::Enzyme (and friends). Besides the perldoc for that module, see also: http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > How hard would it be to keep things backward compatible. > Have I missed something here? I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones are intended to be at least partially backwards compatible. Dave From cjfields at illinois.edu Tue May 25 10:30:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 09:30:09 -0500 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> I have added a 'no_index' to that specific directory in Build.PL, suppose we can change that back if there is no purpose to it (though it might come in handy with spots we don't need to be indexed). chris On May 25, 2010, at 8:04 AM, Brian Osborne wrote: > Dave, > > I looked at the scripts, and like you I concluded they didn't use that local Bio/ directory. Then I ran then with and without that Bio/ directory, same results. 
So I removed that local Bio/ directory. > > Rob, does some additional action need to be taken by Chris, or some other Bioperl maintainer, at CPAN/PAUSE? > > Brian O. > > On May 24, 2010, at 6:03 PM, Dave Messina wrote: > >> From: Graham Barr via RT >>> IMO it is confusing to include 2 different copies of the same module. >> >> I agree. >> >> It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). >> >> In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). >> >> I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. >> >> So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Tue May 25 10:51:02 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Tue, 25 May 2010 10:51:02 -0400 Subject: [Bioperl-l] New Restriction Analysis Message-ID: I have tried both these methods for getting new enzyme info into the system: use Bio::Restriction::IO; my $re_io = Bio::Restriction::IO->new(-file => $file, -format=>'withrefm'); my $rebase_collection = $re_io->read; A REBASE file in the correct format can be found at ftp://ftp.neb.com/pub/rebase - it will have a name like "withrefm.308". 
If need be you can also create new enzymes, like this: my $re = new Bio::Restriction::Enzyme(-enzyme => 'BioRI', -seq => 'GG^AATTCC'); But the BioPerl sends an error without informing me which of my statements caused it: Using first the withreftm.005 file from rebase and then these statements (not both at the same time): my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'SgrDI', -seq => 'CG^TCGACG'); Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.8/Bio/Restriction/Analysis.pm line 529. This works: my $pattern = $enzyme->site; print "pattern = $pattern\n"; which would lead me to believe there is nothing wrong with my enzyme. Could there be a problem if there were no cuts? That must be it, because putting info for EcoRI in instead of SgrDI, the program works: [Not the whole program, but only the bioPerl stuff. my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'EcoRI', -seq => 'G^AATTC'); use Bio::Restriction::Analysis; my $pattern = $enzyme->site; print "pattern = $pattern\n"; my $db = Bio::DB::Fasta->new("/uoldhome/estaffa/westmoreland/$filename", -makeid => \&make_my_id); my $obj = $db->get_Seq_by_id("$sequenceID"); #Sequence Object my $analysis = Bio::Restriction::Analysis->new(-seq => $obj); my @strings = $analysis->fragments($enzyme); What to do? Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Enterprise-Wide Information Technology Support Contract National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From maj at fortinbras.us Tue May 25 12:20:41 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 25 May 2010 12:20:41 -0400 Subject: [Bioperl-l] How to find secondary structures In-Reply-To: <766825.32163.qm@web37308.mail.mud.yahoo.com> References: <766825.32163.qm@web37308.mail.mud.yahoo.com> Message-ID: <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> Sounds like a job for infernal and its Bioperl wrapper (in Bio::Tools::Run); right Chris? MAJ ----- Original Message ----- From: "Dubi Eldor" To: Sent: Tuesday, May 25, 2010 8:30 AM Subject: [Bioperl-l] How to find secondary structures > Hi, > > I am a new user of BioPerl. > I would like to find secondary structures in sequences of ~10K nt long. > Are there any functions that can help me? > > Thanks, > Dubi > > From maj at fortinbras.us Tue May 25 12:19:42 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 12:19:42 -0400 Subject: [Bioperl-l] New Restriction Analysis In-Reply-To: References: Message-ID: Hi Nick, You're right, as far as I can tell; the offending line is @cut_positions=@{$self->{'_cut_positions'}->{$enz}}; so $self->{_cut_positions}->{$enz} must be undefined. I would say this is a bug; if you can put what you've reported below in a bug report at http://bugzilla.bioperl.org, that would be great. A workaround would be to check whether you have cuts first before calling the method; but that may be impossible, in which case a truly awful kludge would be to append a recognized site at the end of your sequences. Just till we can get to the fix.
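The failure mode described above, and one way to guard against it, can be sketched in plain Perl. This is only an illustration under stated assumptions: the hash below is a hypothetical stand-in for the analysis object's internal state, not the actual Bio::Restriction::Analysis code, and the helper name cut_positions_for is invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the analysis object's internals: only EcoRI has
# recorded cut positions, so there is no entry at all for SgrDI.
my $self = { _cut_positions => { EcoRI => [ 5, 42 ] } };

# Unguarded dereference, shaped like the offending line: with 'use strict'
# in effect this dies when the enzyme has no entry.
my $ok = eval { my @pos = @{ $self->{_cut_positions}->{SgrDI} }; 1 };
print "unguarded: $@" unless $ok;

# Guarded version (assumed helper, not part of BioPerl): treat a missing
# entry as "no cuts" and return an empty list instead of dying.
sub cut_positions_for {
    my ( $state, $enz ) = @_;
    my $positions = $state->{_cut_positions}->{$enz};
    return $positions ? @{$positions} : ();
}

my @ecori = cut_positions_for( $self, 'EcoRI' );
my @sgrdi = cut_positions_for( $self, 'SgrDI' );
printf "EcoRI cuts: %d, SgrDI cuts: %d\n", scalar @ecori, scalar @sgrdi;
```

Whether an enzyme without cuts should yield an empty list or the uncut sequence as a single fragment is a design decision for the real module; the sketch only shows how the dereference can be made safe.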
cheers Mark ----- Original Message ----- From: "Staffa, Nick (NIH/NIEHS) [C]" To: "Bioperl-l" Sent: Tuesday, May 25, 2010 10:51 AM Subject: [Bioperl-l] New Restriction Analysis >I have tried both these methods for getting new enzyme info into the system: > > use Bio::Restriction::IO; > my $re_io = Bio::Restriction::IO->new(-file => $file, > -format=>'withrefm'); > my $rebase_collection = $re_io->read; > A REBASE file in the correct format can be found at > ftp://ftp.neb.com/pub/rebase - it will have a name like "withrefm.308". If > need be you can also create new enzymes, like this: > my $re = new Bio::Restriction::Enzyme(-enzyme => 'BioRI', > -seq => 'GG^AATTCC'); > But the BioPerl sends an error without informing me which of my statements > caused it: > > Using first the withreftm.005 file from rebase and then these statements (not > both at the same time): > my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'SgrDI', > -seq => 'CG^TCGACG'); > > > Can't use an undefined value as an ARRAY reference at > /usr/lib/perl5/site_perl/5.8.8/Bio/Restriction/Analysis.pm line 529. > > This works: > my $pattern = $enzyme->site; > print "pattern = $pattern\n"; > which would lead me to believe there is nothing wrong with my enzyme. > Could there be a problem if there were no cuts? > That must be it, because putting info for EcoRI in instead of SgrDI, the > program works: > > [Not the whole program, but only the bioPerl stuff. > my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'EcoRI', > -seq => 'G^AATTC'); > use Bio::Restriction::Analysis; > my $pattern = $enzyme->site; > print "pattern = $pattern\n"; > my $db = Bio::DB::Fasta->new("/uoldhome/estaffa/westmoreland/$filename", > -makeid => \&make_my_id); > my $obj = $db->get_Seq_by_id("$sequenceID"); #Sequence Object > my $analysis = Bio::Restriction::Analysis->new(-seq => $obj); > my @strings = $analysis->fragments($enzyme); > > What to do? 
> > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Enterprise-Wide Information Technology Support Contract > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue May 25 12:38:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 11:38:12 -0500 Subject: [Bioperl-l] How to find secondary structures In-Reply-To: <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> References: <766825.32163.qm@web37308.mail.mud.yahoo.com> <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> Message-ID: <2B6207D9-7221-4949-A7EE-EE6ED54EFF7B@illinois.edu> Yes, that would look for Rfam-based conserved structures. Should work for the latest infernal release, but let me know if you run into problems. Should also look at ERPIN and RNAMotif (both have similar BioPerl wrappers). chris On May 25, 2010, at 11:20 AM, Mark A. Jensen wrote: > Sounds like a job for infernal and it's Bioperl wrapper (in Bio::Tools::Run); right Chris? > MAJ > ----- Original Message ----- From: "Dubi Eldor" > To: > Sent: Tuesday, May 25, 2010 8:30 AM > Subject: [Bioperl-l] How to find secondary structures > > >> Hi, >> >> I am a new user of BioPerl. >> I would like to find secondary sturctures in sequences of ~10K nt long. >> Are there any functions that can help me? 
>> >> Thanks, >> Dubi >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue May 25 12:43:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 12:43:41 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <8EE661A4491C4A0FAD9875CF790F8164@NewLife> Thanks for the headsup on that-- we can fix. The refm file should be downloaded relatively transparently by the class directly... MAJ ----- Original Message ----- From: "Staffa, Nick (NIH/NIEHS) [C]" To: "Dave Messina" ; "Chris Fields" ; "Mark A. Jensen" Cc: "Bioperl-l" Sent: Tuesday, May 25, 2010 9:54 AM Subject: Re: [Bioperl-l] Restriction Enzymes > The tutorial, I discovered, has an error. > a very bad experience for a trusting newby. > whereas the tutorial has these bold examples in the first box under > Identifying restriction enzyme sites (Bio::Restriction) > > use Bio::Restriction::EnzymeCollection; > my $all_collection = Bio::Restriction::EnzymeCollection; > > This is the form of the statement that seems to work: > my $all_collection = Bio::Restriction::EnzymeCollection->new(); > > All the other stuff necessary for my purpose of getting fragment lengths is > there and seems to work > if the $enzyme database has the enzyme under the name you enter. > Updating the database with the file from NEB seems to be up to the user or his > sysadmin. > > > On 5/24/10 11:55 AM, "Dave Messina" wrote: > > Hi Nick, > > Now it's Bio::Restriction::Enzyme (and friends). 
Besides the perldoc for that > module, see also: > > http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > > >> How hard would it be to keep things backward compatible. >> Have I missed something here? > > I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme > was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones > are intended to be at least partially backwards compatible. > > > Dave > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue May 25 13:14:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 13:14:24 -0400 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: Message-ID: <409221E1D1E947108DEDBB5F34E1EBB7@NewLife> Don't think you want 'no strict'; the error's saying something about syntax to you. In the snippet, I see a missing opening single quote for output_file.bam. The asterisk means "expect an array ref", so that's ok. ----- Original Message ----- From: "Ben Bimber" To: "bioperl-l" Sent: Friday, May 21, 2010 9:58 AM Subject: [Bioperl-l] CommandExts and arrays >I am getting an error when trying to pass an array as a param with > command exts. I hope there is something obvious i'm missing, but I > cant seem to figure this out. > > I am trying to run the merge two BAM files using > Bio::Tools::Run::Samtools using something like this: > > my $new_bam = Bio::Tools::Run::Samtools->new( > -command => 'merge', > -program_dir => '/usr/bin/samtools/', > )->run( > -obm => output_file.bam', > -ibm => ['file1.bam', 'file2.bam'], > ); > > When i use an array for the -ibm param, I get an error saying 'cannot > use string 'file1' as an arrayref while strict refs in place'. The > error comes from this code in CommandExts.pm, around line 989. 
adding > 'no strict' right before the final line stops the error: > > # expand arrayrefs > my $l = $#files; > for (0..$l) { > if (ref($files[$_]) eq 'ARRAY') { > splice(@files, $_, 1, @{$files[$_]}); > #error thrown from this line > splice(@switches, $_, 1, ($switches[$_]) x @{$files[$_]}); > } > > > Thanks for the help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From thomas.sharpton at gmail.com Tue May 25 14:33:06 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 25 May 2010 11:33:06 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274767107.2271.11.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> Message-ID: <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> Hi Kai, I've just pushed the code to github, which you can find here: http://github.com/bioperl/bioperl-hmmer3 Please use this updated code before making any significant changes - I think I may have already fixed the bug you brought up earlier (but maybe not?). Do let me know if you have any problems getting ahold of this data or if you find any bugs in the code I'd deposited. Still getting my head wrapped around github. > No worries, that's fine. I've got a checkout of the standalone > repository that I can play with now. Is there any particular reason > you > decided to create a new parser instead of integrating the code into > the > existing hmmer.pm module? I haven't looked at how the hmmer2 hmmsearch > output looks compared to the hmmer3 version and if there's any > conflicts. Trying to integrate hmmer3 into the old hmmer searchIO module was the original idea. 
But after talking to some of the BioPerl gurus and considering the inherent differences between hmmer3 and hmmer2 (at least during beta, though there are still some major output report differences in the live release), we decided a separate module would be ideal. I don't want to speak out of turn, but it sounds like this might be one of the ways that the bioperl project is expanded in the future without overbloating bioperl-live. In theory, we can extend Bio::Run into this module as well in the future, such that bioperl-hmmer3 has a SearchIO path in addition to a Run path. I don't know what the more experienced developers currently think about this idea. This is an obvious statement, but I feel it's important to be clear on these matters - you should feel free to make any and all contributions to the development of this module as you see fit. BioPerl has been wonderful to me and I started this module to give a little back, but this remains community generated software. FYI - I have a fix that I'm working on to handle the secondary structure track in the alignment report, so if you're particularly interested in that data, give me a bit and I'll have it up and running. All the best, Tom From David.Messina at sbc.su.se Tue May 25 14:52:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 20:52:29 +0200 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> References: <4BFAAF45.4090400@cornell.edu> <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> Message-ID: <704A3AD7-BF8E-4C52-A3C5-D402B59BFD66@sbc.su.se> On May 25, 2010, at 4:30 PM, Chris Fields wrote: > I have added a 'no_index' to that specific directory in Build.PL, suppose we can change that back if there is no purpose to it (though it might come in handy with spots we don't need to be indexed). Good idea - it's bound to come up at some point.
On May 25, 2010, at 3:04 PM, Brian Osborne wrote: > So I removed that local Bio/ directory. Great, thanks Brian! Dave From hlapp at gmx.net Tue May 25 17:10:42 2010 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 May 2010 15:10:42 -0600 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> Message-ID: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> I'm a little concerned that this discussion is disconnected from the list and so misses a lot of possible input. Are we moving our development discussion to IRC or github commit comments? Regarding $feature->seq(), the API documentation expressly states that the return type is Bio::PrimarySeqI, as it does for $feature- >entire_seq(). The original rationale for that was to avoid circular references. Bio::SeqI objects contain references to attached features, which in turn contain a reference to the seq object they are attached to. A Bio::SeqI object holds the basic sequence properties (everything except annotation and feature objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a reference to, not the containing Bio::SeqI object. It's possible that S::U::weaken() can solve the circular reference problem, but this fact should be tested. I.e., attach a feature with a SeqI-reference to a SeqI, dispose the SeqI, and then test that the feature has lost the reference to the SeqI too. This still leaves the issue though that then you have a SeqFeatureI object with a dangling reference to a sequence object. If you have those SeqFeatureI objects stored in a feature store, this may wreak havoc. I'd like to see convincing arguments that it doesn't. 
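The test proposed above (attach a feature, dispose of the sequence, check what the feature still sees) can be sketched with core Perl only. This is a minimal sketch under stated assumptions: plain hashes stand in for the Bio::SeqI and SeqFeatureI objects, the key names are invented for illustration, and it needs a perl whose Scalar::Util actually implements weaken().

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Plain hashes standing in for a sequence object and an attached feature
# (hypothetical structures, not the real BioPerl classes).
my $seq     = { id => 'seq1', features => [] };
my $feature = { primary_tag => 'gene' };

# Circular link: the sequence holds the feature, the feature points back.
push @{ $seq->{features} }, $feature;
$feature->{seq} = $seq;

# Weaken the back-reference so it does not keep the sequence alive.
weaken( $feature->{seq} );
print "back-reference is weak\n" if isweak( $feature->{seq} );

# Dispose of the sequence: its only strong reference goes away here.
undef $seq;

# The feature itself survives, but its sequence slot is now undef.
print defined $feature->{seq}
    ? "feature still sees its sequence\n"
    : "feature lost its sequence (the weak reference became undef)\n";
```

This only demonstrates the reference mechanics. It says nothing about how a feature store would cope with features whose sequence has gone away, which is the open question raised here.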
Bottom line - just forking on git and committing a change isn't a substitute for bringing up an issue and possible solutions on the list, and the vetting of pull requests can fall upon only one or two core developers. Two eyeballs often spot a lot less than a hundred. -hilmar On May 25, 2010, at 2:02 PM, GitHub wrote: > Ah, but my link's old, forget it. This one is better: http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html > > From: cjfields > View this commit online: http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From kai.blin at biotech.uni-tuebingen.de Tue May 25 17:50:29 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 23:50:29 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> Message-ID: <1274824229.2271.60.camel@gonzo.home.kblin.org> On Tue, 2010-05-25 at 11:33 -0700, Thomas Sharpton wrote: Hi Thomas, > http://github.com/bioperl/bioperl-hmmer3 > > Please use this updated code before making any significant changes - I > think I may have already fixed the bug you brought up earlier (but > maybe not?). Do let me know if you have any problems getting ahold of > this data or if you find any bugs in the code I'd deposited. Still > getting my head wrapped around github. I've seen the repo, and forked from it already to push my changes. Some of the folks from IRC gave me write access and Chris Fields actually pushed my changes. 
Most notable about the changes is probably a bit hidden by the noise, but I've changed the Hit->raw_score to contain the overall score, not the "best domain" score. > Trying to integrate hmmer3 into the old hmmer searchIO module was the > original idea. But after talking to some of the BioPerl gurus and > considering the inherent differences between hmmer3 and hmmer2 (at > least during beta, though there are still some major output report > differences in the live release), we decided as separate module would > be ideal. Some of the folks on IRC suggested that we might want to integrate the hmmer.pm parser as well, modularizing this a bit and loading the correct parser depending on the requested format. > This is an obvious statement, but I feel it's important to be clear on > these matters - you should feel free to make any and all contributions > to the development of this module as you see fit. BioPerl has been > wonderful to me and I started this module to give a little back, but > this remains community generated software. I'm planning on adding even more tests, but the basic features for hmmscan parsing seem to be there. I'm currently running an extensive test run on real genome data, hopefully I can see the results of that in a couple of days. Cheers, and thanks for the help, Kai -- Dipl.-Inform. 
Kai Blin                        kai.blin at biotech.uni-tuebingen.de
Interfakultäres Institut für Mikrobiologie und Infektionsmedizin
Abteilung Mikrobiologie/Biotechnologie
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From cjfields at illinois.edu Tue May 25 17:55:53 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 25 May 2010 16:55:53 -0500
Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])
In-Reply-To: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net>
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net>
Message-ID:

I agree, but we spotted this from IRC, then added the comments on that merge. Dave also spotted my original code comments (which appeared in the fork queue, and which echo the very same concerns you have) after the commit as well, and managed to revert it.

So, with forked code where it appears further discussion is warranted (like this), we should bring it to the main list (and IRC, if anyone happens to be there) for discussion. Sounds good to me.

For those on list, here are Adam's and my comments on this (linked here: http://github.com/adsj/bioperl-live/commit/24ec961b217084e248f4fdbd174aadace1a27ac4#comments):

adsj: "Hi Chris, thanks for the comment. The reason is this: I have a class, MyApp::Seq, which ISA Bio::Seq::RichSeq and adds some extra methods I use in the application. When I call ->seq() on a feature from one of my MyApp::Seq objects, I want to get a MyApp::Seq object back (because of the extra methods). Am I making sense? I have been running with this patch since at least 1.5.2, so it has been a while since I dug into it. Maybe there is a cleaner solution.
I am not sure what your comment about changing the API means - I think it is quite reasonable/natural that MyApp::Seq->get_Features"->seq" returns MyApp::Seq objects?" My response: "Calling seq() on a feature should return a truncation of whatever your Bio::SeqFeatureI does (it normally calls trunc(start, end) on it's attached sequence). For Bio::Seq it's normally returning a simple Bio::PrimarySeq, not a Bio::Seq, b/c that is what is attached to the Feature. This is why we don't need GC. There are no circular refs: Bio::Seq has-a PrimarySeq and has-a Features (via FeatureHolderI), each Feature has the same PrimarySeq as the parent Bio::Seq. It's hard to know if there is a workaround w/o knowing what you are asking for (e.g. what MyApp::Seq does), but you can certainly override the default methods to DTRT for your specific case. For instance, redefine add_SeqFeature() for your class to attach self as you have above for Bio::Seq. In this case, we should patch SeqFeature::Generic to use weaken() as you show above just in case this is needed by others, but maybe in the context of (pseudocode) 'weaken if $seq to be attached is-a Bio::SeqI', and not hammered down to check the very specific 'Bio::PrimarySeq'. Anyway, this is what I mean by changing the default API, which is what the above Bio::Seq change does. This would change the context of what is currently being returned (self, instead of a simpler contained Bio::PrimarySeqI). Also, anything gained by abstracting the raw seq handling of Feature data by linking to PrimarySeq is lost when you link to the parent, thus always requiring GC and weaken() (which is notoriously flaky dep. on context)." chris On May 25, 2010, at 4:10 PM, Hilmar Lapp wrote: > I'm a little concerned that this discussion is disconnected from the list and so misses a lot of possible input. Are we moving our development discussion to IRC or github commit comments? 
> > Regarding $feature->seq(), the API documentation expressly states that the return type is Bio::PrimarySeqI, as it does for $feature->entire_seq(). > > The original rationale for that was to avoid circular references. Bio::SeqI objects contain references to attached features, which in turn contain a reference to the seq object they are attached to. A Bio::SeqI object holds the basic sequence properties (everything except annotation and feature objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a reference to, not the containing Bio::SeqI object. > > It's possible that S::U::weaken() can solve the circular reference problem, but this fact should be tested. I.e., attach a feature with a SeqI-reference to a SeqI, dispose the SeqI, and then test that the feature has lost the reference to the SeqI too. > > This still leaves the issue though that then you have a SeqFeatureI object with a dangling reference to a sequence object. If you have those SeqFeatureI objects stored in a feature store, this may wreak havoc. I'd like to see convincing arguments that it doesn't. > > Bottom line - just forking on git and committing a change isn't a substitute for bringing up an issue and possible solutions on the list, and the vetting of pull requests can fall upon only one or two core developers. Two eyeballs often spot a lot less than a hundred. > > -hilmar > > On May 25, 2010, at 2:02 PM, GitHub wrote: > >> Ah, but my link's old, forget it. 
This one is better: http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html >> >> From: cjfields >> View this commit online: http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From thomas.sharpton at gmail.com Tue May 25 18:29:38 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 25 May 2010 15:29:38 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274824229.2271.60.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: Thanks for the contributions, Kai. > I've seen the repo, and forked from it already to push my changes. > Some > of the folks from IRC gave me write access and Chris Fields actually > pushed my changes. Just saw this. Thanks for doing that, Chris. > Most notable about the changes is probably a bit hidden by the noise, > but I've changed the Hit->raw_score to contain the overall score, not > the "best domain" score. So this brings up an interesting point. At some point, we'll have to build out a few additional SearchIO methods to incorporate some of the additional information encoded in the HMMER v3 reports. Sean talks a bit in the user manual about the importance of looking at both the full sequence and the best domain (see page 18 in the manual linked to on this page http://hmmer.janelia.org/#documentation). 
For example, he mentions that one should consider the e-value of both the full sequence and best domain to ascertain if the query is homologous to a profile being considered via hmmsearch. He also mentions that looking at the full sequence report values without consideration of the best domain report values can be misleading. I'm not saying that your approach regarding Hit->raw_score is wrong - proper interpretation of the results is up to the end user and there are benefits to looking at the full sequence (again, communicated on page 18) - but we might consider how to best encode the SearchIO methods to mitigate end user confusion and mistakes. >> Trying to integrate hmmer3 into the old hmmer searchIO module was the >> original idea. But after talking to some of the BioPerl gurus and >> considering the inherent differences between hmmer3 and hmmer2 (at >> least during beta, though there are still some major output report >> differences in the live release), we decided as separate module would >> be ideal. > > Some of the folks on IRC suggested that we might want to integrate the > hmmer.pm parser as well, modularizing this a bit and loading the > correct > parser depending on the requested format. This might make sense, given that HMMER v3 is now live and seems to be adopted by researchers at an increasing rate. Since I used hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult to do, either. I think a thorough conversation on this point is warranted as others I've talked to have preferred the modules to be separate. I'd be interested to hear what other have to say on this point. >> This is an obvious statement, but I feel it's important to be clear >> on >> these matters - you should feel free to make any and all >> contributions >> to the development of this module as you see fit. BioPerl has been >> wonderful to me and I started this module to give a little back, but >> this remains community generated software. 
> > I'm planning on adding even more tests, but the basic features for > hmmscan parsing seem to be there. I'm currently running an extensive > test run on real genome data, hopefully I can see the results of > that in > a couple of days. Awesome! > Cheers, and thanks for the help, Likewise. T From kannabiran.nandakumar at gmail.com Tue May 25 18:30:18 2010 From: kannabiran.nandakumar at gmail.com (Kanna) Date: Tue, 25 May 2010 15:30:18 -0700 (PDT) Subject: [Bioperl-l] new to this group Message-ID: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Hi guys, I am new to this group. I work in bioinformatics and would like to contribute to the BioPerl project. I am interested in the OBO file parsing module to start with. I visited the project priority list and the page seems to have been modified around 6 months ago. If it is already completed could anyone suggest modules I can contribute to? Thanks, Kanna From David.Messina at sbc.su.se Tue May 25 18:41:27 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 00:41:27 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: On May 25, 2010, at 11:55 PM, Chris Fields wrote: > Sounds good to me. Me too, and just to clarify for everyone following along, I erroneously committed the code in question to bioperl-live master (head), reverted that commit, and moved it to a branch (http://github.com/bioperl/bioperl-live/commits/topic/adsj-seqobj-return). Dave From maj at fortinbras.us Tue May 25 21:37:38 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 25 May 2010 21:37:38 -0400 Subject: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: <525D25AC2CDF42E99C1F4072B02D0C1B@NewLife> I +1 Hilmar, but note that already git is doing what it is designed to do: devolve development. My $0.02 is: that is how BioPerl will keep from becoming a dinosaur. I believe that we as a community, judging from the track of the last year or so, are committed to this evolution by devolution, and the move to git is part of that overall plan. The increase in IRC chatter, led by deafferet and rbuels, prefigured this and it was generally considered a Good Thing. So, I would propose that people (devs and users) make their views known (on list and elsewhere) about how best to communicate and have dev-oriented conversations: it may be that a listserv alone is not nimble enough. MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "BioPerl List" Sent: Tuesday, May 25, 2010 5:10 PM Subject: Re: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) > I'm a little concerned that this discussion is disconnected from the list and > so misses a lot of possible input. Are we moving our development discussion > to IRC or github commit comments? > > Regarding $feature->seq(), the API documentation expressly states that the > return type is Bio::PrimarySeqI, as it does for $feature- > >entire_seq(). > > The original rationale for that was to avoid circular references. Bio::SeqI > objects contain references to attached features, which in turn contain a > reference to the seq object they are attached to. 
A Bio::SeqI object holds > the basic sequence properties (everything except annotation and feature > objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a > reference to, not the containing Bio::SeqI object. > > It's possible that S::U::weaken() can solve the circular reference problem, > but this fact should be tested. I.e., attach a feature with a SeqI-reference > to a SeqI, dispose the SeqI, and then test that the feature has lost the > reference to the SeqI too. > > This still leaves the issue though that then you have a SeqFeatureI object > with a dangling reference to a sequence object. If you have those SeqFeatureI > objects stored in a feature store, this may wreak havoc. I'd like to see > convincing arguments that it doesn't. > > Bottom line - just forking on git and committing a change isn't a substitute > for bringing up an issue and possible solutions on the list, and the vetting > of pull requests can fall upon only one or two core developers. Two eyeballs > often spot a lot less than a hundred. > > -hilmar > > On May 25, 2010, at 2:02 PM, GitHub wrote: > >> Ah, but my link's old, forget it. 
This one is better:
>> http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html
>>
>> From: cjfields
>> View this commit online:
>> http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

From asjo at koldfront.dk Wed May 26 01:41:52 2010
From: asjo at koldfront.dk (Adam Sjøgren)
Date: Wed, 26 May 2010 07:41:52 +0200
Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net>
Message-ID: <87zkznb4nz.fsf@topper.koldfront.dk>

On Tue, 25 May 2010 15:10:42 -0600, Hilmar wrote:

> Bottom line - just forking on git and committing a change isn't a
> substitute for bringing up an issue and possible solutions on the
> list, and the vetting of pull requests can fall upon only one or two
> core developers. Two eyeballs often spot a lot less than a hundred.

Just to clarify: I specifically _didn't_ make a Pull request yet.

I simply created the fork to store the patch in a visible way - my intention was then to clean the patch up and make it ready for comments/discussion (I just haven't had time to do so yet).

I am new to github, but as I understood the interface there, anyone is free (encouraged?) to "fork" their own clone to work in, as a kind of "public" personal workspace, and when you feel that your clone is ready to be merged, then - only then - you do a "Pull request".
If that isn't the way github is supposed to be used, or that isn't the way BioPerl wants to use it, let me know and I'll adjust.

I appreciate the comments so far, and will get back to this as soon as I can.

Thanks,

Adam

--
"Sunday morning when the rain begins to fall          Adam Sjøgren
 I believe I have seen the end of it all"        asjo at koldfront.dk

From David.Messina at sbc.su.se Wed May 26 05:24:11 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 26 May 2010 11:24:11 +0200
Subject: [Bioperl-l] Bio::Species irritated with "unclassified sequences"
In-Reply-To: <4BF59B2F.9000300@bms.com>
References: <4BF59B2F.9000300@bms.com>
Message-ID: <50665C57-007D-49CC-86A7-4595D176EA73@sbc.su.se>

Hi Charles,

Thanks for your report. I believe your interpretation of Bio::Species::classification is correct. It looks like this is going to require a little more investigation.

Could you please submit this as a bug report along with a little test case?

http://www.bioperl.org/wiki/Bugs

Dave

On May 20, 2010, at 22:27, Charles Tilford wrote:

> Bio::Species::classification() is irritated with me when I provide it with a @class_array that is composed of one node, particularly:
>
> $obj->classification("unclassified sequences")
>
> AFAICT this is a valid, single node taxa "tree":
>
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=12908
>
> Subroutine classification is expecting at least two class members, the problem with the above call crops up as:
>
> Use of uninitialized value $vals[1] in quotemeta at /stf/biocgi/tilfordc/patch_lib/Bio/Species.pm line 179
> ( $Id: Species.pm 16700 2010-01-15 19:50:11Z dave_messina $)
>
> ... and the relevant code is:
>
> sub classification {
>     my ($self, @vals) = @_;
>
>     if (@vals) {
>         if (ref($vals[0]) eq 'ARRAY') {
>             @vals = @{$vals[0]};
>         }
>
>         # make sure the lineage contains us as first or second element
>         # (lineage may have subspecies, species, genus ...)
>         my $name = $self->node_name;
>         my ($genus, $species) = (quotemeta($vals[1]), quotemeta($vals[0]));
>
> That is, it's expecting at least (species, genus) in the array. Am I misusing classification(), or Bio::Species in general? I know it's named "Species", but I've been using it as a generic tree object for arbitrary taxonomy nodes, not just species and subspecies. This block a little lower down:
>
>         unless ($self->rank) {
>             # and that we are rank species
>             $self->rank('species');
>         }
>
> ... implies that the module can be used for taxa ranks other than species. However, doing so would not prevent the module being aggravated over a null $vals[1].
>
> The use case here is building Bio::Seq::RichSeq objects pulled from a (very large) sequence database, and then dumped / displayed with SeqIO. Most are well behaved, but there's a non-trivial number of 'artificial' constructs that don't root to an organism.
>
> -CAT

From cjfields at illinois.edu Wed May 26 07:53:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 26 May 2010 06:53:50 -0500
Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])
In-Reply-To: <87zkznb4nz.fsf@topper.koldfront.dk>
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk>
Message-ID:

On May 26, 2010, at 12:41 AM, Adam Sjøgren wrote:

> On Tue, 25 May 2010 15:10:42 -0600, Hilmar wrote:
>
>> Bottom line - just forking on git and committing a change isn't a
>> substitute for bringing up an issue and possible solutions on the
>> list, and the vetting of pull requests can fall upon only one or two
>> core developers. Two eyeballs often spot a lot less than a hundred.
>
> Just to clarify: I specifically _didn't_ make a Pull request yet.
>
> I simply created the fork to store the patch in a visible way - my
> intention was then to clean the patch up and make it ready for
> comments/discussion (I just haven't had time to do so yet).
>
> I am new to github, but as I understood the interface there, anyone is
> free (encouraged?) to "fork" their own clone to work in, as a kind of
> "public" personal workspace, and when you feel that your clone is ready
> to be merged, then - only then - you do a "Pull request".

That's odd; I recall receiving a pull request from your fork at some point, but maybe I simply looked into the fork queue instead (which I thought was derived from pull requests, but maybe not).

> If that isn't the way github is supposed to be used, or that isn't the
> way BioPerl wants to use it, let me know and I'll adjust.
>
> I appreciate the comments so far, and will get back to this as soon as I
> can.
>
> Thanks,
>
> Adam

No problem Adam, we're going through the learning curve on this end as well re: this specific github feature. I think how you are going about this is fine; we'll need to come up with some documentation as to how our collabs pull in forked code.

chris

From hlapp at drycafe.net Wed May 26 09:27:55 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 26 May 2010 07:27:55 -0600
Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0])
In-Reply-To: <87zkznb4nz.fsf@topper.koldfront.dk>
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk>
Message-ID: <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net>

On May 25, 2010, at 11:41 PM, Adam Sjøgren wrote:

> as I understood the interface there, anyone is free (encouraged?)
to > "fork" their own clone to work in, as a kind of "public" personal > workspace, and when you feel that your clone is ready to be merged, > then - only then - you do a "Pull request". That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) And yes, encouraged to fork indeed. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Wed May 26 10:03:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 16:03:14 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> Message-ID: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> On May 26, 2010, at 15:27, Hilmar Lapp wrote: > That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) That would be me. :) His commits were sitting in the fork queue, which I mistakenly understood to mean a pull request had been made. Turns out that's not the case (See http://github.com/blog/270-the-fork-queue). 
Dave From David.Messina at sbc.su.se Wed May 26 10:52:05 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 16:52:05 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: > So this brings up an interesting point. At some point, we'll have to build out a few additional SearchIO methods to incorporate some of the additional information encoded in the HMMER v3 reports. Would the new methods need to be added to SearchIO if they're specific to H3? (as opposed to just being in the H3 sub-class) > Sean talks a bit in the user manual about the importance of looking at both the full sequence and the best domain (see page 18 in the manual linked to on this page http://hmmer.janelia.org/#documentation). For example, he mentions that one should consider the e-value of both the full sequence and best domain to ascertain if the query is homologous to a profile being considered via hmmsearch. > > He also mentions that looking at the full sequence report values without consideration of the best domain report values can be misleading. I'm not saying that your approach regarding Hit->raw_score is wrong - proper interpretation of the results is up to the end user and there are benefits to looking at the full sequence (again, communicated on page 18) - but we might consider how to best encode the SearchIO methods to mitigate end user confusion and mistakes. I think this is a great idea. Of course it's always best for end-users to RTFM and understand the tools they're using, but it's clearly beneficial to make it easier to do the right thing. Having not considered it too much, I'm not sure how to accomplish this without breaking the SearchIO idiom. But presumably a way could be found. 
>> Some of the folks on IRC suggested that we might want to integrate the >> hmmer.pm parser as well, modularizing this a bit and loading the correct >> parser depending on the requested format. > This might make sense, given that HMMER v3 is now live and seems to be adopted by researchers at an increasing rate. Since I used hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult to do, either. I think a thorough conversation on this point is warranted as others I've talked to have preferred the modules to be separate. > > I'd be interested to hear what other have to say on this point. I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3. But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption. Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so) ? Dave From thomas.sharpton at gmail.com Wed May 26 11:25:24 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 26 May 2010 08:25:24 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: Thanks for the feedback, Dave. >> So this brings up an interesting point. At some point, we'll have >> to build out a few additional SearchIO methods to incorporate some >> of the additional information encoded in the HMMER v3 reports. > > Would the new methods need to be added to SearchIO if they're > specific to H3? 
(as opposed to just being in the H3 sub-class)

Sorry for being unclear - the methods in question would be, at least in my mind, specific to the H3 sub-class.

>> Sean talks a bit in the user manual about the importance of looking
>> at both the full sequence and the best domain (see page 18 in the
>> manual linked to on this page http://hmmer.janelia.org/#documentation).
>> For example, he mentions that one should consider the e-value of
>> both the full sequence and best domain to ascertain if the query is
>> homologous to a profile being considered via hmmsearch.
>>
>> He also mentions that looking at the full sequence report values
>> without consideration of the best domain report values can be
>> misleading. I'm not saying that your approach regarding
>> Hit->raw_score is wrong - proper interpretation of the results is up to
>> the end user and there are benefits to looking at the full sequence
>> (again, communicated on page 18) - but we might consider how to
>> best encode the SearchIO methods to mitigate end user confusion and
>> mistakes.
>
> I think this is a great idea.
>
> Of course it's always best for end-users to RTFM and understand the
> tools they're using, but it's clearly beneficial to make it easier
> to do the right thing.
>
> Having not considered it too much, I'm not sure how to accomplish
> this without breaking the SearchIO idiom. But presumably a way could
> be found.

I'll see if I can't hit the drawing board and come up with a naming scheme for additional H3 methods that retrieve some of the extra data encoded in the new reports. It *probably* makes most sense, at least from the standpoint of the user's perspective, to adopt the full-length report values as the standard hit->significance and hit->raw_score while having something like hit->best_significance and hit->best_score as H3 methods that return the best-domain report values. Again, this could use some thought/discussion.
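
To make the naming idea concrete, here is a rough sketch of how the two sets of values might sit side by side. It is purely illustrative: My::H3Hit is a hypothetical stand-in class, not an existing BioPerl module, the hash keys are invented, and the scores and e-values are made-up numbers rather than output from a real report.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an H3-specific hit class (NOT a BioPerl module).
package My::H3Hit;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Full-sequence values exposed through the standard-looking accessors ...
sub raw_score    { return $_[0]{full_score} }
sub significance { return $_[0]{full_evalue} }

# ... and best-domain values through the extra accessors proposed above.
sub best_score        { return $_[0]{best_dom_score} }
sub best_significance { return $_[0]{best_dom_evalue} }

package main;

# Made-up numbers, only to exercise the accessors.
my $hit = My::H3Hit->new(
    full_score      => 100.0,
    full_evalue     => 1e-30,
    best_dom_score  => 60.0,
    best_dom_evalue => 1e-18,
);

printf "full sequence: score %.1f, e-value %g\n",
    $hit->raw_score, $hit->significance;
printf "best domain:   score %.1f, e-value %g\n",
    $hit->best_score, $hit->best_significance;
```

Keeping both pairs on the same object would let a user compare the full-sequence and best-domain values without a second lookup, which seems to be the point of the manual's advice.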
>>> Some of the folks on IRC suggested that we might want to integrate
>>> the hmmer.pm parser as well, modularizing this a bit and loading the
>>> correct parser depending on the requested format.
>
>> This might make sense, given that HMMER v3 is now live and seems to
>> be adopted by researchers at an increasing rate. Since I used
>> hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult
>> to do, either. I think a thorough conversation on this point is
>> warranted as others I've talked to have preferred the modules to be
>> separate.
>>
>> I'd be interested to hear what other have to say on this point.
>
> I did not follow the IRC discussion, so I confess I'm not totally
> clear on what "integrate the hmmer.pm parser" means. I'm taking it
> to mean combining the code that parses HMMER2 with the code that
> parses HMMER3.
> But then "modularizing this a bit and loading the correct parser
> depending on the requested format" seems to contradict that
> assumption.
>
> Perhaps you (or someone) could clarify a bit what the HMMER2 -
> HMMER3 integration would look like (and the goal of doing so) ?

I was not a part of that conversation either and I'm also operating under a similar assumption about what "integrating the hmmer.pm parser" means. I too am confused about the statement regarding modularization; I assume Kai meant that next_result would leverage the HMMER version number (which it already grabs) to guide the appropriate parsing of the datafile. Not thinking about this too carefully, it might be as simple as:

    next_result {
        version = get_hmmer_version
        if version == 2
            parse V2 report file
        if version == 3
            parse V3 report file
    }

To make the code a bit more manageable, the various version parsers could be split out into independent subroutines. Kai, is this along the lines of what you were thinking?
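
In Perl, the dispatch idea above might look something like the following. This is only a sketch under stated assumptions: the parser subs are empty placeholders, all sub names are hypothetical, and the regex assumes the report carries a line starting with "HMMER <version>" (optionally behind a "#"), which should be checked against real H2 and H3 output.

```perl
use strict;
use warnings;

# Placeholder per-version parsers (hypothetical); real ones would build
# proper result objects instead of returning a small hash.
sub _parse_v2 { my ($text) = @_; return { version => 2, raw => $text } }
sub _parse_v3 { my ($text) = @_; return { version => 3, raw => $text } }

# Dispatch table: major version number => parser subroutine.
my %PARSER_FOR = ( 2 => \&_parse_v2, 3 => \&_parse_v3 );

# Pull the major version from a "HMMER x.y" header line.
# The pattern is an assumption about the header layout.
sub hmmer_major_version {
    my ($text) = @_;
    return $1 if $text =~ /^#?\s*HMMER\s+(\d+)/mi;
    return;
}

sub next_result {
    my ($text) = @_;
    my $major = hmmer_major_version($text);
    die "could not determine HMMER version\n" unless defined $major;
    my $parser = $PARSER_FOR{$major}
        or die "unsupported HMMER version $major\n";
    return $parser->($text);
}

# Illustrative header line only, not a complete report.
my $report = "# HMMER 3.0 (March 2010)\n# query ...\n";
my $result = next_result($report);
print "parsed with v$result->{version} parser\n";
```

A dispatch table keeps next_result short and lets each version's parser live in its own subroutine (or later its own module) without touching the caller.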
If this is correct (that is, merging the H2 and H3 parsers into a single hmmer.pm module), I see one primary benefit - the end user need not specify which HMMER module they want to implement, just use Bio::SearchIO::hmmer - and one secondary benefit - there's enough similarity between H2 and H3 reports that some from the H2 parser redundantly appears in the H3 parser. There are certainly other benefits that I'm overlooking. The only real downside I see at the moment is that the hmmer.pm parser becomes a bit more complicated and bloated. But I suspect this can be remedied with careful partitioning of the code into appropriate subroutines and thorough documentation. I am a bit concerned about how the aforementioned H3 specific methods are incorporated, but that should be manageable. I wonder if anyone involved in the IRC discussion cares to weigh in? Regardless, I'd advocate getting the H3 version fully flushed out to deal with the issues brought up in the first half of this message prior to an attempt to merge the two modules, as the merging process may be affected by the structure of the H3 parser. Best, Tom From cjfields at illinois.edu Wed May 26 12:13:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 11:13:59 -0500 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> Message-ID: On May 26, 2010, at 9:03 AM, Dave Messina wrote: > > On May 26, 2010, at 15:27, Hilmar Lapp wrote: > >> That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) > > > That would be me. 
:)
>
> His commits were sitting in the fork queue, which I mistakenly understood to mean a pull request had been made. Turns out that's not the case (See http://github.com/blog/270-the-fork-queue).
>
>
> Dave

We can clarify that in the docs on the bioperl site, maybe in a github-specific section.

chris

From cjfields at illinois.edu Wed May 26 12:17:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 26 May 2010 11:17:50 -0500
Subject: [Bioperl-l] hmmer3/hmmscan parser
In-Reply-To: 
References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> 
Message-ID: <3826604E-CD90-42A5-A0B2-004D9922B6AA@illinois.edu>

On May 26, 2010, at 10:25 AM, Thomas Sharpton wrote:

>> ...
>> I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3.
>>
>> But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption.
>>
>> Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so)?
>>
>
> I was not a part of that conversation either and I'm also operating under a similar assumption about what "integrating the hmmer.pm parser" means. I too am confused about the statement regarding modularization; I assume Kai meant that next_result would leverage the HMMER version number (which it already grabs) to guide the appropriate parsing of the datafile.
Not thinking about this too carefully, it might be a simple as: > > next_result{ > version = get_hmmer_version > if version == 2 > parse V2 report file > if version == 3 > parse V3 report file > } > > to make the code a bit more manageable, the various version parsers could be appropriated to independent subroutines. > > Kai, is this along the lines of what you were thinking? > > If this is correct (that is, merging the H2 and H3 parsers into a single hmmer.pm module), I see one primary benefit - the end user need not specify which HMMER module they want to implement, just use Bio::SearchIO::hmmer - and one secondary benefit - there's enough similarity between H2 and H3 reports that some from the H2 parser redundantly appears in the H3 parser. There are certainly other benefits that I'm overlooking. > > The only real downside I see at the moment is that the hmmer.pm parser becomes a bit more complicated and bloated. But I suspect this can be remedied with careful partitioning of the code into appropriate subroutines and thorough documentation. I am a bit concerned about how the aforementioned H3 specific methods are incorporated, but that should be manageable. > > I wonder if anyone involved in the IRC discussion cares to weigh in? > > Regardless, I'd advocate getting the H3 version fully flushed out to deal with the issues brought up in the first half of this message prior to an attempt to merge the two modules, as the merging process may be affected by the structure of the H3 parser. > > Best, > Tom That's essentially the idea, though it can be cleaner than that if we're expecting the entire stream of reports will be of the same version (set the proper next_result method at instantiation). SearchIO::infernal does something like this. Or it can call out to a handler, like SearchIO::blastxml. YMMV. chris From maj at fortinbras.us Wed May 26 13:43:37 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Wed, 26 May 2010 13:43:37 -0400
Subject: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0])
In-Reply-To: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se>
References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail><9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net><87zkznb4nz.fsf@topper.koldfront.dk><1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se>
Message-ID: <85C731A2326D45FB903FB1B0D5C5DEBF@NewLife>

No zeal is overweening that is on the side of the Right.

----- Original Message -----
From: "Dave Messina"
To: "Hilmar Lapp"
Cc: "Adam Sjøgren" ;
Sent: Wednesday, May 26, 2010 10:03 AM
Subject: Re: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0])

>
> On May 26, 2010, at 15:27, Hilmar Lapp wrote:
>
>> That would be my understanding too. Maybe some overzealous Bioperl gitizens
>> at work who weren't going to wait for this? ;)
>
>
> That would be me. :)
>
> His commits were sitting in the fork queue, which I mistakenly understood to
> mean a pull request had been made. Turns out that's not the case (See
> http://github.com/blog/270-the-fork-queue).
>
>
> Dave
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From David.Messina at sbc.su.se Wed May 26 15:03:21 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 26 May 2010 21:03:21 +0200
Subject: [Bioperl-l] new to this group
In-Reply-To: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com>
References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com>
Message-ID: 

Hi Kanna,

Welcome! We're always happy to have more people jump in the deep end of the pool and help out.
From my reading of the project priority page, the OBO file parsing stuff has been done:

> (This appears to be basically solved with the new OBOEngine, Sohel will need to comment if it is indeed finished). --jason stajich 20:10, 19 June 2006 (EDT)

(see http://www.bioperl.org/wiki/Project_priority_list#Ontology_file_parsing )

Can anyone (Hilmar?) who knows where we're at with this verify that our OBO parser is in good shape?

I did notice this open bug, Kanna:

bp_load_ontology ISBN title parsing error in OBO format
http://bugzilla.open-bio.org/show_bug.cgi?id=2730

Is that something you might be interested in?

> I visited the project priority list and the page seems to have been modified around 6 months ago.

Agreed, it's probably time for someone to go through and update it. I'll post to the list separately about this.

> If it is already completed could anyone suggest modules I can contribute to?

But even though the project priority list is outdated, the open bugs list is not:

http://bugzilla.open-bio.org/buglist.cgi?product=Bioperl&bug_status=NEW

I would recommend you look for something relatively small to start with and submit a patch for that. And then as you go along we'll get a better idea of how to direct you as you get a better idea of what needs to be done.

Dave

From David.Messina at sbc.su.se Wed May 26 15:22:40 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 26 May 2010 21:22:40 +0200
Subject: [Bioperl-l] project priority list
Message-ID: <0DC6E827-8855-4463-8C58-79CC26BDF42D@sbc.su.se>

So, as pointed out by Kanna in another thread, our Project Priority list is getting a little stale.

http://www.bioperl.org/wiki/Project_priority_list

There are a lot of things on there that have been crossed off for years now. I propose that we do some housecleaning, including deleting long-finished projects from the list. (They'll still live on in the wiki history of the page.)
Unless someone objects, I'll start poking at it a bit, but if other core devs with relevant knowledge of various projects could take a moment to peruse and edit too, that would be great. Dave From jay at jays.net Wed May 26 15:27:01 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 26 May 2010 14:27:01 -0500 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: <1D273263-F9B4-4612-961B-E2B0F480FBC3@jays.net> On May 26, 2010, at 2:03 PM, Dave Messina wrote: > I would recommend you look for something relatively small to start with and submit a patch for that. Ideally "submit a patch" means create a github.com account, click "fork" on the bioperl-live repo, commit your changes into your fork, then send us a "pull request". :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From scott at scottcain.net Wed May 26 15:36:16 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 26 May 2010 15:36:16 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git Message-ID: Hi all, For GBrowse on the 1.X branch there is a network install script that people can download and execute and it will install all of the prerequisites and then install GBrowse. For this script, we also support a -d(eveloper) option, to get GBrowse and BioPerl from their repositories. Now that BioPerl has moved to git, I have a question: does anybody know if there is a way (preferably via url) to get bioperl from git in a non-interactive way? The read-only url on the bioperl-live git page, http://github.com/bioperl/bioperl-live.git, leads to a 404 error, and even if it didn't, I have a feeling that it would take a click or two to get to downloading source. Does anybody with more git-fu than me (which isn't a hard thing to have, since I don't have much) have any suggestions? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From David.Messina at sbc.su.se Wed May 26 15:41:10 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 21:41:10 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> Message-ID: <1F539D4E-D352-4F93-AF1E-E9324B970D34@sbc.su.se> > We can clarify that in the docs on the bioperl site, maybe in a github-specific section. I've stubbed it in on Using Git http://www.bioperl.org/wiki/Using_Git Please modify or expand as you see fit. Dave From scott at scottcain.net Wed May 26 15:57:21 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 26 May 2010 15:57:21 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: Also on the bioperl git page is a "download master" link, which pops up a cute javascript window offering me a choice of zip or tar files. If I copy the url of the tar file, I get a page that says: You are being redirected. where presumably, the digits after "bioperl-release" will change on a regular basis (right?), so that doesn't help much either (yes, I know I could parse the redirect message and get that url, but really, is there such a thing as a HEAD url?) Thanks, Scott On Wed, May 26, 2010 at 3:36 PM, Scott Cain wrote: > Hi all, > > For GBrowse on the 1.X branch there is a network install script that > people can download and execute and it will install all of the > prerequisites and then install GBrowse. ?For this script, we also > support a -d(eveloper) option, to get GBrowse and BioPerl from their > repositories. 
?Now that BioPerl has moved to git, I have a question: > does anybody know if there is a way (preferably via url) to get > bioperl from git in a non-interactive way? > > The read-only url on the bioperl-live git page, > http://github.com/bioperl/bioperl-live.git, leads to a 404 error, and > even if it didn't, I have a feeling that it would take a click or two > to get to downloading source. ?Does anybody with more git-fu than me > (which isn't a hard thing to have, since I don't have much) have any > suggestions? > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) ? ? ? ? ? ? ? ? ? ? 216-392-3087 > Ontario Institute for Cancer Research > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From kai.blin at biotech.uni-tuebingen.de Wed May 26 16:07:02 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Wed, 26 May 2010 22:07:02 +0200 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: <1274904422.3019.2.camel@gonzo.home.kblin.org> On Wed, 2010-05-26 at 15:36 -0400, Scott Cain wrote: Hi Scott, > For GBrowse on the 1.X branch there is a network install script that > people can download and execute and it will install all of the > prerequisites and then install GBrowse. For this script, we also > support a -d(eveloper) option, to get GBrowse and BioPerl from their > repositories. Now that BioPerl has moved to git, I have a question: > does anybody know if there is a way (preferably via url) to get > bioperl from git in a non-interactive way? 
A quick look on the "BioPerl moved to git" announcement (http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/) you can find the following link: http://github.com/bioperl/bioperl-live/archives/master This page gives links to a zip and a tar version of BioPerl's master repository, which seems to be what you want. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakult?res Institut f?r Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universit?t T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From David.Messina at sbc.su.se Wed May 26 16:09:22 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 22:09:22 +0200 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Hi Scott, I think the URLs you want are these http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots snapshots of the current repository. If you want instead to grab a static version of a repository, say a tagged revision, you can do like this: http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 (where "for_gmod_0_003" is the tag). By the way, I am getting these URLs on GitHub by: 1. going to the GitHub page for the relevant repository e.g. http://github.com/bioperl/bioperl-live 2. navigating to the tag or branch of interest using the "Switch Branches" or "Switch Tags" pulldowns 3. clicking on the Download Source button 4. 
right-clicking on the big TAR icon to copy the link underlying it Dave From rmb32 at cornell.edu Wed May 26 16:48:13 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 26 May 2010 13:48:13 -0700 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: <4BFD890D.4080205@cornell.edu> Sigh .... once we get our house in order to the point where it's easy to and quick to make releases with bugfixes, you'll be able to just get the most recent copies of the parts you need from CPAN. That'll be the day. Rob From hlapp at drycafe.net Wed May 26 18:05:36 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 26 May 2010 16:05:36 -0600 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: On May 26, 2010, at 1:03 PM, Dave Messina wrote: > Can anyone (Hilmar?) who knows where we're at with this verify that > our OBO parser is in good shape? The obo parser should be working. It's not wrapping the go-perl parser though. I should revisit the code I've written for that, I know ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed May 26 19:27:27 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 18:27:27 -0500 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> On May 26, 2010, at 5:05 PM, Hilmar Lapp wrote: > > On May 26, 2010, at 1:03 PM, Dave Messina wrote: > >> Can anyone (Hilmar?) who knows where we're at with this verify that our OBO parser is in good shape? > > > The obo parser should be working. 
It's not wrapping the go-perl parser though. I should revisit the code I've written for that, I know ... > > -hilmar So, that might be an area for someone to work on? chris From hlapp at drycafe.net Thu May 27 09:30:05 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 27 May 2010 07:30:05 -0600 Subject: [Bioperl-l] new to this group In-Reply-To: <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> Message-ID: <292C7384-2EF0-45F7-85F9-BB173FE2B6E5@drycafe.net> On May 26, 2010, at 5:27 PM, Chris Fields wrote: >> The obo parser should be working. It's not wrapping the go-perl >> parser though. I should revisit the code I've written for that, I >> know ... >> > > So, that might be an area for someone to work on? Certainly if you want to start from scratch. The code I've written isn't committed (yes, shame on me). That said, I suppose I could now easily commit it to a branch and not cause any harm, right :-) It's not a very good target for a newcomer at all, though. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From kai.blin at biotech.uni-tuebingen.de Thu May 27 10:50:40 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 27 May 2010 16:50:40 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: <1274971840.9545.316.camel@mikropc7.biotech.uni-tuebingen.de> On Wed, 2010-05-26 at 08:25 -0700, Thomas Sharpton wrote: > > Having not considered it too much, I'm not sure how to accomplish > > this without breaking the SearchIO idiom. But presumably a way could > > be found. > > > > I'll see if I can't hit the drawing board and come up with a naming > scheme for additional H3 methods that retrieve some of the extra data > encoded in the new reports. It *probably* makes most sense, at least > from the standpoint of the user's perspective, to adopt the full- > length report values as the standard hit->significance and hit- > >raw_score while having something like hit->best_significance and hit- > >best_score as H3 methods that return the best-domain report values. > Again, this could use some thought/discussion. My reasoning for the change was that you can get at the best sequence score by (at worst) iterating over the top sequences. Without the change there was no way to get at the overall profile score, so that data was lost. Arguably this is just one way to try and make the data from the HMMer results accessible via the SearchIO interface. > I was not a part of that conversation either and I'm also operating > under a similar assumption about what "integrating the hmmer.pm > parser" means. 
I too am confused about the statement regarding > modularization; I assume Kai meant that next_result would leverage the > HMMER version number (which it already grabs) to guide the appropriate > parsing of the datafile. Not thinking about this too carefully, it > might be a simple as: > > next_result{ > version = get_hmmer_version > if version == 2 > parse V2 report file > if version == 3 > parse V3 report file > } > > to make the code a bit more manageable, the various version parsers > could be appropriated to independent subroutines. > > Kai, is this along the lines of what you were thinking? Yes, this is more or less what I meant. But I agree that we first want to get the hmmer3 parser sorted out and working nicely. More test cases for the parser would be nice, I just got sidetracked by another bug affecting my code. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakult?res Institut f?r Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universit?t T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From scott at scottcain.net Thu May 27 11:29:42 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 27 May 2010 11:29:42 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: Hi All, Thanks for pointing out the links. It's weird: using curl on those urls retrieves a "redirect" page, whereas LWP::Simple::mirror gets the tarball. Anyway, the script works again :-) Scott On Wed, May 26, 2010 at 4:09 PM, Dave Messina wrote: > Hi Scott, > > I think the URLs you want are these > > ? ? ? ?http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots > > snapshots of the current repository. 
> > > If you want instead to grab a static version of a repository, say a tagged revision, you can do like this: > > http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 > > (where "for_gmod_0_003" is the tag). > > > By the way, I am getting these URLs on GitHub by: > > 1. ?going to the GitHub page for the relevant repository > > ? ? ? ?e.g. http://github.com/bioperl/bioperl-live > > 2. ?navigating to the tag or branch of interest using the "Switch Branches" or "Switch Tags" pulldowns > > 3. ?clicking on the Download Source button > > 4. ?right-clicking on the big TAR icon to copy the link underlying it > > > > Dave > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From bosborne11 at verizon.net Thu May 27 11:40:37 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 27 May 2010 11:40:37 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Message-ID: Chris, Removed all erroneous references to Subversion except for these pages, which require detailed editing and/or a familiarity with Git: http://www.bioperl.org/wiki/Emacs_bioperl-mode http://www.bioperl.org/wiki/HOWTO:Wrappers http://www.bioperl.org/wiki/Making_a_BioPerl_release http://www.bioperl.org/w/index.php/HOWTO:BlastPlus One issue now is the references to pedigree, microarray, GUI, pipeline, and ext, which only exist in SVN. Also GUI, pipeline, and microarray are unsupported, and have been unsupported for many years. 
Yet they are still listed in pages like: http://www.bioperl.org/wiki/Getting_BioPerl They shouldn't be listed alongside bioperl-live or -run, or they should not be listed at all. Should they be removed? or put into their own "unsupported" section? Brian O. On May 20, 2010, at 11:37 AM, Chris Fields wrote: > Yes, if you have time. I have started along that path already, but I'm sure there are lingering spots where links point to the wrong place, or subversion/svn is mentioned. > > chris > > On May 20, 2010, at 10:34 AM, Brian Osborne wrote: > >> Chris, >> >> Done, easy. Should I remove all references to SVN from the Wiki? >> >> Brian O. >> >> On May 18, 2010, at 2:04 PM, Chris Fields wrote: >> >>> Yes. >>> >>> chris >>> >>> On May 18, 2010, at 11:06 AM, Brian Osborne wrote: >>> >>>> bioperl-l, >>>> >>>> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >>>> >>>> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >>>> >>>> Brian O. >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From cjfields at illinois.edu Thu May 27 11:58:06 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 27 May 2010 10:58:06 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? 
In-Reply-To: References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Message-ID: On May 27, 2010, at 10:40 AM, Brian Osborne wrote: > Chris, > > Removed all erroneous references to Subversion except for these pages, which require detailed editing and/or a familiarity with Git: > > http://www.bioperl.org/wiki/Emacs_bioperl-mode > > http://www.bioperl.org/wiki/HOWTO:Wrappers > > http://www.bioperl.org/wiki/Making_a_BioPerl_release > > http://www.bioperl.org/w/index.php/HOWTO:BlastPlus Okay, looks good so far. I know the emacs mode stuff will be handled by Mark (I'm assuming the others will follow suit). I'll have to go in and clean up the 'making a release' page myself to update it. > One issue now is the references to pedigree, microarray, GUI, pipeline, and ext, which only exist in SVN. By 'only existing in svn', do you mean they are only found there? I moved everything over for archiving: http://github.com/bioperl/bioperl-gui http://github.com/bioperl/bioperl-microarray http://github.com/bioperl/bioperl-pedigree http://github.com/bioperl/bioperl-pipeline > Also GUI, pipeline, and microarray are unsupported, and have been unsupported for many years. Yet they are still listed in pages like: > > http://www.bioperl.org/wiki/Getting_BioPerl > > They shouldn't be listed alongside bioperl-live or -run, or they should not be listed at all. > > Should they be removed? or put into their own "unsupported" section? I think to an 'unsupported' or 'unmaintained' section; could add the corba and pise ones as well (just noticed that the pise repo was missing from github, so just added it for archiving). > Brian O. Thanks brian! 
chris From sdavis2 at mail.nih.gov Thu May 27 12:04:04 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 27 May 2010 12:04:04 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: On Thu, May 27, 2010 at 11:29 AM, Scott Cain wrote: > Hi All, > > Thanks for pointing out the links. It's weird: using curl on those > urls retrieves a "redirect" page, whereas LWP::Simple::mirror gets the > tarball. Anyway, the script works again :-) > > Hi, Scott. For curl, try: curl -L .... The -L follows redirects. Sean > > On Wed, May 26, 2010 at 4:09 PM, Dave Messina > wrote: > > Hi Scott, > > > > I think the URLs you want are these > > > > http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots > > > > snapshots of the current repository. > > > > > > If you want instead to grab a static version of a repository, say a > tagged revision, you can do like this: > > > > http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 > > > > (where "for_gmod_0_003" is the tag). > > > > > > By the way, I am getting these URLs on GitHub by: > > > > 1. going to the GitHub page for the relevant repository > > > > e.g. http://github.com/bioperl/bioperl-live > > > > 2. navigating to the tag or branch of interest using the "Switch > Branches" or "Switch Tags" pulldowns > > > > 3. clicking on the Download Source button > > > > 4. right-clicking on the big TAR icon to copy the link underlying it > > > > > > > > Dave > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From remi.planel at free.fr Fri May 28 06:29:50 2010 From: remi.planel at free.fr (Remi) Date: Fri, 28 May 2010 12:29:50 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult Message-ID: <4BFF9B1E.10500@free.fr> Hi all, I would like to get a clone of a Bio::Search::Result::GenericResult object and I'm not sure of what I'm doing ... I've tried something like : /my $searchIn = Bio::SearchIO->new( -file => 'result.bls', -format => 'blastxml', ); my $result = $searchIn->next_result; my $result_copy = $result->new($result); /It seems to work but I'm not sure to understand how. So I would like to know if I'll get in trouble using this code and if all the fields are copied one by one. Thank you, R?mi // From David.Messina at sbc.su.se Fri May 28 07:32:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 28 May 2010 13:32:40 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <4BFF9B1E.10500@free.fr> References: <4BFF9B1E.10500@free.fr> Message-ID: Hi R?mi, As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). So I don't think the code you showed will work. However, there are modules such as Clone::More and Clone::Fast that can do it. http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. 
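As a small illustration of what "cloning" has to mean here, the sketch below contrasts a shallow copy with a deep copy using Storable's dclone, which ships with core Perl. The data is a plain nested hash standing in for a parsed result; it is not a real Bio::Search::Result::GenericResult, and whether such an object survives dclone intact (it may hold filehandles or code references) is an open assumption that would need testing.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Stand-in for a parsed result: a plain nested structure, NOT a real
# BioPerl result object. Names and scores are invented.
my $result = {
    query_name => 'query1',
    hits       => [
        { name => 'hitA', score => 120 },
        { name => 'hitB', score => 35 },
    ],
};

my $shallow = { %$result };       # copies the top level only; the hits array is shared
my $deep    = dclone($result);    # recursively copies the whole structure

# Filter the original in place, keeping only high-scoring hits.
@{ $result->{hits} } = grep { $_->{score} >= 100 } @{ $result->{hits} };

printf "original: %d hit(s)\n", scalar @{ $result->{hits} };
printf "shallow:  %d hit(s)\n", scalar @{ $shallow->{hits} };
printf "deep:     %d hit(s)\n", scalar @{ $deep->{hits} };
```

Only the deep copy still holds both hits after the filter runs; the shallow copy shares the hits array with the original and so loses hitB as well.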
Dave

From remi.planel at free.fr Fri May 28 08:17:01 2010
From: remi.planel at free.fr (Remi)
Date: Fri, 28 May 2010 14:17:01 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To:
References: <4BFF9B1E.10500@free.fr>
Message-ID: <4BFFB43D.50409@free.fr>

You're right, it's not working: some fields are missing ...

Actually, I'm writing a script that filters a Result object based on some criteria, and I want the script to be kind of interactive, like:

-Display Result object as HTML
-Ask for filter criteria
-Filter Result object
-Display filtered Result object as HTML.
... etc

And I would like to make a copy of the Result object before each filtering step in order to be able to redo it.

I'll have a look at the modules you've mentioned, thanks.

From cjfields at illinois.edu Fri May 28 09:25:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 28 May 2010 08:25:54 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4BFFB43D.50409@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr>
Message-ID: <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu>

Remi,

Using the constructor that way is not supported. But cloning is completely unnecessary here.

Are you using Bio::SearchIO::Writer::HTMLResultWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work.
You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need.

Something like the following should work (of course completely untested :)

    use Bio::SearchIO;
    use Bio::SearchIO::Writer::HTMLResultWriter;

    # $in is an already-opened Bio::SearchIO input stream
    my $result = $in->next_result;

    # filter on HSP
    write_html('result1.html', $result, { 'HSP' => \&hsp_filter });

    # rewind the result to go back to the beginning
    $result->rewind;

    # open a new filehandle here for second report output
    # filter on hit and HSP
    write_html('result2.html', $result, { 'HIT' => \&hit_filter,
                                          'HSP' => \&hsp_filter });

    # rewind the result to go back to the beginning
    $result->rewind;

    # and so on....

    sub write_html {
        my ($file, $result, $filters) = @_;
        # note that $filters is a hash ref above
        my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new
            (-filters => $filters );

        # the leading '>' opens $file for writing
        my $out = Bio::SearchIO->new(-writer => $writer, -file => ">$file");
        $out->write_result($result);
    }

    sub hsp_filter {
        my $hsp = shift;
        return 1 if $hsp->length('total') > 100;
    }

    sub hit_filter {
        my $hit = shift;
        return 1 if $hit->significance < 1e-5;
    }

chris

From cjfields at illinois.edu Fri May 28 10:34:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 28 May 2010 09:34:13 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4BFFD3D5.2000409@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr>
Message-ID:

Let us know how it goes, and if you run into any bugs.

chris

On May 28, 2010, at 9:31 AM, Remi wrote:

> Thank you very much !!!!
> I'm gonna try it right away

From remi.planel at free.fr Fri May 28 10:31:49 2010
From: remi.planel at free.fr (Remi)
Date: Fri, 28 May 2010 16:31:49 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu>
Message-ID: <4BFFD3D5.2000409@free.fr>

An HTML attachment was scrubbed...
URL:

From fij at elte.hu Sun May 30 05:32:58 2010
From: fij at elte.hu (Farkas, Illes)
Date: Sun, 30 May 2010 11:32:58 +0200
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
Message-ID:

Hi,

I've run across a relatively simple, but specific task. I would like to put interaction (, , ) data from many sources (databases) into a single list containing the following in each record: , , , . (I am aware that there will be some loss during the ID conversion.)

So far I have found the following possibilities:

(1) BioMart perl API. Seems to be much smarter (and more complex) than what I would need. Also, I would need to parse input and output just as much as with newly written subroutines/modules.

(2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and KEGG IDs, but I could not find them on the "From" list.

(3) Synergizer. I cannot run it in remote batch mode. Of the identifiers I would need, I could not find BioGrid, ENSP and KEGG.

(4) Writing it all with ID mapping files downloaded from each database and contributing it to BioPerl. How can I contribute? How do I find the best place within BioPerl to add a particular module? Whom do I need to ask for approval?

Thanks in advance for any comments.
Illes

--
http://hal.elte.hu/fij

From maj at fortinbras.us Sun May 30 09:42:50 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 30 May 2010 09:42:50 -0400
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To:
References:
Message-ID: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>

Illes-- no approval necessary (or, if you like, I approve). What you can do is describe what you want to do as an enhancement request at http://bugzilla.bioperl.org, and then attach your new code to that request. We can review it from there.
cheers MAJ

----- Original Message -----
From: "Farkas, Illes"
To:
Sent: Sunday, May 30, 2010 5:32 AM
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)

> Hi,
>
> I've ran across a relatively simple, but specific task. I would like to put
> interaction (, , ) data from many sources
> (databases) into a single list containing the following in each record:
> , , ,
> . (I am aware that there will be some loss during the ID
> conversion.)
>
> I have found so far the following possibilities:
>
> (1) BioMart perl API. Seems to be much smarter (and more complex) than what
> I would need. Also, I would need to parse input and output just as much as
> with newly written subroutines/modules.
>
> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and
> KEGG IDs, but I could not find them on the "From" list.
>
> (3) Synergizer. I cannot run it in remote batch mode. From what I would need
> I could not find BioGrid, ENSP and KEGG identifiers.
>
> (4) Writing it all with ID mapping files downloaded from each database and
> contributing it to BioPerl. How can I contribute? How do I find the best
> place within BioPerl to add a particular module?
Whom do I need to ask for
> approval?
>
> Thanks in advance for any comments.
> Illes
>
> --
> http://hal.elte.hu/fij
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Sun May 30 11:00:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 30 May 2010 10:00:09 -0500
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
References: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
Message-ID:

Another couple of options:

1) for code changes, fork the code on GitHub, add your code there, then make a pull request

2) for adding code, create a repo on GitHub with the code,

chris

On May 30, 2010, at 8:42 AM, Mark A. Jensen wrote:

> Illes-- no approval necessary (or, if you like, I approve). What you can do is describe what you want to do as an enhancement request at http://bugzilla.bioperl.org, and then attach your new code to that request. We can review it from there.
> cheers MAJ
> ----- Original Message ----- From: "Farkas, Illes"
> To:
> Sent: Sunday, May 30, 2010 5:32 AM
> Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
>
>> Hi,
>>
>> I've ran across a relatively simple, but specific task. I would like to put
>> interaction (, , ) data from many sources
>> (databases) into a single list containing the following in each record:
>> , , ,
>> . (I am aware that there will be some loss during the ID
>> conversion.)
>>
>> I have found so far the following possibilities:
>>
>> (1) BioMart perl API. Seems to be much smarter (and more complex) than what
>> I would need. Also, I would need to parse input and output just as much as
>> with newly written subroutines/modules.
>>
>> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and
>> KEGG IDs, but I could not find them on the "From" list.
>>
>> (3) Synergizer.
I cannot run it in remote batch mode. From what I would need
>> I could not find BioGrid, ENSP and KEGG identifiers.
>>
>> (4) Writing it all with ID mapping files downloaded from each database and
>> contributing it to BioPerl. How can I contribute? How do I find the best
>> place within BioPerl to add a particular module? Whom do I need to ask for
>> approval?
>>
>> Thanks in advance for any comments.
>> Illes
>>
>> --
>> http://hal.elte.hu/fij
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Sun May 30 11:05:37 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 30 May 2010 10:05:37 -0500
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To:
References:
Message-ID: <84D300DB-C22D-494E-ABAF-EBC10FEE0E7C@illinois.edu>

On May 30, 2010, at 4:32 AM, Farkas, Illes wrote:

> Hi,
>
> I've ran across a relatively simple, but specific task. I would like to put
> interaction (, , ) data from many sources
> (databases) into a single list containing the following in each record:
> , , ,
> . (I am aware that there will be some loss during the ID
> conversion.)
>
> I have found so far the following possibilities:
>
> (1) BioMart perl API. Seems to be much smarter (and more complex) than what
> I would need. Also, I would need to parse input and output just as much as
> with newly written subroutines/modules.

Or, I wonder whether you could create a set of BioPerl<->BioMart bridge modules.

> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and
> KEGG IDs, but I could not find them on the "From" list.

I added an id_mapper to Bio::DB::SwissProt that calls this service. It hasn't been broadly tested yet, but you are welcome to add more to it. It might also be useful to have a DB wrapper around a locally-built ID mapping database, which would give you more flexibility than the web interface.

> (3) Synergizer. I cannot run it in remote batch mode. From what I would need
> I could not find BioGrid, ENSP and KEGG identifiers.
>
> (4) Writing it all with ID mapping files downloaded from each database and
> contributing it to BioPerl. How can I contribute? How do I find the best
> place within BioPerl to add a particular module? Whom do I need to ask for
> approval?
>
> Thanks in advance for any comments.
> Illes

A generalized ID mapping interface would be nice. You could also incorporate some of NCBI's eutils stuff along these lines, or their gi2acc mappings.

chris

From maj at fortinbras.us Sun May 30 19:59:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 30 May 2010 19:59:38 -0400
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To:
References: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
Message-ID: <6553B9DFF86F472B8B2D0D8A72171056@NewLife>

Yes, that's definitely the Way to Do It post-git-
MAJ

From cjfields at illinois.edu Mon May 31 09:23:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 31 May 2010 08:23:13 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4C037F22.3090209@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr>
Message-ID: <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu>

That sounds like a bug. Does filtering at the hit level work around this?

    sub hit_filter {
        my $hit = shift;
        # filter hsps here
        my @passing_hsps = grep { hsp_filter($_) } $hit->hsps;
        # in scalar context this is the number of passing HSPs,
        # so a hit with no passing HSPs is rejected
        @passing_hsps;
    }

    sub hsp_filter {
        # original filter
    }

chris

On May 31, 2010, at 4:19 AM, Remi wrote:

> Hi,
>
> Everything is working well, but there is still one point that is giving me some trouble.
> When I filter the hsps and all the hsps of a given hit are removed, the description line of the hit is still present in the HTML file.
> Is there a way to get rid of this description line?
> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLWriter and override the "to_string" method?
>
> Thanks,
>
> Rémi
>
> Chris Fields wrote:
>> Let us know how it goes, and if you run into any bugs.
>>
>> chris
>>
>> On May 28, 2010, at 9:31 AM, Remi wrote:
>>
>>> Thank you very much !!!!
>>> I'm gonna try it right away
>>>
>>> Chris Fields wrote:
>>>
>>>> Remi,
>>>>
>>>> Using the constructor that way is not supported. But it's completely unnecessary.
>>>>
>>>> Are you using Bio::SearchIO::Writer::HTMLWriter?

From remi.planel at free.fr Mon May 31 09:47:40 2010
From: remi.planel at free.fr (Remi)
Date: Mon, 31 May 2010 15:47:40 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr> <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu>
Message-ID: <4C03BDFC.5050109@free.fr>

Yes, at the hit level everything works fine.
Actually, at the HSP level, the alignment part is not written to the HTML file, but the description before the alignment and the description of the hit at the beginning of the file are still written.

I had a quick look at the code and I'm not sure this is a bug.

Chris Fields wrote:
> That sounds like a bug.
Does filtering at the hit level work around this? > > sub hit_filter { > my $hit = shift; > # filter hsps here > my @passing_hsps = grep { hsp_filter($_) } $hit->hsps; > @passing_hsps; > } > > sub hsp_filter { > # original filter > } > > chris > > On May 31, 2010, at 4:19 AM, Remi wrote: > > >> Hi, >> >> Everything is working well but there is still one point that giving me some trouble. >> When I filter the hsps and all the hsps of a given hit are removed, the description line of the hit is still present in the HTML file. >> Is there a way to get rid of this description line ? >> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLWriter and overriding the "to_string" method ? >> >> Thanks, >> >> R?mi >> >> >> Chris Fields wrote: >> >>> Let us know how it goes, and if you run into any bugs. >>> >>> chris >>> >>> On May 28, 2010, at 9:31 AM, Remi wrote: >>> >>> >>> >>> >>>> Thank you very much !!!! >>>> I'm gonna try it right away >>>> >>>> Chris Fields wrote: >>>> >>>> >>>> >>>>> Remi, >>>>> >>>>> Using the constructor that way is not supported. But it's completely unnecessary. >>>>> >>>>> Are you using Bio::SearchIO::Writer::HTMLWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. 
>>>>> >>>>> Something like the following should work (of course completely untested :) >>>>> >>>>> my $result = $in->next_result; >>>>> >>>>> # filter on HSP >>>>> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >>>>> >>>>> # rewind the result to go back to the beginning >>>>> $result->rewind; >>>>> >>>>> # open a new filehandle here for second report output >>>>> # filter on hit and HSP >>>>> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >>>>> 'HSP' => \&hsp_filter }); >>>>> >>>>> # rewind the result to go back to the beginning >>>>> $result->rewind; >>>>> >>>>> # and so on.... >>>>> >>>>> sub write_html { >>>>> my ($file, $result, $filters) = @_; >>>>> # note that $filter is a hash ref above >>>>> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >>>>> (-filters => $filters ); >>>>> >>>>> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); >>>>> $out->write_result($result); >>>>> } >>>>> >>>>> sub hsp_filter { >>>>> my $hsp = shift; >>>>> return 1 if $hsp->length('total') > 100; >>>>> } >>>>> >>>>> sub hit_filter { >>>>> my $hit = shift; >>>>> return 1 if $hit->significance < 1e-5; >>>>> } >>>>> >>>>> chris >>>>> >>>>> >>>>> On May 28, 2010, at 7:17 AM, Remi wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> You're right, it's not working there is some missing fields ... >>>>>> >>>>>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>>>>> >>>>>> -Display Result object as HTML >>>>>> -Ask for filter criteria >>>>>> -Filter Result object >>>>>> -Display filtered Result object as HTML. >>>>>> ... etc >>>>>> >>>>>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. >>>>>> >>>>>> I'll have a look to the modules you've mentioned, thanks. 
>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Dave Messina wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Hi R?mi, >>>>>>> >>>>>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). >>>>>>> >>>>>>> So I don't think the code you showed will work. >>>>>>> >>>>>>> However, there are modules such as Clone::More and Clone::Fast that can do it. >>>>>>> >>>>>>> >>>>>>> >>>>>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >>>>>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >>>>>>> >>>>>>> Dave >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> >>>>>> >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>> >>> >>> > > From cjfields at illinois.edu Mon May 31 09:54:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 31 May 2010 08:54:22 -0500 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <4C03BDFC.5050109@free.fr> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr> <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu> <4C03BDFC.5050109@free.fr> Message-ID: <454FE98D-4EE5-4DFB-A877-6DE7822C4DA4@illinois.edu> My concern is to ensure we aren't filtering twice as much (one at the hit level, one pass at the HSP level). It should be one pass. chris On May 31, 2010, at 8:47 AM, Remi wrote: > Yes, at the hit level everything works fine. 
> Actually, at the hsp level, the alignment part is not written to the HTML file but the description before the alignment and the description of the hit at the beginning of the file are written. > > I had a quick look to the code and I'm not sure this is a bug. > > Chris Fields wrote: >> That sounds like a bug. Does filtering at the hit level work around this? >> >> sub hit_filter { >> my $hit = shift; >> # filter hsps here >> my @passing_hsps = grep { hsp_filter($_) } $hit->hsps; >> @passing_hsps; >> } >> >> sub hsp_filter { >> # original filter >> } >> >> chris >> >> On May 31, 2010, at 4:19 AM, Remi wrote: >> >> >>> Hi, >>> >>> Everything is working well but there is still one point that giving me some trouble. >>> When I filter the hsps and all the hsps of a given hit are removed, the description line of the hit is still present in the HTML file. >>> Is there a way to get rid of this description line ? >>> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLWriter and overriding the "to_string" method ? >>> >>> Thanks, >>> >>> R?mi >>> >>> >>> Chris Fields wrote: >>> >>>> Let us know how it goes, and if you run into any bugs. >>>> >>>> chris >>>> >>>> On May 28, 2010, at 9:31 AM, Remi wrote: >>>> >>>> >>>> >>>>> Thank you very much !!!! >>>>> I'm gonna try it right away >>>>> >>>>> Chris Fields wrote: >>>>> >>>>> >>>>>> Remi, >>>>>> >>>>>> Using the constructor that way is not supported. But it's completely unnecessary. >>>>>> Are you using Bio::SearchIO::Writer::HTMLWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. 
>>>>>> Something like the following should work (of course completely untested :) >>>>>> >>>>>> my $result = $in->next_result; >>>>>> >>>>>> # filter on HSP >>>>>> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >>>>>> >>>>>> # rewind the result to go back to the beginning >>>>>> $result->rewind; >>>>>> >>>>>> # open a new filehandle here for second report output >>>>>> # filter on hit and HSP >>>>>> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >>>>>> 'HSP' => \&hsp_filter }); >>>>>> >>>>>> # rewind the result to go back to the beginning >>>>>> $result->rewind; >>>>>> >>>>>> # and so on.... >>>>>> >>>>>> sub write_html { >>>>>> my ($file, $result, $filters) = @_; >>>>>> # note that $filter is a hash ref above >>>>>> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >>>>>> (-filters => $filters ); >>>>>> >>>>>> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); $out->write_result($result); >>>>>> } >>>>>> >>>>>> sub hsp_filter { my $hsp = shift; >>>>>> return 1 if $hsp->length('total') > 100; >>>>>> } >>>>>> >>>>>> sub hit_filter { my $hit = shift; >>>>>> return 1 if $hit->significance < 1e-5; >>>>>> } >>>>>> >>>>>> chris >>>>>> >>>>>> >>>>>> On May 28, 2010, at 7:17 AM, Remi wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> You're right, it's not working there is some missing fields ... >>>>>>> >>>>>>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>>>>>> >>>>>>> -Display Result object as HTML >>>>>>> -Ask for filter criteria >>>>>>> -Filter Result object >>>>>>> -Display filtered Result object as HTML. >>>>>>> ... etc >>>>>>> >>>>>>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. >>>>>>> >>>>>>> I'll have a look to the modules you've mentioned, thanks. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Dave Messina wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Hi R?mi, >>>>>>>> >>>>>>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). >>>>>>>> >>>>>>>> So I don't think the code you showed will work. >>>>>>>> >>>>>>>> However, there are modules such as Clone::More and Clone::Fast that can do it. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >>>>>>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >>>>>>>> >>>>>>>> Dave >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> >>>>>>> >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>> >>>> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From remi.planel at free.fr Mon May 31 05:19:30 2010 From: remi.planel at free.fr (Remi) Date: Mon, 31 May 2010 11:19:30 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> Message-ID: <4C037F22.3090209@free.fr> An HTML attachment was scrubbed... URL: From aradwen at gmail.com Sat May 1 06:45:18 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Sat, 1 May 2010 12:45:18 +0200 Subject: [Bioperl-l] Pfam_Scan Message-ID: Hello everyone, I would like to know if there is a way to cluster the output of Pfam_Scan results. 
I mean, can we parse it and then output clusters containing sequences sharing the same domains or Pfams? This is a bit special since we could have multidomain proteins in there; which rule should we follow in that case?

Rad

--
R. ANIBA

From David.Messina at sbc.su.se Sat May 1 18:28:48 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 2 May 2010 00:28:48 +0200
Subject: [Bioperl-l] Pfam_Scan
In-Reply-To:
References:
Message-ID: <6CA3B4F2-CF3E-45DD-BE51-9F7218C5CEE9@sbc.su.se>

Hi Rad,

As far as I can tell the Pfam_Scan output is simply tab-delimited text (see details below), so you should be able to group sequences which share domains by sorting on the sixth column.

I suspect that sequences with multiple domain hits will have multiple lines in the output, one per hit, so if you want to identify sequences which share the same _set_ of domains you will have to do the bookkeeping yourself.

That being said, Pfam_Scan is not part of BioPerl (it's distributed by the Pfam team), so you may want to contact them directly for help (pfam-help at sanger.ac.uk).

Dave

[from the Pfam_Scan documentation]

The output format is:

Example output (with -pfamB, -as options):

Q5NEL3.1    2  224    2  227  PB013481  Pfam-B_13481   Pfam-B  1  184  226  358.5  1.4e-107  NA  NA
O65039.1   38   93   38   93  PF08246   Inhibitor_I29  Domain  1   58   58   45.9  2.8e-12    1  No_clan
O65039.1  126  342  126  342  PF00112   Peptidase_C1   Domain  1  216  216  296.0  1.1e-88    1  CL0125   predicted_active_site[150,285,307]

From David.Messina at sbc.su.se Sun May 2 04:54:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 2 May 2010 10:54:54 +0200
Subject: [Bioperl-l] RFC: SNP::Inherit
In-Reply-To:
References:
Message-ID:

Hi Christopher,

Looks good! The only recommendation I would make is to change the namespace to Bio::SNP::Inherit.
The convention on CPAN is to minimize the number of new toplevel namespaces (which SNP would be), and although many of the Bio::* modules are part of BioPerl, that namespace is not restricted to BioPerl and there are plenty of non-BioPerl packages there. Dave On Apr 29, 2010, at 10:26 PM, Christopher Bottoms wrote: > Dear Bioperl community, > > I was thinking of uploading a module to CPAN that converts SNP genotype data > to parental allele designations. Below is the perldoc. This is not a > "BioPerl" module per se, so I'm not sure what namespace to put it under. > > I would be glad to send anyone the source if they are interested in checking > it out more. I just did not want to send everyone an unsolicited attachment. > > Thank you for your time, > Christopher Bottoms (molecules) > From David.Messina at sbc.su.se Sun May 2 05:59:07 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 2 May 2010 11:59:07 +0200 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <4BDA986D.3020302@bii.a-star.edu.sg> References: <4BDA986D.3020302@bii.a-star.edu.sg> Message-ID: <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> Hi Dimitar, The syntax you want is: # Build a Genewise alignment factory my $factory = Bio::Tools::Run::Genewise->new(); # turn on the quiet switch $factory->QUIET(1); # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects my @genes = $factory->run($protein_seq, $genomic_seq); This turns out be incorrectly documented on the man page, at least in part: > Available Params: > > NB: These should be passed without the '-' or they will be ignored, > except switches such as 'hmmer' (which have no corresponding value) > which should be set on the factory object using the AUTOLOADed methods > of the same name. 
> > Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null] > Alg [-kbyte,-alg] > HMM [-hmmer] > Output [-gff,-gener,-alb,-pal,-block,-divide] > Standard [-help,-version,-silent,-quiet,-errorlog] That is, these don't work as expected: $factory->quiet; $factory->quiet(1); due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set. So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker. Dave From maj at fortinbras.us Sun May 2 15:28:22 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 2 May 2010 15:28:22 -0400 Subject: [Bioperl-l] new core developers Rob Buels and Dave Messina Message-ID: Hi Folks, On behalf of the core team, I am delighted to announce two new members: Rob Buels and Dave Messina. They are so, er, honored on the basis of their selfless work on the list, on IRC, in development of new modules and their active and sustained participation in BioPerl maintenance, design and promotion. Welcome Rob and Dave! MAJ and the BioPerl core developers From skastu01 at students.poly.edu Sun May 2 22:41:04 2010 From: skastu01 at students.poly.edu (Lakshmi Kastury) Date: Mon, 3 May 2010 02:41:04 +0000 Subject: [Bioperl-l] Using BIO::SEARCHIO Message-ID: I am attempting to use the BIO::SEARCHIO system to parse a Blast output file. A new instance is he file is read through the following: my $input = new BIO::SearchIO (-file =>'blast_report_0.txt', -format =>'blast'); When I run my program, I receive the following message: "Can't locate object method "new" via package "BIO::SearchIO" (perhaps you forgot to load "BIO::SearchIO"? Is this an optional module which needs to be installed separately? Thanks, Lakshmi Kastury From maj at fortinbras.us Sun May 2 22:57:28 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sun, 2 May 2010 22:57:28 -0400 Subject: [Bioperl-l] Using BIO::SEARCHIO In-Reply-To: References: Message-ID: you need to say "Bio::SearchIO", and not "BIO::SearchIO" MAJ ----- Original Message ----- From: "Lakshmi Kastury" To: Sent: Sunday, May 02, 2010 10:41 PM Subject: [Bioperl-l] Using BIO::SEARCHIO > > > > > > > > > > > > I am attempting to use the BIO::SEARCHIO system to parse a Blast output file. > > A new instance is he file is read through the following: > my $input = new BIO::SearchIO (-file =>'blast_report_0.txt', -format > =>'blast'); > > When I run my program, I receive the following message: > "Can't locate object method "new" via package "BIO::SearchIO" (perhaps you > forgot to load "BIO::SearchIO"? > > Is this an optional module which needs to be installed separately? > > > > Thanks, > Lakshmi Kastury > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon May 3 00:22:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 2 May 2010 23:22:46 -0500 Subject: [Bioperl-l] Full bioperl-live github demo Message-ID: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> All, I have pushed a demo of the bioperl-live (all branches and tags) to github here: http://github.com/bioperl/bioperl-test This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. 
In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. chris From heikki.lehvaslaiho at gmail.com Mon May 3 07:45:10 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 3 May 2010 14:45:10 +0300 Subject: [Bioperl-l] BLAST parsing broken Message-ID: Chris, latest additions to Bio::SearchIO::blast.pm broke the parsing of normal blast output. $result->query_name returns now undef. (Using the anonymous git now). This change still works: commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 Author: cjfields Date: Sun Dec 20 04:39:58 2009 +0000 Robson's patch for buggy blastpgp output But this does not: commit 9a89c3434597104dd50553e3562983d78d14a544 Author: cjfields Date: Thu Apr 15 04:21:17 2010 +0000 [bug 3031] patches for catching algorithm ref, courtesy Razi Khaja. 
That makes it easy to find the diffs:

$ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
index 378023a..6f7eeeb 100644
--- a/Bio/SearchIO/blast.pm
+++ b/Bio/SearchIO/blast.pm
@@ -209,6 +209,7 @@ BEGIN {
 
         'BlastOutput_program'   => 'RESULT-algorithm_name',
         'BlastOutput_version'   => 'RESULT-algorithm_version',
+        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
         'BlastOutput_query-def' => 'RESULT-query_name',
         'BlastOutput_query-len' => 'RESULT-query_length',
         'BlastOutput_query-acc' => 'RESULT-query_accession',
@@ -504,6 +505,26 @@ sub next_result {
                 }
             );
         }
+        # parse the BLAST algorithm reference
+        elsif(/^Reference:\s+(.*)$/) {
+            # want to preserve newlines for the BLAST algorithm reference
+            my $algorithm_reference = "$1\n";
+            $_ = $self->_readline;
+            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
+            # algorithm_reference, append it to what we parsed so far
+            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
+                $algorithm_reference .= "$_";
+                $_ = $self->_readline;
+            }
+            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
+            $self->_pushback($_);
+            $self->element(
+                {
+                    'Name' => 'BlastOutput_algorithm-reference',
+                    'Data' => $algorithm_reference
+                }
+            );
+        }
         # added Windows workaround for bug 1985
         elsif (/^(Searching|Results from round)/) {
             next unless $1 =~ /Results from round/;

I am not sure why reference parsing messes things up. Maybe it eats too many lines from the result file.
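One way to test the "eats too many lines" guess is to replay the patch's stop conditions outside the parser. The sketch below is standalone: it uses a plain array in place of the parser's _readline and _pushback, and the sample header is hypothetical, deliberately laid out so that Query= follows the reference with no blank line in between. Whether any real BLAST report has that layout is an assumption, not something established in this thread.

```perl
use strict;
use warnings;

# Hypothetical report header; real BLAST output may be laid out differently.
my @lines = (
    "BLASTP 2.2.18 [Mar-02-2008]\n",
    "\n",
    "Reference: Altschul, Stephen F., et al. (1997),\n",
    "\"Gapped BLAST and PSI-BLAST\", Nucleic Acids Res. 25:3389-3402.\n",
    "Query= test_query\n",
    "\n",
    "Database: nr\n",
);

my $reference = '';
my @seen_by_parser;    # lines the rest of the parser would still get to see

while (defined(my $line = shift @lines)) {
    if ($line =~ /^Reference:\s+(.*)$/) {
        $reference = "$1\n";
        $line = shift @lines;
        # Same stop conditions as the patch: blank line, RID:, or Database:
        while (defined $line
               && $line !~ /^$/ && $line !~ /^RID:/ && $line !~ /^Database:/) {
            $reference .= $line;
            $line = shift @lines;
        }
        unshift @lines, $line if defined $line;    # stand-in for _pushback
        next;
    }
    push @seen_by_parser, $line;
}

print "Reference block:\n$reference";
print "Lines left for the rest of the parser:\n", @seen_by_parser;
```

With this particular input the Query= line ends up inside the reference block and never reaches the rest of the loop, which would leave the query name unset. Feeding the header of an actual failing report into @lines would show whether that is what happens in practice.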
Yours, -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia From cjfields at illinois.edu Mon May 3 08:08:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 07:08:01 -0500 Subject: [Bioperl-l] BLAST parsing broken In-Reply-To: References: Message-ID: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Odd, I ran tests on that prior to commit. I'll work on fixing that (in svn of course, until the migration is complete). chris On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > Chris, > > latest additions to Bio::SearchIO::blast.pm broke the parsing of normal > blast output. $result->query_name returns now undef. > > (Using the anonymous git now). This change still works: > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > Author: cjfields > Date: Sun Dec 20 04:39:58 2009 +0000 > > Robson's patch for buggy blastpgp output > > But this does not: > > commit 9a89c3434597104dd50553e3562983d78d14a544 > Author: cjfields > Date: Thu Apr 15 04:21:17 2010 +0000 > > [bug 3031] > > patches for catching algorithm ref, courtesy Razi Khaja. 
> > That makes it easy to find the diffs: > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > index 378023a..6f7eeeb 100644 > --- a/Bio/SearchIO/blast.pm > +++ b/Bio/SearchIO/blast.pm > @@ -209,6 +209,7 @@ BEGIN { > > 'BlastOutput_program' => 'RESULT-algorithm_name', > 'BlastOutput_version' => 'RESULT-algorithm_version', > + 'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference', > 'BlastOutput_query-def' => 'RESULT-query_name', > 'BlastOutput_query-len' => 'RESULT-query_length', > 'BlastOutput_query-acc' => 'RESULT-query_accession', > @@ -504,6 +505,26 @@ sub next_result { > } > ); > } > + # parse the BLAST algorithm reference > + elsif(/^Reference:\s+(.*)$/) { > + # want to preserve newlines for the BLAST algorithm reference > + my $algorithm_reference = "$1\n"; > + $_ = $self->_readline; > + # while the current line, does not match an empty line, a RID:, > or a Database:, we are still looking at the > + # algorithm_reference, append it to what we parsed so far > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > + $algorithm_reference .= "$_"; > + $_ = $self->_readline; > + } > + # if we exited the while loop, we saw an empty line, a RID:, or > a Database:, so push it back > + $self->_pushback($_); > + $self->element( > + { > + 'Name' => 'BlastOutput_algorithm-reference', > + 'Data' => $algorithm_reference > + } > + ); > + } > # added Windows workaround for bug 1985 > elsif (/^(Searching|Results from round)/) { > next unless $1 =~ /Results from round/; > > > I am not sure why reference parsing messes things up. Maybe it eats too many > lines from the result file. 
> > Yours, > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon May 3 08:25:10 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 3 May 2010 08:25:10 -0400 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> Message-ID: Hi Chris, I attempted a clone and got the following. Is this my problem? thanks MAJ $ git clone http://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ Getting alternates list for http://github.com/bioperl/bioperl-test.git Getting pack list for http://github.com/bioperl/bioperl-test.git Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Monday, May 03, 2010 12:22 AM Subject: [Bioperl-l] Full bioperl-live github demo > All, > > I have pushed a demo of the bioperl-live (all branches and tags) to github > here: > > http://github.com/bioperl/bioperl-test > > This is separate from the 'bioperl-live' repo at the 
same github account for > the time being. The conversion was performed using svn2git (the gitorious > C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), > using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and > rerun can be performed very quickly. The actual conversion of the entire > bioperl repo took very little time, actually (less than 3 minutes). I think, > with some additional small work using the svn2git rules pretty much everything > is ready for migration. > > In this run, all subversion tags are converted to git tags (branches remain > git branches as expected). Just in case I'm missing something, I would like > everyone to take a look at this, though. In particular, I would like to make > sure tags and branches are as they are expected. So far I haven't seen > anything that stands out as odd. > > chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon May 3 09:07:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 08:07:46 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> Message-ID: <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): cjfields$ git clone git://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ remote: Counting objects: 86737, done. remote: Compressing objects: 100% (22309/22309), done. remote: Total 86737 (delta 64759), reused 85957 (delta 63979) Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. Resolving deltas: 100% (64759/64759), done. For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. 
Do you have a github acct set up? chris On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > Hi Chris, > I attempted a clone and got the following. Is this my problem? > thanks MAJ > > $ git clone http://github.com/bioperl/bioperl-test.git > > Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ > Getting alternates list for http://github.com/bioperl/bioperl-test.git > Getting pack list for http://github.com/bioperl/bioperl-test.git > Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c > Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 > Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c > which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f > error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile > fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed > > > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, May 03, 2010 12:22 AM > Subject: [Bioperl-l] Full bioperl-live github demo > > >> All, >> >> I have pushed a demo of the bioperl-live (all branches and tags) to github here: >> >> http://github.com/bioperl/bioperl-test >> >> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >> >> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). 
Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 3 09:19:17 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 08:19:17 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <8796492301724F2CA132F97AE57C2700@NewLife> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> Message-ID: Added you in. SSH access should work with any ssh keys you have set in github. We can play around with this for the time being (try post commit hooks, etc), but obviously can't make any serious commits to it until we are ready for complete migration; everything will still need to go to dev svn until then. Also noticed that we are topping the account out at the moment, but removing the old read-only repos should help. May need to think about that in the long-term. chris On May 3, 2010, at 8:13 AM, Mark A. Jensen wrote: > That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with > majensen > cheers Chris- MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. 
Jensen" > Cc: "BioPerl List" > Sent: Monday, May 03, 2010 9:07 AM > Subject: Re: [Bioperl-l] Full bioperl-live github demo > > > This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): > > cjfields$ git clone git://github.com/bioperl/bioperl-test.git > Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ > remote: Counting objects: 86737, done. > remote: Compressing objects: 100% (22309/22309), done. > remote: Total 86737 (delta 64759), reused 85957 (delta 63979) > Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. > Resolving deltas: 100% (64759/64759), done. > > For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? > > chris > > On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > >> Hi Chris, >> I attempted a clone and got the following. Is this my problem? >> thanks MAJ >> >> $ git clone http://github.com/bioperl/bioperl-test.git >> >> Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ >> Getting alternates list for http://github.com/bioperl/bioperl-test.git >> Getting pack list for http://github.com/bioperl/bioperl-test.git >> Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 >> Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f >> error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile >> fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed >> >> >> ----- Original Message ----- From: "Chris Fields" >> To: "BioPerl List" >> Sent: Monday, May 03, 2010 12:22 AM >> Subject: [Bioperl-l] Full bioperl-live github demo >> >> >>> All, >>> >>> I have pushed a demo of the bioperl-live (all 
branches and tags) to github here: >>> >>> http://github.com/bioperl/bioperl-test >>> >>> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >>> >>> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. >>> >>> chris >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From maj at fortinbras.us Mon May 3 09:13:27 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 3 May 2010 09:13:27 -0400 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> Message-ID: <8796492301724F2CA132F97AE57C2700@NewLife> That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with majensen cheers Chris- MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. 
Jensen" Cc: "BioPerl List" Sent: Monday, May 03, 2010 9:07 AM Subject: Re: [Bioperl-l] Full bioperl-live github demo This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): cjfields$ git clone git://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ remote: Counting objects: 86737, done. remote: Compressing objects: 100% (22309/22309), done. remote: Total 86737 (delta 64759), reused 85957 (delta 63979) Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. Resolving deltas: 100% (64759/64759), done. For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? chris On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > Hi Chris, > I attempted a clone and got the following. Is this my problem? > thanks MAJ > > $ git clone http://github.com/bioperl/bioperl-test.git > > Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ > Getting alternates list for http://github.com/bioperl/bioperl-test.git > Getting pack list for http://github.com/bioperl/bioperl-test.git > Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c > Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 > Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c > which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f > error: file > /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack > is not a GIT packfile > fatal: packfile > /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack > cannot be accessed > > > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, May 03, 2010 12:22 AM > Subject: [Bioperl-l] Full bioperl-live github demo > > >> All, >> >> I have pushed a demo of the bioperl-live (all branches and tags) to github >> here: >> >> 
http://github.com/bioperl/bioperl-test >> >> This is separate from the 'bioperl-live' repo at the same github account for >> the time being. The conversion was performed using svn2git (the gitorious >> C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), >> using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and >> rerun can be performed very quickly. The actual conversion of the entire >> bioperl repo took very little time, actually (less than 3 minutes). I think, >> with some additional small work using the svn2git rules pretty much >> everything is ready for migration. >> >> In this run, all subversion tags are converted to git tags (branches remain >> git branches as expected). Just in case I'm missing something, I would like >> everyone to take a look at this, though. In particular, I would like to make >> sure tags and branches are as they are expected. So far I haven't seen >> anything that stands out as odd. >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 3 10:04:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 09:04:16 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <8796492301724F2CA132F97AE57C2700@NewLife> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> Message-ID: <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> I like this: http://github.com/bioperl/bioperl-test/graphs/impact Kinda cool yet scary. chris On May 3, 2010, at 8:13 AM, Mark A. 
Jensen wrote: > That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with > majensen > cheers Chris- MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. Jensen" > Cc: "BioPerl List" > Sent: Monday, May 03, 2010 9:07 AM > Subject: Re: [Bioperl-l] Full bioperl-live github demo > > > This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): > > cjfields$ git clone git://github.com/bioperl/bioperl-test.git > Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ > remote: Counting objects: 86737, done. > remote: Compressing objects: 100% (22309/22309), done. > remote: Total 86737 (delta 64759), reused 85957 (delta 63979) > Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. > Resolving deltas: 100% (64759/64759), done. > > For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? > > chris > > On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > >> Hi Chris, >> I attempted a clone and got the following. Is this my problem? 
>> thanks MAJ >> >> $ git clone http://github.com/bioperl/bioperl-test.git >> >> Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ >> Getting alternates list for http://github.com/bioperl/bioperl-test.git >> Getting pack list for http://github.com/bioperl/bioperl-test.git >> Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 >> Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f >> error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile >> fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed >> >> >> ----- Original Message ----- From: "Chris Fields" >> To: "BioPerl List" >> Sent: Monday, May 03, 2010 12:22 AM >> Subject: [Bioperl-l] Full bioperl-live github demo >> >> >>> All, >>> >>> I have pushed a demo of the bioperl-live (all branches and tags) to github here: >>> >>> http://github.com/bioperl/bioperl-test >>> >>> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >>> >>> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. 
So far I haven't seen anything that stands out as odd.
>>>
>>> chris
>>>
>>
>

From mnrusimh at gmail.com Mon May 3 18:42:41 2010
From: mnrusimh at gmail.com (Ram Podicheti)
Date: Mon, 03 May 2010 18:42:41 -0400
Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID
Message-ID: <4BDF5161.4030209@gmail.com>

Is there a way to obtain the Ensembl Gene ID from an Entrez Gene ID? In
other words, I am hoping to get 'ENSMUSG00000029372' as the output when
I supply 57349.

Many thanks,
Ram Podicheti

From sdavis2 at mail.nih.gov Mon May 3 19:14:58 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 3 May 2010 19:14:58 -0400
Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID
In-Reply-To: <4BDF5161.4030209@gmail.com>
References: <4BDF5161.4030209@gmail.com>
Message-ID:

On Mon, May 3, 2010 at 6:42 PM, Ram Podicheti wrote:

> Is there a way to obtain the Ensembl Gene ID from an Entrez Gene ID? In
> other words, I am hoping to get 'ENSMUSG00000029372' as the output when
> I supply 57349.
>

Check out the Biomart interface to Ensembl. You can supply any type of ID as a filter and get back gene information, including the ID, that map to that ID. I believe there is a perl interface to biomart, but I haven't used it to comment directly. There is also an R/Bioconductor interface.
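
For the Perl route, one option that avoids guessing at the BioMart Perl API is to hand the mart an XML query. The sketch below only builds that XML in plain Perl. The helper name biomart_query_xml and the dataset, filter and attribute names (mmusculus_gene_ensembl, entrezgene, ensembl_gene_id) are assumptions for illustration, not something confirmed in this thread; they tend to change between releases and should be checked against the mart's own listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a BioMart XML query as a single string.
# Every dataset/filter/attribute name passed in is an assumption and
# must be checked against the mart that is actually being queried.
sub biomart_query_xml {
    my (%arg) = @_;

    # one <Attribute/> element per requested output column
    my $attributes = join '',
        map { qq{<Attribute name="$_"/>} } @{ $arg{attributes} };

    return join '',
        qq{<?xml version="1.0" encoding="UTF-8"?>},
        qq{<!DOCTYPE Query>},
        qq{<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1" count="" datasetConfigVersion="0.6">},
        qq{<Dataset name="$arg{dataset}" interface="default">},
        qq{<Filter name="$arg{filter}" value="$arg{value}"/>},
        $attributes,
        qq{</Dataset></Query>};
}

my $xml = biomart_query_xml(
    dataset    => 'mmusculus_gene_ensembl',   # assumed mouse gene dataset name
    filter     => 'entrezgene',               # assumed filter name for Entrez Gene IDs
    value      => 57349,                      # the Entrez Gene ID from the question
    attributes => ['ensembl_gene_id'],        # assumed attribute name
);

print "$xml\n";
```

As far as I recall, a string like this is submitted as the query parameter of the mart's martservice URL, which returns tab-separated text; that step is left out here because it needs network access and has not been tested.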
Sean From mnrusimh at gmail.com Mon May 3 20:42:49 2010 From: mnrusimh at gmail.com (Ram Podicheti) Date: Mon, 03 May 2010 20:42:49 -0400 Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID In-Reply-To: References: <4BDF5161.4030209@gmail.com> Message-ID: <4BDF6D89.2000408@gmail.com> Thanks Sean, that definitely helped. Ram Sean Davis wrote: > > > On Mon, May 3, 2010 at 6:42 PM, Ram Podicheti > wrote: > > Is there a way to obtain the Ensembl Gene ID from an Entrez Gene > ID? In > other words, I am hoping to get 'ENSMUSG00000029372' as the output > when > I supply 57349. > > > Check out the Biomart interface to Ensembl. You can supply any type > of ID as a filter and get back gene information, including the ID, > that map to that ID. I believe there is a perl interface to biomart, > but I haven't used it to comment directly. There is also an > R/Bioconductor interface. > > Sean > From razi.khaja at gmail.com Tue May 4 13:55:00 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Tue, 4 May 2010 13:55:00 -0400 Subject: [Bioperl-l] BLAST parsing broken In-Reply-To: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: That is odd. Heikki, do you have a blast output file that produces this error? Could you attach the file and either send to the list or myself (if the list does not accept attachments). Thanks, Razi On Mon, May 3, 2010 at 8:08 AM, Chris Fields wrote: > Odd, I ran tests on that prior to commit. I'll work on fixing that (in svn > of course, until the migration is complete). > > chris > > On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > > > Chris, > > > > latest additions to Bio::SearchIO::blast.pm broke the parsing of normal > > blast output. $result->query_name returns now undef. > > > > (Using the anonymous git now). 
This change still works: > > > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > Author: cjfields > > Date: Sun Dec 20 04:39:58 2009 +0000 > > > > Robson's patch for buggy blastpgp output > > > > But this does not: > > > > commit 9a89c3434597104dd50553e3562983d78d14a544 > > Author: cjfields > > Date: Thu Apr 15 04:21:17 2010 +0000 > > > > [bug 3031] > > > > patches for catching algorithm ref, courtesy Razi Khaja. > > > > That makes it easy to find the diffs: > > > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > > index 378023a..6f7eeeb 100644 > > --- a/Bio/SearchIO/blast.pm > > +++ b/Bio/SearchIO/blast.pm > > @@ -209,6 +209,7 @@ BEGIN { > > > > 'BlastOutput_program' => 'RESULT-algorithm_name', > > 'BlastOutput_version' => 'RESULT-algorithm_version', > > + 'BlastOutput_algorithm-reference' => > 'RESULT-algorithm_reference', > > 'BlastOutput_query-def' => 'RESULT-query_name', > > 'BlastOutput_query-len' => 'RESULT-query_length', > > 'BlastOutput_query-acc' => 'RESULT-query_accession', > > @@ -504,6 +505,26 @@ sub next_result { > > } > > ); > > } > > + # parse the BLAST algorithm reference > > + elsif(/^Reference:\s+(.*)$/) { > > + # want to preserve newlines for the BLAST algorithm > reference > > + my $algorithm_reference = "$1\n"; > > + $_ = $self->_readline; > > + # while the current line, does not match an empty line, a > RID:, > > or a Database:, we are still looking at the > > + # algorithm_reference, append it to what we parsed so far > > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > > + $algorithm_reference .= "$_"; > > + $_ = $self->_readline; > > + } > > + # if we exited the while loop, we saw an empty line, a RID:, > or > > a Database:, so push it back > > + $self->_pushback($_); > > + $self->element( > > + { > > + 'Name' => 'BlastOutput_algorithm-reference', > > + 'Data' => $algorithm_reference > > + 
}
> > );
> > }
> > # added Windows workaround for bug 1985
> > elsif (/^(Searching|Results from round)/) {
> > next unless $1 =~ /Results from round/;
> >
> >
> > I am not sure why reference parsing messes things up. Maybe it eats too many
> > lines from the result file.
> >
> > Yours,
> >
> > -Heikki
> >
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +966 545 595 849 office: +966 2 808 2429
> >
> > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> > 4700 King Abdullah University of Science and Technology (KAUST)
> > Thuwal 23955-6900, Kingdom of Saudi Arabia
>

From shalabh.sharma7 at gmail.com Tue May 4 14:18:02 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 4 May 2010 14:18:02 -0400
Subject: [Bioperl-l] parsing GenBank file
Message-ID:

Hi All,
i have a huge GenBank file ( downloaded from RDP containing all bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM).
I wrote a simple script for this:

#!/usr/bin/perl -w
use Bio::SeqIO;

my $seqio_object = Bio::SeqIO->new(-file => "$ARGV[0]");
while(my $seq_object = $seqio_object->next_seq){
    my $id = $seq_object->id;
    print "$id\t";
    my $species_object = $seq_object->species;
    my @classification = $seq_object->species->classification;
    foreach my $val (@classification){print "$val\t";}
    print "\n";
}

I am getting the output like:

S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root
S000148973 uncultured Geothrix sp.
Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root S000431649 uncultured Acidobacteria bacterium Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root .. .. This is the exact output i want, but i am missing lot of records (they are there in the genbank file but not in my output). I also got a warning during parsing: --------------------- WARNING --------------------- MSG: Unbalanced quote in: /db_xref="taxon:35783" /germline" /mol_type="genomic DNA" /organism="Enterococcus sp." /strain="LMG12316"No further qualifiers will be added for this feature --------------------------------------------------- So i was just wondering that is this warning message causing that problem or i am doing something wrong? Thanks Shalabh From jay at jays.net Tue May 4 23:30:25 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 4 May 2010 22:30:25 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? Message-ID: $work[0] wants me to fire up Buildbot + Smolder to know when and who broke our tests, and how quickly (or not) our test count is growing over time. Then #moose asked me if I could also host the same for Moose and Class::MOP. And $work[1] uses the heck out of BioPerl. So I'm wondering if I can leverage all my synergies somehow and also host for BioPerl. http://buildbot.net/trac http://sourceforge.net/projects/smolder/ Has anything happened since this 2008 thread?: Subject: Test coverage for BioPerl now available http://article.gmane.org/gmane.comp.lang.perl.bio.general/17731/match=smolder If this would be a Good Thing for BioPerl I could try to try... :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Wed May 5 00:24:51 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 4 May 2010 23:24:51 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? 
In-Reply-To: References: Message-ID: On May 4, 2010, at 10:30 PM, Jay Hannah wrote: > http://sourceforge.net/projects/smolder/ Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) http://search.cpan.org/perldoc?Smolder http://github.com/mpeters/smolder Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From dimitark at bii.a-star.edu.sg Wed May 5 02:58:21 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Wed, 05 May 2010 14:58:21 +0800 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> References: <4BDA986D.3020302@bii.a-star.edu.sg> <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> Message-ID: <4BE1170D.8040108@bii.a-star.edu.sg> Hi Dave, thank you for the tip. Now it works like a charm :) Greetings Dimitar On 05/02/2010 05:59 PM, Dave Messina wrote: > Hi Dimitar, > > The syntax you want is: > > # Build a Genewise alignment factory > my $factory = Bio::Tools::Run::Genewise->new(); > > # turn on the quiet switch > $factory->QUIET(1); > > # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects > my @genes = $factory->run($protein_seq, $genomic_seq); > > > This turns out be incorrectly documented on the man page, at least in part: > >> Available Params: >> >> NB: These should be passed without the '-' or they will be ignored, >> except switches such as 'hmmer' (which have no corresponding value) >> which should be set on the factory object using the AUTOLOADed methods >> of the same name. >> >> Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null] >> Alg [-kbyte,-alg] >> HMM [-hmmer] >> Output [-gff,-gener,-alb,-pal,-block,-divide] >> Standard [-help,-version,-silent,-quiet,-errorlog] >> > > That is, these don't work as expected: > > $factory->quiet; > $factory->quiet(1); > > due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. 
> > And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set. > > > So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker. > > > Dave > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore email: dimitark at bii.a-star.edu.sg tel: +65 6478 8514 From dimitark at bii.a-star.edu.sg Wed May 5 03:06:04 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Wed, 05 May 2010 15:06:04 +0800 Subject: [Bioperl-l] about gene "boundaries" In-Reply-To: References: <4BD8357B.5030804@bii.a-star.edu.sg> <24714E9B-B3E5-4703-92F8-64483FA59AFC@illinois.edu> <4BD90F94.4040608@bii.a-star.edu.sg> Message-ID: <4BE118DC.7000806@bii.a-star.edu.sg> Hi Malcolm, thank you very much for that information. Didnt even know such program existed :) I now use 'blastdbcmd' for extraction of DNA sequence from my DB. I only had to reformat my DB with 'parse seqids' parameter in order to be able to give the 'entry' parameter to 'blastdbcmd'. Now my script is working. Thanx again. Cheers Dimitar On 04/30/2010 10:16 PM, Cook, Malcolm wrote: > Dimitar, > > Since you have indexed your database with makeblastdb, you might simply use `blastdbcmd` to extract, in fasta format, sub-sequences from the indexed database using identifiers and integer ranges > > blastdbcmd is included in the blast+ suite of programs, which also included makeblastdb which you report you have running. 
> > see: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/user_maual.pdf > > I've not (yet) used the blast+ suite (still using the old blast) so I've not tested this myself yet, but I think something like the following will work for you: > > blastdbcmd -db yourBlastDatabase -entry chr2 -range 100-300 -outformat fasta > > will extract chr2:100-300 from yourBlastDatabase > > Good Luck > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dimitar Kenanov > Sent: Wednesday, April 28, 2010 11:48 PM > To: Chris Fields; bioperl-l at bioperl.org; scott at scottcain.net; hrh at fmi.ch > Subject: Re: [Bioperl-l] about gene "boundaries" > > Hi guys, > today with rested head and after some reading i found the solution to my problem in BioPerl. Its Bio::DB::Fasta. It does what i want sufficiently well. > Thank you again for the help and im sorry for the trouble caused. > > Cheers > Dimitar > > On 04/28/2010 11:10 PM, Chris Fields wrote: > >> By local DB, do you mean a BioPerl-based local DB? Or is it something else? This is a bit vague. >> >> On the BioPerl side I suggest looking into Bio::DB::SeqFeature::Store for storing and querying genome information (it does exactly what you want if the proper information is loaded), or maybe the Ensembl Perl API, which can be used with a local or remote Ensembl setup. Beyond that you'll need to be more specific. >> >> chris >> >> On Apr 28, 2010, at 8:17 AM, Dimitar Kenanov wrote: >> >> >> >>> Hello guys, >>> i have a question about gene "boundaries". Is there some module in BioPerl which can help me extract the DNA sequence from a genomic DB (from specific chromosome). I have my human genome in a local DB and some "from-to" data sets corresponding to different chromosomes. So i want to get the DNA seqs for these from-to's. 
I know i can do that the normal way but if there is a way to do it with BioPerl it will be more consistent with the rest of the code.
>>>
>>> Thanks for any tips :)
>>>
>>> Cheers
>>> Dimitar
>>>
>>> --
>>> Dimitar Kenanov
>>> Postdoctoral research fellow
>>> Protein Sequence Analysis Group
>>> Bioinformatics Institute
>>> A*STAR, Singapore
>>> email: dimitark at bii.a-star.edu.sg
>>> tel: +65 6478 8514
>>>
>>
>
> --
> Dimitar Kenanov
> Postdoctoral research fellow
> Protein Sequence Analysis Group
> Bioinformatics Institute
> A*STAR, Singapore
> email: dimitark at bii.a-star.edu.sg
> tel: +65 6478 8514
>

--
Dimitar Kenanov
Postdoctoral research fellow
Protein Sequence Analysis Group
Bioinformatics Institute
A*STAR, Singapore
email: dimitark at bii.a-star.edu.sg
tel: +65 6478 8514

From David.Messina at sbc.su.se Wed May 5 03:46:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 09:46:17 +0200
Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise
In-Reply-To: <4BE1170D.8040108@bii.a-star.edu.sg>
References: <4BDA986D.3020302@bii.a-star.edu.sg> <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> <4BE1170D.8040108@bii.a-star.edu.sg>
Message-ID: <9F2DC6C9-7707-4C4A-8DE1-0B37387F7F8A@sbc.su.se>

Great, glad to hear that. Thanks for letting us know about the problem!

Dave


On May 5, 2010, at 8:58, Dimitar Kenanov wrote:

> Hi Dave,
> thank you for the tip.
Now it works like a charm :)
>
> Greetings
> Dimitar
>
>
> On 05/02/2010 05:59 PM, Dave Messina wrote:
>> Hi Dimitar,
>>
>> The syntax you want is:
>>
>> # Build a Genewise alignment factory
>> my $factory = Bio::Tools::Run::Genewise->new();
>>
>> # turn on the quiet switch
>> $factory->QUIET(1);
>>
>> # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects
>> my @genes = $factory->run($protein_seq, $genomic_seq);
>>
>>
>> This turns out be incorrectly documented on the man page, at least in part:
>>
>>> Available Params:
>>>
>>> NB: These should be passed without the '-' or they will be ignored,
>>> except switches such as 'hmmer' (which have no corresponding value)
>>> which should be set on the factory object using the AUTOLOADed methods
>>> of the same name.
>>>
>>> Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null]
>>> Alg [-kbyte,-alg]
>>> HMM [-hmmer]
>>> Output [-gff,-gener,-alb,-pal,-block,-divide]
>>> Standard [-help,-version,-silent,-quiet,-errorlog]
>>>
>>
>> That is, these don't work as expected:
>>
>> $factory->quiet;
>> $factory->quiet(1);
>>
>> due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase.
>>
>> And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set.
>>
>>
>> So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker.
>>
>>
>> Dave
>>
>>
>
>
> --
> Dimitar Kenanov
> Postdoctoral research fellow
> Protein Sequence Analysis Group
> Bioinformatics Institute
> A*STAR, Singapore
> email: dimitark at bii.a-star.edu.sg
> tel: +65 6478 8514
>

From torsten.seemann at infotech.monash.edu.au Wed May 5 03:48:55 2010
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 5 May 2010 17:48:55 +1000
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To:
References:
Message-ID:

> i have a huge GenBank file ( downloaded from RDP containing all
> bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM).
> I am getting the output like:
> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae
> Holophagales Holophagae "Acidobacteria" Bacteria Root
> This is the exact output i want, but i am missing lot of records (they are
> there in the genbank file but not in my output).
> I also got a warning during parsing:
> --------------------- WARNING ---------------------
> MSG: Unbalanced quote in:
> /db_xref="taxon:35783" /germline"
> /mol_type="genomic DNA"
> /organism="Enterococcus sp."
> /strain="LMG12316"No further qualifiers will be added for this feature
> ---------------------------------------------------
> So i was just wondering that is this warning message causing that problem or
> i am doing something wrong?

"Unbalanced quote" means there is not an even number (multiple of 2)
double-quote (") symbols around the tag's value. I can see that the
first line (below) looks problematic:

YOU HAVE:

/db_xref="taxon:35783" /germline"

SHOULD BE:

/db_xref="taxon:35783"
/germline

I suspect there is a problem either with RDP's genbank producer, or
Bioperl is having problem with the "germline" qualifier which is a
'null valued' qualifier like /pseudo - it takes no ="value" string. (I
think in Bioperl this is handled by setting the value to "_no_value"
?)
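
A quick way to locate such qualifiers before handing the file to Bio::SeqIO is to count the double quotes in each one. The sketch below is plain Perl with no BioPerl dependency. The helper name find_unbalanced_qualifiers and the sample feature lines (modelled on the warning above) are made up for illustration, and it assumes that a continuation line never begins with '/'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given the lines of one feature
# block, return every qualifier that contains an odd number of double quotes.
sub find_unbalanced_qualifiers {
    my @lines = @_;
    my @qualifiers;

    for my $line (@lines) {
        if ( $line =~ m{^\s+/(.*)} ) {
            # a new qualifier starts with '/' after the indent
            push @qualifiers, "/$1";
        }
        elsif ( @qualifiers && $line =~ /^\s+(\S.*)/ ) {
            # continuation line: append it to the current qualifier
            $qualifiers[-1] .= " $1";
        }
        # anything else (e.g. the feature key line) is ignored
    }

    # tr/"// with an empty replacement only counts, it does not modify
    return grep { my $n = tr/"//; $n % 2 } @qualifiers;
}

# Made-up sample block mirroring the qualifiers shown in the warning
my @feature = (
    '     source          1..100',
    '                     /db_xref="taxon:35783" /germline"',
    '                     /mol_type="genomic DNA"',
    '                     /organism="Enterococcus sp."',
    '                     /strain="LMG12316"',
);

my @bad = find_unbalanced_qualifiers(@feature);
print "Unbalanced quote in: $_\n" for @bad;
```

On the sample block only the db_xref/germline line is reported, since it is the one with three quote characters. Doubled quotes inside a value still give an even count, so they should not be flagged.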
http://www.ncbi.nlm.nih.gov/collab/FT/ Qualifier /germline Definition the sequence presented in the entry has not undergone somatic rearrangement as part of an adaptive immune response; it is the unrearranged sequence that was inherited from the parental germline Value format none Example /germline Comment /germline should not be used to indicate that the source of the sequence is a gamete or germ cell; /germline and /rearranged cannot be used in the same source feature; /germline and /rearranged should only be used for molecules that can undergo somatic rearrangements as part of an adaptive immune response; these are the T-cell receptor (TCR) and immunoglobulin loci in the jawed vertebrates, and the unrelated variable lymphocyte receptor (VLR) locus in the jawless fish (lampreys and hagfish); /germline and /rearranged should not be used outside of the Craniata (taxid=89593) --Torsten Seemann --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA From cjfields at illinois.edu Wed May 5 08:12:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 07:12:30 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: Message-ID: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> On May 4, 2010, at 11:24 PM, Jay Hannah wrote: > On May 4, 2010, at 10:30 PM, Jay Hannah wrote: >> http://sourceforge.net/projects/smolder/ > > Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) > > http://search.cpan.org/perldoc?Smolder > http://github.com/mpeters/smolder > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)? 
chris From cjfields at illinois.edu Wed May 5 08:30:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 07:30:30 -0500 Subject: [Bioperl-l] using default string values for undef/empty, was Re: parsing GenBank file In-Reply-To: References: Message-ID: On May 5, 2010, at 2:48 AM, Torsten Seemann wrote: >> i have a huge GenBank file ( downloaded from RDP containing all >> bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM). >> I am getting the output like: >> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae >> Holophagales Holophagae "Acidobacteria" Bacteria Root >> This is the exact output i want, but i am missing lot of records (they are >> there in the genbank file but not in my output). >> I also got a warning during parsing: >> --------------------- WARNING --------------------- >> MSG: Unbalanced quote in: >> /db_xref="taxon:35783" /germline" >> /mol_type="genomic DNA" >> /organism="Enterococcus sp." >> /strain="LMG12316"No further qualifiers will be added for this feature >> --------------------------------------------------- >> So i was just wondering that is this warning message causing that problem or >> i am doing something wrong? > > "Unbalanced quote" means there is not an even number (multiple of 2) > double-quote (") symbols around the tag's value. I can see that the > first line (below) looks problematic: > > YOU HAVE: > > /db_xref="taxon:35783" /germline" > > SHOULD BE: > > /db_xref="taxon:35783" > /germline > > I suspect there is a problem either with RDP's genbank producer, or > Bioperl is having problem with the "germline" qualifier which is a > 'null valued' qualifier like /pseudo - it takes no ="value" string. (I > think in Bioperl this is handled by setting the value to "_no_value" > ?) > ... > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA Ugh, didn't notice the '_no_value' bit. 
Probably my opinion, but I don't like stubs like that as they tend to be brittle and run into issues (like this one, for instance). I would prefer we just leave that as undef and only quote defined values (with the exceptions in %FTQUAL_NO_QUOTE). Any reason for this behavior (is it related to ORM-related stuff like bioperl-db)? Can we change that to something a bit more realistic? chris From David.Messina at sbc.su.se Wed May 5 09:00:39 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 5 May 2010 15:00:39 +0200 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> Message-ID: <252790EC-6A2D-4DFA-B2A0-8D0F8E169E30@sbc.su.se> Yeah, absolutely, Jay! it would be wonderful to have this for BioPerl. Dave On May 5, 2010, at 14:12, Chris Fields wrote: > On May 4, 2010, at 11:24 PM, Jay Hannah wrote: > >> On May 4, 2010, at 10:30 PM, Jay Hannah wrote: >>> http://sourceforge.net/projects/smolder/ >> >> Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) >> >> http://search.cpan.org/perldoc?Smolder >> http://github.com/mpeters/smolder >> >> Jay Hannah >> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)? > > chris From cjfields at illinois.edu Wed May 5 10:46:23 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 09:46:23 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub Message-ID: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> All, I would like to finalize moving over to git/github very soon. We're sort of in limbo on this, so it needs to progress forward. We'll need to do some initial cleanup after the move (Heikki is already doing a few things on the test repo, which we'll need to diff over to the new one). 
So with that in mind, here are my thoughts. This is copied over to this wiki page, in case you don't want to reply here: http://www.bioperl.org/wiki/From_SVN_to_Git (thanks Mark!) 1) Timeline When? Sooner the better (weeks as opposed to months). Our anon. svn is down, likely permanently (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). 2) Migration strategy Now mainly worked out using svn2git, which is very fast. We would need to make the svn repo on dev read-only during this transition. My guess is it would take very little time. Do we want to retain the git-SVN metadata on commits? This is viewable with our current read-only mirror on github: http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca 3) Developers Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly The current authors file used for mapping commit authors to emails used their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I think, once one has signed up with github, you can add that same address to your current ones, and it should map to your github account. If we use dev.open-bio.org as our central git repo, we won't need to go through with that, but we will need a viewable version of dev available somehow (mirrored on github or otherwise). Speaking of... 4) Development strategy Are we sticking with a single centralized repo (SVN-like)? Will that be github, or will github be a downstream repo to our work on dev? We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain). Git makes it very easy to make branches and merge in code to trunk. With that in mind, I would highly suggest we start working on branches for almost everything and merge over to trunk. 
There is very little to no overhead in doing so with git. I like this strategy (Mark Jensen pointed this out): http://nvie.com/git-model Also, several points were raised in a related project (Parrot) considering a move to git/github from svn. One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds. 5) Encouraging outside contributors Do we want to adopt a policy similar to Moose? http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod This is easy with github and forks. 6) SVN Read/Write to GitHub It was recently announced that one can access a github repo using subversion as read-only, and just yesterday experimental write to github is allowed: http://github.com/blog/644-subversion-write-support I can see allowing read-only svn, but write support is still experimental. Do we want to allow that? 7) Others? chris From shalabh.sharma7 at gmail.com Wed May 5 10:46:19 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 5 May 2010 10:46:19 -0400 Subject: [Bioperl-l] parsing GenBank file In-Reply-To: References: Message-ID: Hi Torsten, Thanks for pointing that out. But this is just a warning, it will not break the script. i found the the point where script is breaking. Its breaking and giving this message: Can't call method "classification" on an undefined value at parseGB.pl line 9, line 10067733. So the script is breaking when its coming to this record: LOCUS S001198291 1521 bp rRNA linear BCT 02-Feb-2009 DEFINITION Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2. ACCESSION AP010656 REGION: 61786..63306 PROJECT GenomeProject:29025 SOURCE Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2 ORGANISM Candidatus Azobacteroides pseudotrichonymphae genomovar. 
CFP2 Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales"; "Porphyromonadaceae"; unclassified_"Porphyromonadaceae". REFERENCE 1 (bases 1 to 1521) AUTHORS Toyoda A., Hongoh Y., Toh H., Hattori M., Ohkuma M., Sakaki Y.; TITLE ; JOURNAL Submitted (21-MAR-2008) to the EMBL/GenBank/DDBJ databases. Contact:Atsushi Toyoda National Institute of Genetics, Comparative Genomics Laboratory; Yata 1111, Mishima, Shizuoka 411-8540, Japan REFERENCE 2 AUTHORS Hongoh Y., Sharma V.K., Prakash T., Noda S., Toh H., Taylor T.D., Kudo T., Sakaki Y., Toyoda A., Hattori M., Ohkuma M.; It is unable to parse this record, but i don't understand why it is doing so? The only reason i can think of is the organism's name which is very long as compared to others. Thanks Shalabh On Wed, May 5, 2010 at 3:48 AM, Torsten Seemann < torsten.seemann at infotech.monash.edu.au> wrote: > > i have a huge GenBank file ( downloaded from RDP containing all > > bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's > linage (in ORGANISM). > > I am getting the output like: > > S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae > > Holophagales Holophagae "Acidobacteria" Bacteria Root > > This is the exact output i want, but i am missing lot of records (they > are > > there in the genbank file but not in my output). > > I also got a warning during parsing: > > --------------------- WARNING --------------------- > > MSG: Unbalanced quote in: > > /db_xref="taxon:35783" /germline" > > /mol_type="genomic DNA" > > /organism="Enterococcus sp." > > /strain="LMG12316"No further qualifiers will be added for this feature > > --------------------------------------------------- > > So i was just wondering that is this warning message causing that problem > or > > i am doing something wrong? > > "Unbalanced quote" means there is not an even number (multiple of 2) > double-quote (") symbols around the tag's value. 
I can see that the > first line (below) looks problematic: > > YOU HAVE: > > /db_xref="taxon:35783" /germline" > > SHOULD BE: > > /db_xref="taxon:35783" > /germline > > I suspect there is a problem either with RDP's genbank producer, or > Bioperl is having problem with the "germline" qualifier which is a > 'null valued' qualifier like /pseudo - it takes no ="value" string. (I > think in Bioperl this is handled by setting the value to "_no_value" > ?) > > http://www.ncbi.nlm.nih.gov/collab/FT/ > > Qualifier /germline > Definition the sequence presented in the entry has not undergone > somatic > rearrangement as part of an adaptive immune response; it is > the > unrearranged sequence that was inherited from the parental > germline > Value format none > Example /germline > Comment /germline should not be used to indicate that the source of > the sequence is a gamete or germ cell; > /germline and /rearranged cannot be used in the same source > feature; > /germline and /rearranged should only be used for molecules > that > can undergo somatic rearrangements as part of an > adaptive immune > response; these are the T-cell receptor (TCR) and > immunoglobulin > loci in the jawed vertebrates, and the unrelated variable > lymphocyte receptor (VLR) locus in the jawless fish > (lampreys > and hagfish); > /germline and /rearranged should not be used outside of the > Craniata (taxid=89593) > > > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA > From cjfields at illinois.edu Wed May 5 11:32:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 10:32:41 -0500 Subject: [Bioperl-l] parsing GenBank file In-Reply-To: References: Message-ID: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu> Shalabh, What is the source of this file? 
It's not from GenBank; if I look up the parent sequence using Bio::DB::GenBank it works fine:

use Modern::Perl;
use Bio::DB::GenBank;

my $id = 'AP010656';

my $gb = Bio::DB::GenBank->new();

my $seq = $gb->get_Seq_by_acc($id);

say join(',',$seq->species->classification);

chris

From shalabh.sharma7 at gmail.com Wed May 5 11:38:11 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 11:38:11 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
Message-ID:

Hi Chris,
I downloaded this file from RDP, it contain all bacterial 16s.

Thanks
Shalabh

On Wed, May 5, 2010 at 11:32 AM, Chris Fields wrote:

> Shalabh,
>
> What is the source of this file?
> It's not from GenBank; if I look up the parent sequence using
> Bio::DB::GenBank it works fine:
>
> use Modern::Perl;
> use Bio::DB::GenBank;
>
> my $id = 'AP010656';
>
> my $gb = Bio::DB::GenBank->new();
>
> my $seq = $gb->get_Seq_by_acc($id);
>
> say join(',',$seq->species->classification);
>
> chris

From cjfields at illinois.edu Wed May 5 12:01:55 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 May 2010 11:01:55 -0500
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To:
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
Message-ID: <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>

Shalabh,

There are several problems with this file that make it somewhat problematic and somewhat non-GenBank like. It does parse (it has seq data) but doesn't catch the SOURCE/ORGANISM b/c of the somewhat non-canonical way of displaying the classification:

SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar.
CFP2
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
            Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
            "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".

It's different enough from the NCBI version (from here: http://www.ncbi.nlm.nih.gov/nuccore/212548595) that it's probably breaking the parser:

SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
            Bacteria; Bacteroidetes; Bacteroidia; Bacteroidales; Candidatus
            Azobacteroides.

Please file this as a bug, we can take a look at it. It's a bit non-standard so I can't promise it'll be fixed unless it's fairly easy to do.

chris

From shalabh.sharma7 at gmail.com Wed May 5 12:10:33 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 12:10:33 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To: <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu> <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>
Message-ID:

Hi Chris,
I will do that, so how i can solve my problem, do you have any suggestion?
I am thinking of taking all the accessions from the file i have and use Bio::DB::Genbank to get classification.

Thanks
shalabh

On Wed, May 5, 2010 at 12:01 PM, Chris Fields wrote:

> Shalabh,
>
> There are several problems with this file that make it somewhat problematic
> and somewhat non-GenBank like. It does parse (it has seq data) but doesn't
> catch the SOURCE/ORGANISM b/c of the somewhat non-canonical way of
> displaying the classification:
>
> SOURCE Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> ORGANISM Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
> "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".
>
> It's different enough from the NCBI version (from here:
> http://www.ncbi.nlm.nih.gov/nuccore/212548595) that it's probably breaking
> the parser:
>
> SOURCE Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> ORGANISM Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
> Bacteria; Bacteroidetes; Bacteroidia; Bacteroidales; Candidatus
> Azobacteroides.
>
> Please file this as a bug, we can take a look at it. It's a bit
> non-standard so I can't promise it'll be fixed unless it's fairly easy to
> do.
>
> chris

From jay at jays.net Wed May 5 12:28:10 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 5 May 2010 11:28:10 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
Message-ID: <512A88E4-85A0-4841-B6A7-9915FE0800BA@jays.net>

On May 5, 2010, at 10:59 AM, Jay Hannah wrote:
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

Oops. Should have checked Smolder before sending that email...

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

$ prove -v t/email_signatures.t
t/email_signatures.t ..
1..7
ok 1 - $work->[0]->{Outlook} email signatures up to date
ok 2 - $work->[0]->{Netmail} email signatures up to date
ok 3 - $work->[1]->{Lotus_Notes} email signatures up to date
not ok 4 - $home->[0]->{MacMini_Mail.app} email signatures up to date
ok 5 - $home->[0]->{MacMini_Entourage.app} email signatures up to date
ok 6 - $home->[0]->{laptop_Mail.app} email signatures up to date
ok 7 - $home->[0]->{laptop_Entourage.app} email signatures up to date

# Failed test '$home->[0]->{MacMini_Mail.app} email signatures up to date'
# at t/email_signatures.t line 5.
# Looks like you failed 1 test of 7.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/7 subtests

Test Summary Report
-------------------
t/email_signatures.t (Wstat: 256 Tests: 7 Failed: 1)
  Failed test:  4
  Non-zero exit status: 1
Files=1, Tests=7, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.03 cusr 0.00 csys = 0.07 CPU)
Result: FAIL

From jay at jays.net Wed May 5 11:59:37 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 5 May 2010 10:59:37 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
Message-ID: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>

On May 5, 2010, at 7:12 AM, Chris Fields wrote:
> I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)?

I would definitely start with trunk and see how it goes.

Last night I tried to smoke all our old $work[0] tags and failed impressively. Our tests were (and probably still are) too reliant on 3rd party black boxes being online and responsive, and servers tend to move and get reconfigured over the years. Presumably BioPerl and Moose are more self-contained (unless external deps are explicitly enabled), so perhaps historical smoking would work fairly well.

In Moose land the request is that I smoke not only Moose, but everything on CPAN that *depends on Moose*:

   export MOOSE_TEST_MD=1; prove xt/test-my-dependents.t

Which should be ... educational. :)

While exciting, I don't think that concept translates to the BioPerl monolith.

If I'm the only one smoking, you'll get a very limited number of architecture + perl version combinations reported. Which begs the question of how to harness a broader tester pool.
It's great that 342 systems smoked our latest CPAN upload:

   http://static.cpantesters.org/distro/B/bioperl.html

But the crazy I'm embarking on would mean several smokes each day (every svn/git commit?), compared to the cpantesters who haven't had a new CPAN release to smoke since Sep 2009 (1.6.1). Maybe I'd just do one or two a day or something?

Whoever wanted to could report into our central Smolder server using their architectures + perl versions. A volunteer would just install Smolder from CPAN and run this in their bioperl-live directory:

   prove -I . --recurse --archive test_run.tar.gz
   smolder_smoke_signal --server smolder.jays.net \
      --username MyUserName --password MyPass \
      --file test_run.tar.gz --project bioperl-live --tags trunk

Deep ponderings,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From David.Messina at sbc.su.se Wed May 5 17:27:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 23:27:24 +0200
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
Message-ID: <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se>

> Do we want to retain the git-SVN metadata on commits? What are the tradeoffs with this?

From the little reading I've done, it seems that space and clutter are the chief drawbacks, but that it's easy to strip this metadata out later. Does that jibe with your impression?

> Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly

My github account name is: DaveMessina

Do I have an @bioperl.org address? I tried sending mail to a few likely permutations without success. In any case, I added dave_messina -at- bioperl.org as an email address on my github account.

> Are we sticking with a single centralized repo (SVN-like)?
I am a total git novice, but it's my understanding that it's still a good idea, particularly with a big many-author project like BioPerl, to have a primary, official repo.

But I'd be interested in hearing more discussion on this. We're at a good place to make large-ish changes to how we do things, I think.

> Will that be github, or will github be a downstream repo to our work on dev?

My only concern with github being primary is in case something happens to github. Not likely, I know, but it seems prudent to maintain a certain amount of control over our destiny. So I'm inclined to make dev be primary and github downstream, with the assumption that it'd be trivial to abandon dev and make github primary in the future if we want.

Or would it be enough to auto-mirror to dev.open-bio.org, which could serve as a fallback in case github goes offline, temporarily or permanently?

> We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain).

Are there any git-familiar folks out there who could comment on the pros and cons of this? Perhaps some of the other Bio* projects who have switched to git could advise.

Right now, without further technical details, I think it'd be better to have one true primary just because it's less confusing and easier to manage, particularly if we're to follow a model like the one mentioned just below:

> I would highly suggest we start working on branches for almost everything and merge over to trunk.
> [...]
> I like this strategy (Mark Jensen pointed this out): http://nvie.com/git-model

Yep, that looks good to me, too.

> One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds.
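For readers unfamiliar with how a hook can block rewinds, here is a minimal sketch in Perl of a server-side git `update` hook, assuming the central repository is one where hooks can be installed (for example on dev.open-bio.org). It is not anything the BioPerl developers have agreed on: the protected branch names (`master` and a hypothetical `release-*` prefix) and the subroutine names are illustrative assumptions. Git runs the hook once per pushed ref with the ref name, the old commit id, and the new commit id; a non-zero exit refuses that update.

```perl
#!/usr/bin/env perl
# Sketch of a server-side git "update" hook that refuses rewinds
# (non-fast-forward pushes) and deletions on protected branches.
# Git calls it as: update <refname> <old-id> <new-id>
use strict;
use warnings;

# An all-zero id means the ref is being created (old) or deleted (new).
sub is_zero_id {
    my ($id) = @_;
    return $id =~ /\A0+\z/;
}

# Hypothetical naming scheme: master plus any branch named release-*.
sub is_protected {
    my ($ref) = @_;
    return $ref eq 'refs/heads/master' || $ref =~ m{\Arefs/heads/release-};
}

# Ask git for the common ancestor of the old and new branch tips.
sub git_merge_base {
    my ($old, $new) = @_;
    open my $fh, '-|', 'git', 'merge-base', $old, $new
        or die "cannot run git merge-base: $!";
    my $base = <$fh>;
    close $fh;
    $base = '' unless defined $base;
    chomp $base;
    return $base;
}

# Returns nothing if the update is acceptable, otherwise a reason string.
# $merge_base is injectable so the logic can be checked without a repository.
sub rejection_reason {
    my ($ref, $old, $new, $merge_base) = @_;
    $merge_base ||= \&git_merge_base;
    return unless is_protected($ref);
    return if is_zero_id($old);    # creating a new branch is fine
    return "deleting $ref is not allowed" if is_zero_id($new);
    # A fast-forward means the old tip is an ancestor of the new tip,
    # i.e. the merge base of the two is the old tip itself.
    return if $merge_base->($old, $new) eq $old;
    return "non-fast-forward update of $ref is not allowed";
}

# Only act as a hook when git supplies the three expected arguments.
if (@ARGV == 3) {
    my $reason = rejection_reason(@ARGV);
    if (defined $reason) {
        print STDERR "*** $reason\n";
        exit 1;    # non-zero exit makes git refuse this ref update
    }
    exit 0;
}
```

Installed as `hooks/update` in the central bare repository and made executable, this would leave topic branches free to be rebased while keeping master and release history append-only. If per-branch control is not needed, git's repository-wide `receive.denyNonFastForwards` and `receive.denyDeletes` settings are a simpler alternative.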
We should try to make sure we have this sorted before going "live".

> 5) Encouraging outside contributors
>
> Do we want to adopt a policy similar to Moose?

Yes! We want more people to jump in; one of the benefits of git and github is that they encourage this.

> 6) SVN Read/Write to GitHub
>
> I can see allowing read-only svn, but write support is still experimental. Do we want to allow that?

Read-only for sure: that seems harmless, and we want to give people lots of ways to get BioPerl.

Write: let's play with it a bit, making a few test commits to bioperl-test, and see what happens. It would be nice if we don't force everyone who contributes to BioPerl to have to switch over to git immediately. Me included. :)

> 7) Others?

What happens when we start splitting up bioperl into separate distros? Do we put them each into a separate repo?

Dave

From David.Messina at sbc.su.se Wed May 5 17:40:46 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 23:40:46 +0200
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
Message-ID: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se>

> Presumably BioPerl and Moose are more self-contained (unless external deps are explicitly enabled), so perhaps historical smoking would work fairly well.

Very few of BioPerl's tests rely on outside servers, and those that do have to be turned on explicitly with a network-tests flag. So hopefully that won't be an issue.

> In Moose land the request is that I smoke not only Moose, but everything on CPAN that *depends on Moose*:
> [...]
> While exciting, I don't think that concept translates to the BioPerl monolith.

Agreed, not really. Except for some of the GMOD stuff. And anyway this could always be done later if desired. Probably much later.
:) > Whoever wanted to could report into our central Smolder server using their architectures + perl versions. A volunteer would just install Smolder from CPAN and run this in their bioperl-live directory: > > prove -I . --recurse --archive test_run.tar.gz > smolder_smoke_signal --server smolder.jays.net \ > --username MyUserName --password MyPass \ > --file test_run.tar.gz --project bioperl-live --tags trunk Would the reporter need to have any special setup to do this? Could this kind of reporting be written into the BioPerl Build.PL as a user-settable option (just like the options for installing scripts or running network tests)? If so, then we could get lots of feedback on trunk (master) commits and not just releases. Dave From jason at bioperl.org Wed May 5 18:45:41 2010 From: jason at bioperl.org (Jason Stajich) Date: Wed, 05 May 2010 15:45:41 -0700 Subject: [Bioperl-l] Modules in Bio:Tree In-Reply-To: <4BE1D0E2.9010500@mail.mcgill.ca> References: <4BE1D0E2.9010500@mail.mcgill.ca> Message-ID: <4BE1F515.7090604@bioperl.org> Please use the mailing list for questions. The nodes are objects not strings you print - as it shows in http://bioperl.org/wiki/HOWTO:Trees#Example_Code you access information from them with the object methods like 'id' so print $leaf->id, "\n" would probably accomplish what you are looking for right now. -jason Sudeep Mehrotra wrote, On 5/5/10 1:11 PM: > Hello Jason, > I am using the Bio:Tree modules to get a list of all the leaves in > their respective clusters. I looked at the examples and followed the > functions of various modules but I am not able to get the desired result. > > My input looks as follows: > ((((Candidatus_Korarchaeum)Korarchaeota,((((Cenarchaeum_symbiosum)Cenarchaeum)Cenarchaeaceae)Cenarchaeales,((((Nitrosopumilus_maritimus)Nitrosopumilus)Nitrosopumilaceae)Nitrosopumilales)marine_archaeal_group_1)Thaumarchaeota,(((((Archaeoglobus_fulgidus)Archaeoglobus)Archaeoglobaceae)Archaeoglobales)Archaeoglobi, > > and so on.... 
> > Code is like this: > $input = new Bio::TreeIO(-file =>"$file1",-format => "newick"); > $tree = $input->next_tree; > @leaves = $tree->get_leaf_nodes(); > foreach $leaf (@leaves) > { > print "$leaf\n"; > } > The output I get is: > Bio::Tree::Node=HASH(0xa783e0) > Bio::Tree::Node=HASH(0xa78710) > Bio::Tree::Node=HASH(0xa78ab0) > > Not sure what I am doing wrong. > > Objective is to get a cluster of all the leaves. > > Thanks From florent.angly at gmail.com Wed May 5 20:16:05 2010 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 06 May 2010 10:16:05 +1000 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <4BE20A45.5090206@gmail.com> Hi Chris, On 06/05/10 00:46, Chris Fields wrote: > 3) Developers > > Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I think, once one has signed up with github, you can add that same address to your current ones, and it should map to your github account. If we use dev.open-bio.org as our central git repo, we won't need to go through with that, but we will need a viewable version of dev available somehow (mirrored on github or otherwise). Speaking of... > I have a GitHub account, fangly, on which I just added the email address fangly at bioperl.org . Thanks for your efforts working on the Git migration. Florent From jay at jays.net Wed May 5 23:18:47 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:18:47 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: I smoked trunk a few times. Check out all the pretty buttons and graphs and such: http://biobase2.ist.unomaha.edu:8080/app/projects/smoke_reports/1 How you too can submit smoke results: http://jays.net/wiki/Smolder Neat? Not? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Wed May 5 23:31:05 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:31:05 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: On May 5, 2010, at 4:40 PM, Dave Messina wrote: > Very few of BioPerl's tests rely on outside servers, and those that do have to be turned on explicitly with a network-tests flag. So hopefully that won't be an issue. I said "no" to the network tests for my smoke runs. Haven't really examined the results enough to know if the failures are my fault or what. Since I always use bioperl-live out of SVN (soon git) I may not be following the ./Build.PL procedure correctly. > Agreed, not really. Except for some of the GMOD stuff. And anyway this could always be done later if desired. Probably much later. :) Ya. Some day http://smolder.open-bio.org hosting jillions of projects would be dreamy! :) Any open-bio.org projects using TAP other than BioPerl? Smolder can host anything TAP, and TAP producers are available in at least 17 languages: http://testanything.org/wiki/index.php/TAP_Producers > Would the reporter need to have any special setup to do this? 
LWP::UserAgent or Smolder's smolder_smoke_signal are the two methods I've successfully executed so far: http://jays.net/wiki/Smolder > Could this kind of reporting be written into the BioPerl Build.PL as a user-settable option (just like the options for installing scripts or running network tests)? > > If so, then we could get lots of feedback on trunk (master) commits and not just releases. Ya, wow. I've never built BioPerl "the right way" (I'm an SVN/git junkie) so I'm not sure how this would get put into Build.PL. Would you prompt the user, something like "Since you just installed BioPerl, we'd like to connect to the Internet and report in your test results. Is this ok? [yes] " ? It would be very cool to collect and trend thousands of reports, assuming it can be 100% automated for the user. Thanks for the feedback! :) Time to putter my motorcycle home before it gets too cold. G'night, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Wed May 5 23:43:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 22:43:14 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. chris On May 5, 2010, at 10:18 PM, Jay Hannah wrote: > I smoked trunk a few times. Check out all the pretty buttons and graphs and such: > > http://biobase2.ist.unomaha.edu:8080/app/projects/smoke_reports/1 > > How you too can submit smoke results: > > http://jays.net/wiki/Smolder > > Neat? Not? 
> > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Wed May 5 23:55:40 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:55:40 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: On May 5, 2010, at 10:43 PM, Chris Fields wrote: > Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. Ya, seems like the way to go. LWP is all over inside BioPerl already, whereas Smolder itself has 147 dependencies, most of which probably aren't relevant to most BioPerl users. :) http://deps.cpantesters.org/?module=Smolder;perl=latest So a stand-alone script that could be run whenever, plus (eventually) a prompt in Build.PL asking about running it? Not sure if Build.PL can somehow use the "prove --archive" hook to store the results during the normal installation run through all the tests... Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From lincoln.stein at gmail.com Thu May 6 08:01:09 2010 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 6 May 2010 08:01:09 -0400 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: My github username is lstein and I've just added lstein at bioperl.org to my linked email addresses. I hope I have a bioperl.org address; I never use it! Lincoln On Wed, May 5, 2010 at 10:46 AM, Chris Fields wrote: > All, > > I would like to finalize moving over to git/github very soon. 
We're sort > of in limbo on this, so it needs to progress forward. We'll need to do some > initial cleanup after the move (Heikki is already doing a few things on the > test repo, which we'll need to diff over to the new one). > > So with that in mind, here are my thoughts. This is copied over to this > wiki page, in case you don't want to reply here: > > http://www.bioperl.org/wiki/From_SVN_to_Git > > (thanks Mark!) > > 1) Timeline > > When? Sooner the better (weeks as opposed to months). Our anon. svn is > down, likely permanently ( > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > > 2) Migration strategy > > Now mainly worked out using svn2git, which is very fast. We would need to > make the svn repo on dev read-only during this transition. My guess is it > would take very little time. Do we want to retain the git-SVN metadata on > commits? This is viewable with our current read-only mirror on github: > > > http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca > > 3) Developers > > Not everyone has a github account. Recent ones who I couldn't find on > github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used > their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I > think, once one has signed up with github, you can add that same address to > your current ones, and it should map to your github account. If we use > dev.open-bio.org as our central git repo, we won't need to go through with > that, but we will need a viewable version of dev available somehow (mirrored > on github or otherwise). Speaking of... > > 4) Development strategy > > Are we sticking with a single centralized repo (SVN-like)? Will that be > github, or will github be a downstream repo to our work on dev? 
We could > feasibly have github be an active, forkable repo that could be > bidirectionally synced with dev, but I'm not sure of the logistics on this > (this popped up before with svn migration and was rejected b/c it was > considered too difficult to maintain). > > Git makes it very easy to make branches and merge in code to trunk. With > that in mind, I would highly suggest we start working on branches for almost > everything and merge over to trunk. There is very little to no overhead in > doing so with git. > > I like this strategy (Mark Jensen pointed this out): > http://nvie.com/git-model > > Also, several points were raised in a related project (Parrot) considering > a move to git/github from svn. One in particular was that git allows > destructive commits. Jonathan Leto indicated we can set up specific > branches that don't allow this, using commit hooks, so my guess is the > master branch and release branches wouldn't allow rewinds. > > 5) Encouraging outside contributors > > Do we want to adopt a policy similar to Moose? > > http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod > > This is easy with github and forks. > > 6) SVN Read/Write to GitHub > > It was recently announced that one can access a github repo using > subversion as read-only, and just yesterday experimental write to github is > allowed: > > http://github.com/blog/644-subversion-write-support > > I can see allowing read-only svn, but write support is still experimental. > Do we want to allow that? > > 7) Others? > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Thu May 6 09:01:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 08:01:56 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> Message-ID: <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> (comments interspersed below) On May 5, 2010, at 4:27 PM, Dave Messina wrote: >> Do we want to retain the git-SVN metadata on commits? > > What are the tradeoffs with this? > > From the little reading I've done, it seems that space and clutter are the chief drawbacks, but that it's easy to strip this metadata out later. Does that jibe with your impression? I don't really see much use for it personally, beyond retaining the SVN commit #. >> Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly > > My github account name is: DaveMessina > > Do I have an @bioperl.org address? I tried sending mail to a few likely permutations without success. In any case, I added dave_messina -at- bioperl.org as an email address on my github account. I think if you have a bioperl dev account you should have a bioperl.org set up. That's one thing I'm not absolutely sure of. >> Are we sticking with a single centralized repo (SVN-like)? > > I am a total git novice, but it's my understanding that it's still a good idea, particularly with a big many-author project like BioPerl, to have a primary, official repo. But I'd be interested in hearing more discussion on this. We're at a good place to make large-ish changes to how we do things, I think. > > >> Will that be github, or will github be a downstream repo to our work on dev? 
> > My only concern with github being primary is in case something happens to github. Not likely, I know, but it seems prudent to maintain a certain amount of control over our destiny. > > So I'm inclined to make dev be primary and github downstream, with the assumption that it'd trivial to abandon dev and make github primary in the future if we want. > > Or would it be enough to auto-mirror to dev.open-bio.org, which could serve as a fallback in case github goes offline, temporarily or permanently? Well, the nice thing about git is essentially everyone who pulls has a copy of the repo. It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. We could also use alternate mirrors for github besides dev. http://repo.or.cz/w is one example. >> We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain). > > Are there any git-familiar folks out there who could comment on the pros and cons of this? Perhaps some of the other Bio* projects who have switched to git could advise. > > Right now, without further technical details, I think it'd be better to have one true primary just because it's less confusing and easier to manage, particularly if we're to follow a model like the one mentioned just below: We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. >> I would highly suggest we start working on branches for almost everything and merge over to trunk. >> [...] >> I like this strategy (Mark Jensen pointed this out): http://nvie.com/git-model > > Yep, that looks good to me, too. 
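A minimal sketch of the branch-and-merge flow described in the git-model link quoted above, run against a throwaway local repository. The branch names (`develop`, `feature/demo`), the file name, and the identity settings are illustrative assumptions only, not anything BioPerl has agreed on:

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Dev"              # hypothetical identity
git config user.email "demo@example.org"     # hypothetical address
git commit -q --allow-empty -m "initial commit"

# Integration branch (the linked model calls it "develop").
git checkout -q -b develop

# Each piece of work gets its own short-lived branch.
git checkout -q -b feature/demo
echo "demo" > demo.txt
git add demo.txt
git commit -q -m "add demo file"

# Merge back with --no-ff so history keeps an explicit merge commit
# recording that the feature branch existed.
git checkout -q develop
git merge -q --no-ff -m "merge feature/demo" feature/demo
git branch -d feature/demo >/dev/null

# Show the resulting history, including the merge commit.
git log --oneline --graph
```

The `--no-ff` flag is the design choice worth noting: without it git would fast-forward `develop` and the branch would leave no trace in the history.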
> > > >> One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds. > > We should try to make sure we have this sorted before going "live". That would mean adding a pre-commit hook to disallow this. I'll look into it. >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? > > Yes! > > We want more people to jump in; one of the benefits of git and github is that they encourage this. > > > >> 6) SVN Read/Write to GitHub >> >> I can see allowing read-only svn, but write support is still experimental. Do we want to allow that? > > Read-only for sure: that seems harmless, and we want to give people lots of ways to get BioPerl. > > Write: let's play with it a bit, making a few test commits to bioperl-test, and see what happens. It would be nice if we don't force everyone who contributes to BioPerl to have to switch over to git immediately. Me included. :) Sounds good to me. >> 7) Others? > > What happens when we start splitting up bioperl into separate distros? Do we put them each into a separate repo? Yes. > Dave Thanks! chris From cjfields at illinois.edu Thu May 6 10:19:06 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 09:19:06 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: <3E35F38F-29A0-4419-AE24-AD25A0D6A6A1@illinois.edu> prove generally is just a perl script frontend for Test::Harness and App::Prove, correct? It is included in core from perl 5 on. Here is the code for 'prove' on my local setup: use strict; use App::Prove; my $app = App::Prove->new; $app->process_args(@ARGV); exit( $app->run ?
0 : 1 ); We could add a 'Build smoke' or somesuch that does this internally. I'm tending to shift away from Bio::Root::Build for such things at the moment, but maybe add something there? chris On May 5, 2010, at 10:55 PM, Jay Hannah wrote: > On May 5, 2010, at 10:43 PM, Chris Fields wrote: >> Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. > > Ya, seems like the way to go. LWP is all over inside BioPerl already, whereas Smolder itself has 147 dependencies, most of which probably aren't relevant to most BioPerl users. :) > > http://deps.cpantesters.org/?module=Smolder;perl=latest > > So a stand-alone script that could be run whenever, plus (eventually) a prompt in Build.PL asking about running it? Not sure if Build.PL can somehow use the "prove --archive" hook to store the results during the normal installation run through all the tests... > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Thu May 6 10:50:42 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 6 May 2010 09:50:42 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> Message-ID: <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> Chris, I added 'jhannah at bioperl.org' to my github list of email addresses. Can you add jhannah to the list of github committers in case github becomes the master repo? I need to clean up branches 'jhannah' and 'yapc10hackathon' whenever the transition is official and the master repo is declared (github or open-bio.org). 
Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Thu May 6 10:56:25 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 6 May 2010 09:56:25 -0500 Subject: [Bioperl-l] new core developers Rob Buels and Dave Messina In-Reply-To: References: Message-ID: On May 2, 2010, at 2:28 PM, Mark A. Jensen wrote: > On behalf of the core team, I am delighted to announce two new members: Rob Buels and Dave Messina. Woot! Congrats! Suddenly we WILL have a core dev at YAPC::NA for the hackathon! I'm now expecting great things from us. :) http://bioperl.org/wiki/YAPC Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Thu May 6 11:02:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 10:02:36 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> Message-ID: Done. I think, unless there are a terrible number of objections, we'll push this in the next week or two. Need to look into the pre-commit hook setup for non-destructive commits, post-commit hook for posting commits to bioperl-guts, etc. chris On May 6, 2010, at 9:50 AM, Jay Hannah wrote: > Chris, > > I added 'jhannah at bioperl.org' to my github list of email addresses. Can you add jhannah to the list of github committers in case github becomes the master repo? > > I need to clean up branches 'jhannah' and 'yapc10hackathon' whenever the transition is official and the master repo is declared (github or open-bio.org). 
> > Thanks, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki.lehvaslaiho at gmail.com Thu May 6 13:26:48 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Thu, 6 May 2010 20:26:48 +0300 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: On 5 May 2010 17:46, Chris Fields wrote: > All, > > I would like to finalize moving over to git/github very soon. We're sort > of in limbo on this, so it needs to progress forward. We'll need to do some > initial cleanup after the move (Heikki is already doing a few things on the > test repo, which we'll need to diff over to the new one). > Do not worry about those, I'll move them into the final repo once it is there. I am just making sure everything works. > So with that in mind, here are my thoughts. This is copied over to this > wiki page, in case you don't want to reply here: > > http://www.bioperl.org/wiki/From_SVN_to_Git > > (thanks Mark!) > > 1) Timeline > > When? Sooner the better (weeks as opposed to months). Our anon. svn is > down, likely permanently ( > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > ASAP. > 2) Migration strategy > > Now mainly worked out using svn2git, which is very fast. We would need to > make the svn repo on dev read-only during this transition. My guess is it > would take very little time. Do we want to retain the git-SVN metadata on > commits? This is viewable with our current read-only mirror on github: > > > http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca > > Keep it. It does no harm. > 3) Developers > > Not everyone has a github account. 
Recent ones who I couldn't find on > github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used > their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I > think, once one has signed up with github, you can add that same address to > your current ones, and it should map to your github account. If we use > dev.open-bio.org as our central git repo, we won't need to go through with > that, but we will need a viewable version of dev available somehow (mirrored > on github or otherwise). Speaking of... > Let's go for github as the main repo. It adds visibility and has the coolness factor that helps. > 4) Development strategy > > Are we sticking with a single centralized repo (SVN-like)? Will that be > github, or will github be a downstream repo to our work on dev? We could > feasibly have github be an active, forkable repo that could be > bidirectionally synced with dev, but I'm not sure of the logistics on this > (this popped up before with svn migration and was rejected b/c it was > considered too difficult to maintain). > > Git makes it very easy to make branches and merge in code to trunk. With > that in mind, I would highly suggest we start working on branches for almost > everything and merge over to trunk. There is very little to no overhead in > doing so with git. > > I like this strategy (Mark Jensen pointed this out): > http://nvie.com/git-model > Lets try to follow this strategy. I do not think moving away from svn and going decentralized at one go would work at all. > Also, several points were raised in a related project (Parrot) considering > a move to git/github from svn. One in particular was that git allows > destructive commits. Jonathan Leto indicated we can set up specific > branches that don't allow this, using commit hooks, so my guess is the > master branch and release branches wouldn't allow rewinds. > I would not worry too much about that. 
With git we'll have dozens if not hundreds of full copies of the repo as a backup. > 5) Encouraging outside contributors > > Do we want to adopt a policy similar to Moose? > > http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod > Interesting and educational document. Let's learn as much as we can from it. > This is easy with github and forks. The more the merrier. BTW, I can see Moose uses ShipIt (http://search.cpan.org/~bradfitz/ShipIt-0.55/), which might be worth using in BioPerl. > 6) SVN Read/Write to GitHub > > It was recently announced that one can access a github repo using > subversion as read-only, and just yesterday experimental write to github is > allowed: > > http://github.com/blog/644-subversion-write-support > > I can see allowing read-only svn, but write support is still experimental. > Do we want to allow that? > Why not, if someone insists on using it? Once people get over the initial problems of moving to a different mind set in git, very few will want to use svn. There might be situations when git does not work, however, so let's allow for svn usage. > > 7) Others? > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Thu May 6 14:35:55 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 6 May 2010 20:35:55 +0200 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> Message-ID: [ git-SVN metadata ] > I don't really see much use for it personally, beyond retaining the SVN commit #. Oh well heck, in that case we may as well ditch it.
If there's some way we could easily keep an inactive, archived version with the SVN to github commit # mapping, that would be a nice safety measure, but if it's too much trouble we needn't bother. [ github or dev as primary ] > It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. Great, okay, sounds like there won't be any problem there. [ single repo? ] > We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. Sounds like a plan. I'm pretty swamped until late next week, but if there's anything I can do to help at that time, just holler... Dave From cseligman at earthlink.net Thu May 6 15:23:40 2010 From: cseligman at earthlink.net (Chet Seligman) Date: Thu, 6 May 2010 12:23:40 -0700 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 Message-ID: <001b01caed51$a2e745c0$e8b5d140$@net> I need some help in installing this as it is not in the Active-perl repository. Here's what I have done: 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz 2. Extracted it into an empty directory IN 3. Planned to install by specifying the ppd file directly: ppm install c:\IN\whatever module-name.ppd However, there is no .ppd file extracted. I'd appreciate it if someone would explain how to get Bio::Graphics installed? Chet From scott at scottcain.net Thu May 6 15:44:04 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 6 May 2010 15:44:04 -0400 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 In-Reply-To: <001b01caed51$a2e745c0$e8b5d140$@net> References: <001b01caed51$a2e745c0$e8b5d140$@net> Message-ID: Hi Chet, Install it via the cpan shell: $ cpan cpan> install Bio::Graphics Scott On Thu, May 6, 2010 at 3:23 PM, Chet Seligman wrote: > I need some help in installing this as it is not in the Active-perl > repository. 
Here's what I have done: > 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz > 2. Extracted it into an empty directory IN > 3. Planned to install by specifying the ppd file directly: > ppm install c:\IN\whatever module-name.ppd > > However, there is no .ppd file extracted. > > I'd appreciate it if someone would explain how to get Bio::Graphics > installed? > > Chet > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Thu May 6 15:57:03 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 6 May 2010 15:57:03 -0400 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 In-Reply-To: <002301caed55$53bfc400$fb3f4c00$@net> References: <001b01caed51$a2e745c0$e8b5d140$@net> <002301caed55$53bfc400$fb3f4c00$@net> Message-ID: Hi Chet, Please keep your responses on the bioperl mailing list. As long as you install BioPerl and GD before you try to install Bio::Graphics from cpan, yes, it is perfectly doable. You need to do that in the cmd shell. GD needs to be installed from ppm because it requires compiled code. Scott On Thu, May 6, 2010 at 3:50 PM, Chet Seligman wrote: > Hey Scott: > Is your suggestion doable in Windows? > > How? 
> > Chet > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Scott Cain > Sent: Thursday, May 06, 2010 12:44 PM > To: Chet Seligman > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Installing Bio-Graphics-2.06 > > Hi Chet, > > Install it via the cpan shell: > > $ cpan > cpan> install Bio::Graphics > > Scott > > > On Thu, May 6, 2010 at 3:23 PM, Chet Seligman > wrote: >> I need some help in installing this as it is not in the Active-perl >> repository. Here's what I have done: >> 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz >> 2. Extracted it into an empty directory IN >> 3. Planned to install by specifying the ppd file directly: >> ppm install c:\IN\whatever module-name.ppd >> >> However, there is no .ppd file extracted. >> >> I'd appreciate it if someone would explain how to get Bio::Graphics >> installed? >> >> Chet >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Thu May 6 16:04:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 15:04:39 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> Message-ID: <48C987D6-A7F2-4FBC-AB75-38F0B234961C@illinois.edu> On May 6, 2010, at 1:35 PM, Dave Messina wrote: > [ git-SVN metadata ] > >> I don't really see much use for it personally, beyond retaining the SVN commit #. > > Oh well heck, in that case we may as well ditch it. > > If there's some way we could easily keep an inactive, archived version with the SVN to github commit # mapping, that would be a nice safety measure, but if it's too much trouble we needn't bother. I think we'll keep it in for the SVN commits. Better to have it just in case. > [ github or dev as primary ] > >> It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. > > Great, okay, sounds like there won't be any problem there. > > > [ single repo? ] > >> We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. > > Sounds like a plan. > > > I'm pretty swamped until late next week, but if there's anything I can do to help at that time, just holler... > > > Dave Okay, will prep another email for the final push over to git. 
chris From cjfields at illinois.edu Thu May 6 16:13:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 15:13:44 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> On May 6, 2010, at 12:26 PM, Heikki Lehvaslaiho wrote: > ... >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? >> >> http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod >> > > Interesting and educational document. Let's learn as much a we can from it. > > This is easy with github and forks. >> > > The more the merrier. > > BTW, I can see Moose using Shipit, > http://search.cpan.org/~bradfitz/ShipIt-0.55/ > that might be worth using in BioPerl. I agree. Have thought about that, primarily for easier releases down the road. >> 6) SVN Read/Write to GitHub >> >> It was recently announced that one can access a github repo using >> subversion as read-only, and just yesterday experimental write to github is >> allowed: >> >> http://github.com/blog/644-subversion-write-support >> >> I can see allowing read-only svn, but write support is still experimental. >> Do we want to allow that? >> > > Why not is someone insists on using it. Once people get over the initial > problems of moving to a different mind set in git, very few will want to use > svn. There might be situtations when git does not work, however, so lets > allow for svn usage. Nothing really stopping it, unless we add something to a pre-commit hook that prevents it somehow. I'm thinking a move in the next 5 days, maybe starting Monday? I'll try getting a post out on it. 
chris From rmb32 at cornell.edu Thu May 6 17:09:03 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 06 May 2010 14:09:03 -0700 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> Message-ID: <4BE32FEF.6080707@cornell.edu> The branching model at http://nvie.com/git-model is a good one, but the diagram might be a little intimidating for devs that are new to git. Note that the only branches that most devs will need to be concerned with are the feature branches (sometimes called topic branches), and the main development branch. The other branches are mostly concerned with making releases. To weigh in on other issues on this thread: * Might as well keep the svn metadata, it doesn't hurt and could help in any situations that call for historical digging around. * I don't think we should allow any svn write support. Anybody that truly cannot get over the hump can send patches to the list. Thanks so much for heading this up Chris. Rob From cjfields at illinois.edu Thu May 6 17:28:25 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 16:28:25 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <4BE32FEF.6080707@cornell.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> Message-ID: <9676F5A9-A778-4440-95EF-14282DF72454@illinois.edu> On May 6, 2010, at 4:09 PM, Robert Buels wrote: > The branching model at http://nvie.com/git-model is a good one, but the diagram might be a little intimidating for devs that are new to git. > > Note that the only branches that most devs will need to be concerned with are the feature branches (sometimes called topic branches), and the main development branch. The other branches are mostly concerned with making releases. 
> > To weigh in on other issues on this thread: > > * Might as well keep the svn metadata, it doesn't hurt and could help in > any situations that call for historical digging around. > * I don't think we should allow any svn write support. Anybody that > truly cannot get over the hump can send patches to the list. > > Thanks so much for heading this up Chris. > > Rob One stumbling block that I'm seeing is there is a current lack of pre-commit hook support in github (to prevent destructive or history-changing commits). I don't think this will be a problem, but it's worth noting. post-commit is fine. chris From David.Messina at sbc.su.se Thu May 6 17:59:56 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 6 May 2010 23:59:56 +0200 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <4BE32FEF.6080707@cornell.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> Message-ID: <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> > * I don't think we should allow any svn write support. Anybody that > truly cannot get over the hump can send patches to the list. Unless svn commits are somehow problematic, is there another reason to disallow it? We're switching to git soon and with little advance notice. We'd be asking all the devs to make the move on our schedule. Dave From dimitark at bii.a-star.edu.sg Thu May 6 22:25:23 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Fri, 07 May 2010 10:25:23 +0800 Subject: [Bioperl-l] about Genewise Message-ID: <4BE37A13.6010309@bii.a-star.edu.sg> Hi guys, i have a question about Genewise. Is it possible to get the percent identity between query and target? I am now trying to figure that out. I found no such method so i suppose i should calculate it myself. Thank you for your time and help. 
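A minimal sketch of calculating the percent identity by hand, assuming the query and the translated target have already been pulled out of the genewise output as two gapped strings of equal length. The function name `percent_identity` and the example strings are hypothetical, and dividing by the number of ungapped columns is only one of several common definitions:

```perl
use strict;
use warnings;

# Percent identity over the aligned columns where neither sequence has a gap.
# Both arguments are gapped strings of equal length from one pairwise alignment.
sub percent_identity {
    my ($query, $target) = @_;
    die "aligned strings differ in length\n"
        unless length($query) == length($target);
    my ($matches, $compared) = (0, 0);
    for my $i (0 .. length($query) - 1) {
        my ($q, $t) = (substr($query, $i, 1), substr($target, $i, 1));
        next if $q eq '-' || $t eq '-';    # skip gap columns
        $compared++;
        $matches++ if uc $q eq uc $t;      # case-insensitive comparison
    }
    return $compared ? 100 * $matches / $compared : 0;
}

# Hypothetical aligned strings: 5 identical residues out of 6 ungapped columns.
printf "%.1f%%\n", percent_identity('MKV-LAT', 'MKIALAT');
```

If gap columns should count as mismatches, the `next` line can be replaced by one that only increments `$compared`.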
Greetings Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From dimitark at bii.a-star.edu.sg Fri May 7 01:03:58 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Fri, 07 May 2010 13:03:58 +0800 Subject: [Bioperl-l] more genewise Message-ID: <4BE39F3E.4090204@bii.a-star.edu.sg> Hi guys, another question about genewise. Is it possible to get the query seq and the protein translation of the target seq somehow? So, up to now i could not find a way to get the percent identity between query and target(the protein translation) :( I spent some time on CPAN and perldoc and even checked the code of several modules but still no solution. Then i decided to extract the sequences out of the output file and compare them somehow but i could not find a way and for that. I found that the module 'Bio::Tools::Run::Genewise' is creating internal temp output file which i cant access so i can parse it myself and extract whatever. Because with current implementation i cant access that temp output i hacked a bit 'Bio::Tools::Run::Genewise' so i can pass my output file to the constructor, like that: my $factory = Bio::Tools::Run::Genewise->new( output => $tmpout); #not "-output" cos the module currently doesnt like it I modified the BEGIN section and the '_run' subroutine. 
My lines and the originals are marked : -------------- BEGIN { @GENEWISE_PARAMS = qw( DYMEM CODON GENE CFREQ SPLICE GENESTATS INIT SUBS INDEL INTRON NULL INSERT SPLICE_MAX_COLLAR SPLICE_MIN_COLLAR GW_EDGEQUERY GW_EDGETARGET GW_SPLICESPREAD KBYTE HNAME ALG BLOCK DIVIDE GENER U V S T G E M); @GENEWISE_SWITCHES = qw(HELP SILENT QUIET ERROROFFSTD TREV PSEUDO NOSPLICE_GTAG SPLICE_GTAG NOGWHSP GWHSP TFOR TABS BOTH HMMER ); $OK_FIELD{OUTPUT}++; *#dimitar * # Authorize attribute fields foreach my $attr ( @GENEWISE_PARAMS, @GENEWISE_SWITCHES, @OTHER_SWITCHES) { $OK_FIELD{$attr}++; } } ----------------------- ----------------------- my ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir); $self->debug("genewise command = $commandstring"); my $outfile2=$self->output; *#dimitar* # my $status = system("$commandstring > $outfile1"); *#original* my $status = system("$commandstring > $outfile2 "); *#dimitar* $self->throw("Genewies call $commandstring crashed: $? \n") unless $status==0; # my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile1); *#original* my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile2); *#dimitar* ----------------------- More the method 'cds' from 'Bio::SeqFeature::Gene::Exon/I' gives nothing back it doesnt matter what i tried. And i tried a lot :) Fortunately for me i dont need that for now. But tried and didnt work so had to say. Cheers Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From O.Niehuis.zfmk at uni-bonn.de Fri May 7 02:34:54 2010 From: O.Niehuis.zfmk at uni-bonn.de (Dr. Oliver Niehuis) Date: Fri, 7 May 2010 08:34:54 +0200 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifying alignment parameters Message-ID: Hi, I have a question about how to specify parameters for the alignment program MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. 
I would like to run MAFFT with the following alignment parameters: --maxiterate 1000 --localpair Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module before, I specified the MAFFT run parameters as follows: @params = ('localpair', 'maxiterate' => 1000); $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); Unfortunately, this code causes an exception error: ------------- EXCEPTION ------------- MSG: Unallowed parameter: LOCALPAIR ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/Generate_FASTA_files_of_orthologs.pl:55 ------------------------------------- I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT module, but only when leaving the @params array empty; MAFFT then runs with the default parameters. Has anyone an idea how I can specify run parameters for MAFFT via the Bio::Tools::Run::Alignment::MAFFT module? Any help is much appreciated! Best wishes, Oliver From biopython at maubp.freeserve.co.uk Fri May 7 04:51:38 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 May 2010 09:51:38 +0100 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> Message-ID: On Thu, May 6, 2010 at 10:59 PM, Dave Messina wrote: >> * I don't think we should allow any svn write support. Anybody that >> truly cannot get over the hump can send patches to the list. > > Unless svn commits are somehow problematic, is there another reason to disallow it? From my reading of the github blog post, svn merges are potentially problematic. 
http://github.com/blog/644-subversion-write-support Peter From maj at fortinbras.us Fri May 7 07:53:55 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 07:53:55 -0400 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters In-Reply-To: References: Message-ID: Hi Oliver, This module looks like it needs some updating. Here's a hack that should make it work (or at least prevent that exception); put the following lines before the new() call: push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_PARAMS, 'MAXITERATE'; push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, 'LOCALPAIR'; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; HTH, Mark ----- Original Message ----- From: "Dr. Oliver Niehuis" To: Sent: Friday, May 07, 2010 2:34 AM Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters > Hi, > > I have a question about how to specify parameters for the alignment program > MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. I would like to run > MAFFT with the following alignment parameters: > > --maxiterate 1000 --localpair > > Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module > before, I specified the MAFFT run parameters as follows: > > @params = ('localpair', 'maxiterate' => 1000); > $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); > > Unfortunately, this code causes an exception error: > > ------------- EXCEPTION ------------- > MSG: Unallowed parameter: LOCALPAIR ! 
> STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/ > Bio/Tools/Run/Alignment/MAFFT.pm:211 > STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/ > Tools/Run/Alignment/MAFFT.pm:196 > STACK toplevel /Users/Oliver/Desktop/Orthologs/ > Generate_FASTA_files_of_orthologs.pl:55 > ------------------------------------- > > I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT > module, but only when leaving the @params array empty; MAFFT then runs with > the default parameters. > > Has anyone an idea how I can specify run parameters for MAFFT via the > Bio::Tools::Run::Alignment::MAFFT module? > > Any help is much appreciated! > > Best wishes, > Oliver > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Fri May 7 08:12:05 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 7 May 2010 07:12:05 -0500 Subject: [Bioperl-l] more genewise In-Reply-To: <4BE39F3E.4090204@bii.a-star.edu.sg> References: <4BE39F3E.4090204@bii.a-star.edu.sg> Message-ID: <4899F495-FA46-4030-B984-EEFF81579C27@illinois.edu> Dimitar, It would be better if you could create a bug report describing the problem (with minimal example data and code) and provide a diff file or patch. This gives us a chance to do some code review and commit the patch if it passes tests. Here's a HOWTO on this: http://www.bioperl.org/wiki/HOWTO:SubmitPatch Let us know when it's submitted and we can take a look. chris On May 7, 2010, at 12:03 AM, Dimitar Kenanov wrote: > Hi guys, > another question about genewise. Is it possible to get the query seq and the protein translation of the target seq somehow? 
> > So, up to now i could not find a way to get the percent identity between query and target(the protein translation) :( I spent some time on CPAN and perldoc and even checked the code of several modules but still no solution. Then i decided to extract the sequences out of the output file and compare them somehow but i could not find a way and for that. I found that the module 'Bio::Tools::Run::Genewise' is creating internal temp output file which i cant access so i can parse it myself and extract whatever. > > Because with current implementation i cant access that temp output i hacked a bit 'Bio::Tools::Run::Genewise' so i can pass my output file to the constructor, like that: > > my $factory = Bio::Tools::Run::Genewise->new( output => $tmpout); #not "-output" cos the module currently doesnt like it > > I modified the BEGIN section and the '_run' subroutine. My lines and the originals are marked : > -------------- > BEGIN { > @GENEWISE_PARAMS = qw( DYMEM CODON GENE CFREQ SPLICE GENESTATS INIT > SUBS INDEL INTRON NULL INSERT SPLICE_MAX_COLLAR SPLICE_MIN_COLLAR > GW_EDGEQUERY GW_EDGETARGET GW_SPLICESPREAD > KBYTE HNAME ALG BLOCK DIVIDE GENER U V S T G E M); > > @GENEWISE_SWITCHES = qw(HELP SILENT QUIET ERROROFFSTD TREV PSEUDO NOSPLICE_GTAG > SPLICE_GTAG NOGWHSP GWHSP > TFOR TABS BOTH HMMER ); > > $OK_FIELD{OUTPUT}++; *#dimitar > * # Authorize attribute fields > foreach my $attr ( @GENEWISE_PARAMS, @GENEWISE_SWITCHES, > @OTHER_SWITCHES) { $OK_FIELD{$attr}++; } > } > ----------------------- > ----------------------- > my ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir); > $self->debug("genewise command = $commandstring"); > my $outfile2=$self->output; *#dimitar* > # my $status = system("$commandstring > $outfile1"); *#original* > my $status = system("$commandstring > $outfile2 "); *#dimitar* > $self->throw("Genewies call $commandstring crashed: $? 
\n") unless $status==0; > > # my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile1); *#original* > my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile2); *#dimitar* > ----------------------- > > More the method 'cds' from 'Bio::SeqFeature::Gene::Exon/I' gives nothing back it doesnt matter what i tried. And i tried a lot :) Fortunately for me i dont need that for now. But tried and didnt work so had to say. > > Cheers > Dimitar > > -- > Dimitar Kenanov > Postdoctoral research fellow > Protein Sequence Analysis Group > Bioinformatics Institute > A*STAR, Singapore > tel: +65 6478 8514 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri May 7 11:34:09 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 11:34:09 -0400 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifying alignment parameters In-Reply-To: <332A01DD-64DA-41EC-B5CE-2BC74BE78038@uni-bonn.de> References: <332A01DD-64DA-41EC-B5CE-2BC74BE78038@uni-bonn.de> Message-ID: <9764564B5CC44A89883498C6309DA045@NewLife> Hi Oliver, I think so, looking at the module again. Instead of the lines in the previous post, put push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, ('LOCALPAIR', 'MAXITERATE'); $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; and create your @params array with @params = ('localpair' => 1, 'maxiterate' => 1000); The switches need to be set with something that returns true, I believe. I *think* this should work for you. But if you would, please submit your original problem as a bug at http://bugzilla.bioperl.org. The module definitely needs some tender loving care. Thanks Mark ----- Original Message ----- From: Dr. Oliver Niehuis To: Mark A. 
Jensen Sent: Friday, May 07, 2010 11:07 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifying alignment parameters Dear Mark, Thanks for your quick reply and the MAFFT module hack. I added your code to my script and it seems to work, except that I can't specify the number of iterations (at least, I don't know how). I can specify my @params = ('localpair', 'maxiterate'); but when I assign 1000 to 'maxiterate' (i.e. 'maxiterate' => 1000), I again get an exception error, complaining about 1000 being an unallowed parameter. ------------- EXCEPTION ------------- MSG: Unallowed parameter: 1000 ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/Generate_FASTA_files_of_orthologs.pl:61 ------------------------------------- Do you know how to fix this? Best wishes, Oliver On 07.05.2010 at 13:53, Mark A. Jensen wrote: Hi Oliver, This module looks like it needs some updating. Here's a hack that should make it work (or at least prevent that exception); put the following lines before the new() call: push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_PARAMS, 'MAXITERATE'; push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, 'LOCALPAIR'; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; HTH, Mark ----- Original Message ----- From: "Dr. Oliver Niehuis" To: Sent: Friday, May 07, 2010 2:34 AM Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifying alignment parameters Hi, I have a question about how to specify parameters for the alignment program MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. 
I would like to run MAFFT with the following alignment parameters: --maxiterate 1000 --localpair Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module before, I specified the MAFFT run parameters as follows: @params = ('localpair', 'maxiterate' => 1000); $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); Unfortunately, this code causes an exception error: ------------- EXCEPTION ------------- MSG: Unallowed parameter: LOCALPAIR ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/ Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/ Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/ Generate_FASTA_files_of_orthologs.pl:55 ------------------------------------- I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT module, but only when leaving the @params array empty; MAFFT then runs with the default parameters. Has anyone an idea how I can specify run parameters for MAFFT via the Bio::Tools::Run::Alignment::MAFFT module? Any help is much appreciated! Best wishes, Oliver _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Fri May 7 12:42:38 2010 From: hartzell at alerce.com (George Hartzell) Date: Fri, 7 May 2010 09:42:38 -0700 Subject: [Bioperl-l] [job] Contract programmer in Bioinformatics at Genentech. Message-ID: <19428.17150.181595.755965@gargle.gargle.HOWL> Genentech's Bioinformatics department seeks an experienced software engineer for a six month contract. Modern Perl (or enlightened, or ..., just not circa 1998) style is required. We build tools to support our Research labs, collecting, storing, massaging, and presenting information to computer-philes and -phobes. We have more to do than we can handle, you'll be pitching in. 
Exactly what you'd be doing will be a function of your skills and our needs, and will probably vary a bit over the six month period. You write tests, sometimes even before you write code. You're not afraid of a little SQL and are comfortable collaborating with folks who were born speaking it. You're familiar with things like Moose, Rose::DB::Object, CGI::Application, NYTProf, and their ilk (or brethren) and more importantly are excited about learning more about them and using them in real-world work. Smoothing out our in-house DPAN, setting up an automated build/smoke system (we have Hudson handling Java builds already) and helping with some other infrastructure stuff is also on the table. You'll be working more-or-less full time in South San Francisco, there's the potential for a bit of telecommuting once things get running smoothly but the bulk of the job is onsite. Things that you should be comfortable with include: Perl ("modern") SQL, object relational mappers Web application (CGI::Application, or similar) CPAN, Module::Build, Dist::Zilla, etc.... Linux Software engineering in a professional environment. Experience in bioinformatics, biology, or supporting scientists would be helpful but is not required. Please send cover letters and resumes to my work address: georgewh at gene.com (the ability to follow directions is important). Bonus points for easy formats (PDF is great!), demerits for sending me stuff in DOS specific archive formats. g. From qqq2395 at gmail.com Thu May 6 14:51:13 2010 From: qqq2395 at gmail.com (visitor555) Date: Thu, 6 May 2010 11:51:13 -0700 (PDT) Subject: [Bioperl-l] Bio::Align - alignment by position? Message-ID: <28478022.post@talk.nabble.com> Hi, I have a list of alignment positions and I want to get each of those columns from the alignment. If I slice the alignment, the sequences with gaps in these positions disappear. I can iterate over each seq and then split the sequence. Is there a better way to go over the alignment position by position? 
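A minimal sketch of reading columns straight from the gapped strings, assuming the rows have first been collected as plain strings of equal length (with a Bio::SimpleAlign object something like `map { $_->seq } $aln->each_seq` should give them). The helper name `columns_at` and the toy rows are hypothetical:

```perl
use strict;
use warnings;

# Return the alignment column at each requested 1-based position.
# $rows is a reference to an array of gapped sequence strings, all the
# same length; gap characters are kept, so no row drops out of a column.
sub columns_at {
    my ($rows, $positions) = @_;
    my %column;
    for my $pos (@$positions) {
        $column{$pos} = join '', map { substr($_, $pos - 1, 1) } @$rows;
    }
    return \%column;
}

# Toy alignment with gaps at the requested positions.
my @rows = ('ACG-T', 'A-GCT', 'ACGCT');
my $cols = columns_at(\@rows, [2, 4]);
print "$_: $cols->{$_}\n" for sort { $a <=> $b } keys %$cols;
```

Each column comes back as one string with the rows in their original order, so a gap in a given row shows up as `-` at that row's index rather than removing the row.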
thanks ! -- View this message in context: http://old.nabble.com/Bio%3A%3AAlign---alignment-by-position--tp28478022p28478022.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jillianrowe91286 at gmail.com Mon May 3 08:42:56 2010 From: jillianrowe91286 at gmail.com (mindlessbrain) Date: Mon, 3 May 2010 05:42:56 -0700 (PDT) Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall Message-ID: <28434717.post@talk.nabble.com> Hey all, I'm trying to run some code for StandAloneBLast in Windows Vista: [code] #!/usr/bin/perl use Bio::DB::SwissProt; use Bio::Tools::Run::StandAloneBlast; BEGIN { $ENV{PATH}="D:/blast-2.2.23+/bin/:"; } my $database = new Bio::DB::SwissProt; my $query = $database->get_Seq_by_id('TAUD_ECOLI'); my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastp', 'database' => 'swissprot', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); my $result = $blast_report->next_result; while( my $hit = $result->next_hit()) { print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n"; } [/code] I installed BLAST from the NCBI website. I get this when I run dir on the bin: D:\blast-2.2.23+\bin>dir Volume in drive D has no label. Volume Serial Number is 224C-0190 Directory of D:\blast-2.2.23+\bin 05/03/2010 03:02 PM . 05/03/2010 03:02 PM .. 
03/08/2010 11:09 PM 2,789,376 blastdbcheck.exe 03/08/2010 11:09 PM 4,009,984 blastdbcmd.exe 03/08/2010 11:09 PM 1,810,432 blastdb_aliastool.exe 03/08/2010 11:09 PM 6,225,920 blastn.exe 03/08/2010 11:09 PM 6,221,824 blastp.exe 03/08/2010 11:09 PM 6,213,632 blastx.exe 03/08/2010 11:09 PM 5,316,608 blast_formatter.exe 03/08/2010 11:09 PM 3,215,360 convert2blastmask.exe 03/08/2010 11:09 PM 3,211,264 dustmasker.exe 03/08/2010 11:09 PM 51,178 legacy_blast.pl 03/08/2010 11:09 PM 3,866,624 makeblastdb.exe 03/08/2010 11:09 PM 3,612,672 makembindex.exe 03/08/2010 11:09 PM 6,344,704 psiblast.exe 03/08/2010 11:09 PM 6,201,344 rpsblast.exe 03/08/2010 11:09 PM 6,205,440 rpstblastn.exe 03/08/2010 11:09 PM 3,608,576 segmasker.exe 03/08/2010 11:09 PM 6,320,128 tblastn.exe 03/08/2010 11:09 PM 6,209,536 tblastx.exe 03/08/2010 11:09 PM 10,010 update_blastdb.pl 03/08/2010 11:09 PM 3,530,752 windowmasker.exe 20 File(s) 84,975,364 bytes 2 Dir(s) 122,390,626,304 bytes free I have an ncbi.ini file in my windows directory that contains: [NCBI] DATA=D:\blast-2.2.23+\data [BLAST] BLASTDB=D:\blast-2.2.23+\db Here's what my environmental variables looks like: http://old.nabble.com/file/p28434717/environmental%2Bvariables.jpg Help would be very, very appreciated! -- View this message in context: http://old.nabble.com/Bio%3A%3ATools%3A%3ARun%3A%3AStandAloneBlast-can%27t-find-path-to-blastall-tp28434717p28434717.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Fri May 7 16:07:58 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 16:07:58 -0400 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall In-Reply-To: <28434717.post@talk.nabble.com> References: <28434717.post@talk.nabble.com> Message-ID: <670B2E492D9E4D158618EC4750C595AF@NewLife> You've got blast+, so have a look at Bio::Tools::Run::StandAloneBlastPlus, should solve it. 
MAJ ----- Original Message ----- From: "mindlessbrain" To: Sent: Monday, May 03, 2010 8:42 AM Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall > > Hey all, > > I'm trying to run some code for StandAloneBLast in Windows Vista: > > [code] > #!/usr/bin/perl > > use Bio::DB::SwissProt; > use Bio::Tools::Run::StandAloneBlast; > > BEGIN > { > $ENV{PATH}="D:/blast-2.2.23+/bin/:"; > } > > my $database = new Bio::DB::SwissProt; > my $query = $database->get_Seq_by_id('TAUD_ECOLI'); > > my $factory = Bio::Tools::Run::StandAloneBlast->new( > 'program' => 'blastp', > 'database' => 'swissprot', > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), > " significance: ", $hit->significance(), "\n"; > } > [/code] > > I installed BLAST from the NCBI website. I get this when I run dir on the > bin: > > D:\blast-2.2.23+\bin>dir > Volume in drive D has no label. > Volume Serial Number is 224C-0190 > > Directory of D:\blast-2.2.23+\bin > > 05/03/2010 03:02 PM . > 05/03/2010 03:02 PM .. 
> 03/08/2010 11:09 PM 2,789,376 blastdbcheck.exe > 03/08/2010 11:09 PM 4,009,984 blastdbcmd.exe > 03/08/2010 11:09 PM 1,810,432 blastdb_aliastool.exe > 03/08/2010 11:09 PM 6,225,920 blastn.exe > 03/08/2010 11:09 PM 6,221,824 blastp.exe > 03/08/2010 11:09 PM 6,213,632 blastx.exe > 03/08/2010 11:09 PM 5,316,608 blast_formatter.exe > 03/08/2010 11:09 PM 3,215,360 convert2blastmask.exe > 03/08/2010 11:09 PM 3,211,264 dustmasker.exe > 03/08/2010 11:09 PM 51,178 legacy_blast.pl > 03/08/2010 11:09 PM 3,866,624 makeblastdb.exe > 03/08/2010 11:09 PM 3,612,672 makembindex.exe > 03/08/2010 11:09 PM 6,344,704 psiblast.exe > 03/08/2010 11:09 PM 6,201,344 rpsblast.exe > 03/08/2010 11:09 PM 6,205,440 rpstblastn.exe > 03/08/2010 11:09 PM 3,608,576 segmasker.exe > 03/08/2010 11:09 PM 6,320,128 tblastn.exe > 03/08/2010 11:09 PM 6,209,536 tblastx.exe > 03/08/2010 11:09 PM 10,010 update_blastdb.pl > 03/08/2010 11:09 PM 3,530,752 windowmasker.exe > 20 File(s) 84,975,364 bytes > 2 Dir(s) 122,390,626,304 bytes free > > I have an ncbi.ini file in my windows directory that contains: > [NCBI] > DATA=D:\blast-2.2.23+\data > [BLAST] > BLASTDB=D:\blast-2.2.23+\db > > Here's what my environmental variables looks like: > > http://old.nabble.com/file/p28434717/environmental%2Bvariables.jpg > > Help would be very, very appreciated! > > > -- > View this message in context: > http://old.nabble.com/Bio%3A%3ATools%3A%3ARun%3A%3AStandAloneBlast-can%27t-find-path-to-blastall-tp28434717p28434717.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From manchunjohn-ma at uiowa.edu Fri May 7 16:17:52 2010 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Fri, 7 May 2010 15:17:52 -0500 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> Hi, Right now I'm migrating some of my bioperl scripts from remote to stand-alone BLAST, and stumbled at how RemoteBlast->submit_blast and the StandAloneNCBIBlast->blastall deal with an array parameter. Common code for both versions: my $p3_machine=Tools::Run::Primer3(@p3_parameters); [...] my $primer3_results=$p3_machine->run($seq); my $p3_results=$primer3_results->next_primer(); my @temp_primer_info=$p3_results->get_primer; my %primer_info; $primer_info{primer}[0]=$temp_primer_info[0]->seq; $primer_info{primer}[1]=$temp_primer_info[1]->seq; $primer_info{primer}[0]->display_id('F'); $primer_info{primer}[1]->display_id('R'); Code using RemoteBlast: my $remote_blast_machine=Tools::Run::RemoteBlast->new(@remote_blast_params); [Parameter setting skipped] my $r=$remote_blast_machine->submit_blast(@primer_info{primer}); [etc, etc for iteration] Using this code, I have been able to put both sequences forth to the NCBI server and obtain results accordingly; each result object contains hits from an input sequence. However, when I switched to StandAloneBlast this way: my $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_blast_params); my $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); while (my $result=$blast_report->next_result()){ [etc, etc for iteration] } There is only one result object for sequence "F"-- and even so the loop went through twice. I suspect I made a mistake somewhere, but where? 
John MC Ma
Graduate Assistant
Kwitek Lab
Department of Internal Medicine
3125E MERF
375 Newton Road
Iowa City IA 52242

From sumanth41277 at yahoo.com Fri May 7 17:34:53 2010
From: sumanth41277 at yahoo.com (polsum)
Date: Fri, 7 May 2010 14:34:53 -0700 (PDT)
Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU
Message-ID: <28491725.post@talk.nabble.com>

Hi - We have a pretty powerful computer with a dual quad-core Intel Xeon W5580 processor and 24 GB of RAM. When I use BioPerl programs for routine operations like blastn and BLAST parsing, the programs don't seem to use the computer's power to the fullest: they use just one of the 8 cores and only 8 GB of RAM. Is there a way to ask Perl to use all the available power? I have 64-bit Windows and 64-bit Ubuntu. Ubuntu is definitely faster, but it still doesn't use all the cores of the CPU.

thanks in advance
--
View this message in context: http://old.nabble.com/Bio-Perl-and-multiple-cores-of-CPU-tp28491725p28491725.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at illinois.edu Fri May 7 17:46:24 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 7 May 2010 16:46:24 -0500
Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU
In-Reply-To: <28491725.post@talk.nabble.com>
References: <28491725.post@talk.nabble.com>
Message-ID:

You can specify the number of processors to use. With legacy BLAST this is -a 8; with BLAST+ I think it is -num_threads 8 (with the explicit caveat that I haven't tried the latter much, so no guarantees; we're not liable for explosions and such).

chris

On May 7, 2010, at 4:34 PM, polsum wrote:

> Hi - We have a pretty powerful computer with a dual quad-core Intel Xeon W5580
> processor and 24 GB of RAM. When I use BioPerl programs for routine operations
> like blastn and BLAST parsing, the programs don't seem to use the computer's
> power to the fullest: they use just one of the 8 cores and only 8 GB of RAM.
> Is there a way to ask Perl to use all the available power?
> I have 64-bit Windows and 64-bit Ubuntu. Ubuntu is definitely faster, but
> it still doesn't use all the cores of the CPU.
>
> thanks in advance
> --
> View this message in context: http://old.nabble.com/Bio-Perl-and-multiple-cores-of-CPU-tp28491725p28491725.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From David.Messina at sbc.su.se Fri May 7 18:14:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 00:14:24 +0200
Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU
In-Reply-To:
References: <28491725.post@talk.nabble.com>
Message-ID:

On May 7, 2010, at 11:46 PM, Chris Fields wrote:

> With legacy BLAST this is -a 8, with BLAST+ I think this is -num_threads 8 (with the explicit caveat I haven't tried the latter much, so no guarantees, we're not liable for explosions and such).

One other caveat if you use BLAST+: be sure you have the latest version, 2.2.23. In my informal testing, the num_threads option wasn't working correctly in 2.2.22.

BLAST parsing will still be single-threaded, by the way. BioPerl programs, like everything else unfortunately, need to explicitly spawn multiple threads or forks to take advantage of multiple cores.
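
As a rough illustration of the fork approach, here is a minimal sketch using only core Perl. It assumes the work splits into independent per-file jobs (for example, one BLAST report per file). The process_report() routine and the report file names are hypothetical placeholders rather than BioPerl API, and fork() is only emulated on Windows, so behaviour there may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-file job. In a real script this is where a report
# would be opened and parsed (e.g. with Bio::SearchIO).
sub process_report {
    my ($file) = @_;
    return length $file;    # stand-in for real parsing work
}

# Run process_report() on every file, keeping at most $max_workers
# child processes alive at once. Returns a hash of file => exit status.
sub run_in_parallel {
    my ($max_workers, @files) = @_;
    my (%pid_to_file, %status);

    while (@files or %pid_to_file) {
        # Start children until the pool is full or the queue is empty.
        while (@files and keys(%pid_to_file) < $max_workers) {
            my $file = shift @files;
            my $pid  = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {                # child process
                process_report($file);
                exit 0;
            }
            $pid_to_file{$pid} = $file;     # parent remembers the child
        }

        # Block until one child finishes, then record its exit status.
        my $done = wait();
        last if $done == -1;                # no children left
        $status{ delete $pid_to_file{$done} } = $? >> 8;
    }
    return %status;
}

# Assumed example input: 20 report files, 8 workers (one per core).
my %status = run_in_parallel(8, map { "report$_.blast" } 1 .. 20);
printf "%s exited with %d\n", $_, $status{$_} for sort keys %status;
```

Each child here only reports success or failure through its exit status; passing parsed data back to the parent would need pipes or temporary files. For speeding up the BLAST search itself, the -a and -num_threads options mentioned above are the simpler route.
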
While I've never done it myself, I ran across this post which may be helpful in case you want to try it:

http://computationalbiologynews.blogspot.com/2008/07/harnessing-power-of-multicore.html

Dave

From David.Messina at sbc.su.se Fri May 7 18:34:10 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 00:34:10 +0200
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu>
Message-ID: <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se>

Hi John,

You're right that passing parameters should work similarly for both RemoteBlast and StandAloneBlast, but without seeing exactly the parameter array you're passing, it's not possible to identify the problem.

Could you perhaps post a small but complete test program that demonstrates the problem?

Dave

PS - is this the actual code you ran?

> My $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_blast_params);
> My $blast_report=$Stand_alone_blast_machine(@primer_info{primer});
> While (my $result=$blast_report->next_result()){
> [etc, etc for iteration]
> }

I'm guessing you were paraphrasing, but I ask because "My" with a capital "M" will generate an error, you're calling Tools::Run::StandAloneBlast instead of Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), i.e. it should be:

my $Stand_alone_blast_machine = Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params);

From florent.angly at gmail.com Sat May 8 00:42:18 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Sat, 08 May 2010 14:42:18 +1000
Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally
In-Reply-To:
References: <28491725.post@talk.nabble.com>
Message-ID: <4BE4EBAA.5010709@gmail.com>

Hi all,

I am working on updating some of the Bio::Assembly::* modules right now.
I need to sort a list of IDs. These IDs could be numbers, "words" or a mix of the two, for example:

@arr = ('singlet1', 'contig10', 'contig2', '101', '3');

I cannot sort them with the numerical sort:

sort { $a <=> $b } @arr

This would generate warnings because some of the IDs, such as 'singlet1', are not numbers.

I cannot sort them lexically:

sort @arr

Lexical sorting would not take numbers into account properly and would result in:

101 3 contig10 contig2 singlet1

So, what I really need is natural sorting, which is not in any core function of Perl. I'd like to use the CPAN module Sort::Naturally for this purpose:

nsort @arr

The results would be what we expect, i.e.:

3 101 contig2 contig10 singlet1

Can I add this module as an additional dependency of BioPerl? I imagine that some other modules might want to use this. On the assembly side, it would be used by the writing methods of Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around my problem that doesn't require any external module?

Florent

From manchunjohn-ma at uiowa.edu Sat May 8 17:37:13 2010
From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John)
Date: Sat, 8 May 2010 16:37:13 -0500
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se>
Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu>

Hi,

And that's my problem here: I checked the BLAST output, and the two sequences did get aligned. It's just that SearchIO, in whatever flavour (I tried blast, blasttable and blastxml), didn't seem to move on to the next result when next_result() was called. It knows there are two results, but I still get the first result on the second call.

Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 -----Original Message----- From: Dave Messina [mailto:David.Messina at sbc.su.se] Sent: Saturday, May 08, 2010 4:33 PM To: Ma, Man Chun John Cc: BioPerl List Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast Hi John, Please remember to keep Cc'ing the mailing list so that everyone can participate in the discussion. If I understand your question correctly, yes, you can iterate through the blast results in a report called $blast_report using next_result. If you haven't already, you may want to look at the SearchIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SearchIO (although the BioPerl website appears to be temporarily offline, so check back a little later.) Dave On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > Hi, > > I have did some more investigation and found that the issue is > probably that of SearchIO rather than StandAloneBlast--in case I made > a mistake, so if I parsed a standard @array of Bio::Seq objects into > StandAloneBlast (blastn with SearchIO output), the result for each of > the seqs in the array can be assessed by $blast_report->next_result, > right? > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, May 07, 2010 5:34 PM > To: Ma, Man Chun John > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Array Handling Differences between > RemoteBlast and StandAloneBlast > > Hi John, > > You're right that passing parameters should work similarly for both > RemoteBlast and StandAloneBlast, but without seeing exactly the > parameter array you're passing, it's not possible to identify the > problem. 
> > Could you perhaps post a small, but complete test program that > demonstrates the problem? > > > Dave > > > PS - is this the actual code you ran? > >> My >> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_bl >> a >> st_params); My >> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >> While (my $result=$blast_report->next_result()){ >> [etc, etc for iteration] >> } > > I'm guessing you were paraphrasing, but I ask because My, with a > capital "M", will generate an error, you're calling > Tools::Run::StandAloneBlast instead of > Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), i.e. it should be: > > my $Stand_alone_blast_machine = > Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); > > > > __________ Information from ESET NOD32 Antivirus, version of virus > signature database 5095 (20100507) __________ > > The message was checked by ESET NOD32 Antivirus. > > http://www.eset.com > From David.Messina at sbc.su.se Sat May 8 17:32:42 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 23:32:42 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> Message-ID: <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> Hi John, Please remember to keep Cc'ing the mailing list so that everyone can participate in the discussion. If I understand your question correctly, yes, you can iterate through the blast results in a report called $blast_report using next_result. 
If you haven't already, you may want to look at the SearchIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SearchIO (although the BioPerl website appears to be temporarily offline, so check back a little later.) Dave On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > Hi, > > I have did some more investigation and found that the issue is probably > that of SearchIO rather than StandAloneBlast--in case I made a mistake, > so if I parsed a standard @array of Bio::Seq objects into > StandAloneBlast (blastn with SearchIO output), the result for each of > the seqs in the array can be assessed by $blast_report->next_result, > right? > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, May 07, 2010 5:34 PM > To: Ma, Man Chun John > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast > and StandAloneBlast > > Hi John, > > You're right that passing parameters should work similarly for both > RemoteBlast and StandAloneBlast, but without seeing exactly the > parameter array you're passing, it's not possible to identify the > problem. > > Could you perhaps post a small, but complete test program that > demonstrates the problem? > > > Dave > > > PS - is this the actual code you ran? > >> My >> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_bla >> st_params); My >> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >> While (my $result=$blast_report->next_result()){ >> [etc, etc for iteration] >> } > > I'm guessing you were paraphrasing, but I ask because My, with a capital > "M", will generate an error, you're calling Tools::Run::StandAloneBlast > instead of Bio::Tools::Run::StandAloneBlast, and there's no method call > to new(), i.e. 
it should be: > > my $Stand_alone_blast_machine = > Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); > > > > __________ Information from ESET NOD32 Antivirus, version of virus > signature database 5095 (20100507) __________ > > The message was checked by ESET NOD32 Antivirus. > > http://www.eset.com > From cjfields at illinois.edu Sat May 8 15:41:58 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 14:41:58 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu> Lincoln, Just an update, I've added you, as well as Dave and Florent. Still not sure about the bioperl.org address myself, but it seems to work for Dave and others. We posted to root-l and Chris D. to make sure that's correct or if we should be using open-bio.org instead, but I believe it is. chris On May 6, 2010, at 7:01 AM, Lincoln Stein wrote: > My github username is lstein and I've just added lstein at bioperl.org to my > linked email addresses. I hope I have a bioperl.org address; I never use it! > > Lincoln > > On Wed, May 5, 2010 at 10:46 AM, Chris Fields wrote: > >> All, >> >> I would like to finalize moving over to git/github very soon. We're sort >> of in limbo on this, so it needs to progress forward. We'll need to do some >> initial cleanup after the move (Heikki is already doing a few things on the >> test repo, which we'll need to diff over to the new one). >> >> So with that in mind, here are my thoughts. This is copied over to this >> wiki page, in case you don't want to reply here: >> >> http://www.bioperl.org/wiki/From_SVN_to_Git >> >> (thanks Mark!) >> >> 1) Timeline >> >> When? Sooner the better (weeks as opposed to months). Our anon. svn is >> down, likely permanently ( >> http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). 
>> >> 2) Migration strategy >> >> Now mainly worked out using svn2git, which is very fast. We would need to >> make the svn repo on dev read-only during this transition. My guess is it >> would take very little time. Do we want to retain the git-SVN metadata on >> commits? This is viewable with our current read-only mirror on github: >> >> >> http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca >> >> 3) Developers >> >> Not everyone has a github account. Recent ones who I couldn't find on >> github: dmessina, fangly >> >> The current authors file used for mapping commit authors to emails used >> their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I >> think, once one has signed up with github, you can add that same address to >> your current ones, and it should map to your github account. If we use >> dev.open-bio.org as our central git repo, we won't need to go through with >> that, but we will need a viewable version of dev available somehow (mirrored >> on github or otherwise). Speaking of... >> >> 4) Development strategy >> >> Are we sticking with a single centralized repo (SVN-like)? Will that be >> github, or will github be a downstream repo to our work on dev? We could >> feasibly have github be an active, forkable repo that could be >> bidirectionally synced with dev, but I'm not sure of the logistics on this >> (this popped up before with svn migration and was rejected b/c it was >> considered too difficult to maintain). >> >> Git makes it very easy to make branches and merge in code to trunk. With >> that in mind, I would highly suggest we start working on branches for almost >> everything and merge over to trunk. There is very little to no overhead in >> doing so with git. >> >> I like this strategy (Mark Jensen pointed this out): >> http://nvie.com/git-model >> >> Also, several points were raised in a related project (Parrot) considering >> a move to git/github from svn. 
One in particular was that git allows >> destructive commits. Jonathan Leto indicated we can set up specific >> branches that don't allow this, using commit hooks, so my guess is the >> master branch and release branches wouldn't allow rewinds. >> >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? >> >> http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod >> >> This is easy with github and forks. >> >> 6) SVN Read/Write to GitHub >> >> It was recently announced that one can access a github repo using >> subversion as read-only, and just yesterday experimental write to github is >> allowed: >> >> http://github.com/blog/644-subversion-write-support >> >> I can see allowing read-only svn, but write support is still experimental. >> Do we want to allow that? >> >> 7) Others? >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat May 8 15:23:35 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 14:23:35 -0500 Subject: [Bioperl-l] GitHub migration Wednesday Message-ID: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> Seems like we're all pretty much in agreement that this needs to happen sooner than later. So, I'm scheduling the git/github migration aggressively, for this Wednesday. Key steps: 1) Notify the list prior to locking the svn repo and/or making it read-only. 
2) We need to set up post-commit hooks to forward commit messages on to bioperl-guts and elsewhere. I have tried this out off github and so far it's a little problematic (not working off bioperl-test, but working off my own github commits).

3) The current bioperl github repos will all be replaced with their live counterparts (branches and all), generated off the latest SVN via svn2git (including metadata). I'll have to reinstate collaborators at that time, but the author mapping should be the same as before (DEVACCOUNT at bioperl.org, where DEVACCOUNT is one's user name on dev.open-bio.org).

4) Update the wiki pages as needed to point to the github repo instead of the code.open-bio.org one.

Also, I'm sure this will catch by surprise many devs who are not paying attention to the list, so we'll need a developer migration page set up.

Anything else?

chris

From cjfields at illinois.edu Sat May 8 16:33:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 May 2010 15:33:36 -0500
Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally
In-Reply-To: <4BE4EBAA.5010709@gmail.com>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com>
Message-ID: <7EC12A62-249D-4816-9FDD-6D321095AA4B@illinois.edu>

I don't have a problem with this personally, seeing how complex the code can get for natural sorting. It would become a recommended module, though, not a full dependency.

chris

On May 7, 2010, at 11:42 PM, Florent Angly wrote:

> Hi all,
>
> I am working on updating some of the Bio::Assembly::* modules right now.
> I need to sort a list of IDs. These IDs could be numbers, "words" or a mix of the two, for example: @arr = ('singlet1', 'contig10', 'contig2', '101', '3');
>
> I cannot sort them with the numerical sort: sort { $a <=> $b } @arr
> This would generate warnings because some of the IDs, such as 'singlet1', are not numbers.
>
> I cannot sort them lexically: sort @arr
> Lexical sorting would not take numbers into account properly and would result in:
> 101 3 contig10 contig2 singlet1
>
> So, what I really need is natural sorting, which is not in any core function of Perl. I'd like to use the CPAN module Sort::Naturally for this purpose: nsort @arr
> The results would be what we expect, i.e.:
> 3 101 contig2 contig10 singlet1
>
> Can I add this module as an additional dependency of BioPerl? I imagine that some other modules might want to use this. On the assembly side, it would be used by the writing methods of Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around my problem that doesn't require any external module?
>
> Florent

From David.Messina at sbc.su.se Sat May 8 17:47:07 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 23:47:07 +0200
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu>
Message-ID: <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se>

There was a report last week of a possible problem with BLAST parsing introduced in the last few days. I don't know what the status of that is, but it's possible that it's related.

In any case, if you post your code and the blast report you're parsing, we might be able to diagnose the problem.

Also, what version of BioPerl are you using?

Dave On May 8, 2010, at 11:37 PM, Ma, Man Chun John wrote: > Hi, > > And that's my problem here: I checked the BLAST output, and the two > sequences did get aligned-- just that SearchIO, in whatever flavour (I > tried blast, blasttable and blastxml) didn't see to do to the next > result when next_result() is called. It knows there're two results, but > still getting the first result on the second call. > > Cheers, > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Saturday, May 08, 2010 4:33 PM > To: Ma, Man Chun John > Cc: BioPerl List > Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast > and StandAloneBlast > > Hi John, > > Please remember to keep Cc'ing the mailing list so that everyone can > participate in the discussion. > > If I understand your question correctly, yes, you can iterate through > the blast results in a report called $blast_report using next_result. > > If you haven't already, you may want to look at the SearchIO HOWTO: > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > (although the BioPerl website appears to be temporarily offline, so > check back a little later.) > > > Dave > > > > On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > >> Hi, >> >> I have did some more investigation and found that the issue is >> probably that of SearchIO rather than StandAloneBlast--in case I made >> a mistake, so if I parsed a standard @array of Bio::Seq objects into >> StandAloneBlast (blastn with SearchIO output), the result for each of >> the seqs in the array can be assessed by $blast_report->next_result, >> right? 
>> >> >> John MC Ma >> Graduate Assistant >> Kwitek Lab >> Department of Internal Medicine >> 3125E MERF >> 375 Newton Road >> Iowa City IA 52242 >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Friday, May 07, 2010 5:34 PM >> To: Ma, Man Chun John >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Array Handling Differences between >> RemoteBlast and StandAloneBlast >> >> Hi John, >> >> You're right that passing parameters should work similarly for both >> RemoteBlast and StandAloneBlast, but without seeing exactly the >> parameter array you're passing, it's not possible to identify the >> problem. >> >> Could you perhaps post a small, but complete test program that >> demonstrates the problem? >> >> >> Dave >> >> >> PS - is this the actual code you ran? >> >>> My >>> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_bl >>> a >>> st_params); My >>> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >>> While (my $result=$blast_report->next_result()){ >>> [etc, etc for iteration] >>> } >> >> I'm guessing you were paraphrasing, but I ask because My, with a >> capital "M", will generate an error, you're calling >> Tools::Run::StandAloneBlast instead of >> Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), > i.e. it should be: >> >> my $Stand_alone_blast_machine = >> Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); >> >> >> >> __________ Information from ESET NOD32 Antivirus, version of virus >> signature database 5095 (20100507) __________ >> >> The message was checked by ESET NOD32 Antivirus. 
>>
>> http://www.eset.com
>>
>

From cjfields at illinois.edu Sat May 8 14:59:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 May 2010 13:59:13 -0500
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To:
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se>
Message-ID: <73BDDA86-F487-484F-A87C-1DF37CDEA7D8@illinois.edu>

On May 7, 2010, at 3:51 AM, Peter wrote:

> On Thu, May 6, 2010 at 10:59 PM, Dave Messina wrote:
>>> * I don't think we should allow any svn write support. Anybody that
>>> truly cannot get over the hump can send patches to the list.
>>
>> Unless svn commits are somehow problematic, is there another reason to disallow it?
>
> From my reading of the github blog post, svn merges are potentially problematic.
> http://github.com/blog/644-subversion-write-support
>
> Peter

Yes, they're still working out the kinks. I think we would only support read until the bugs get worked out of write.

chris

From David.Messina at sbc.su.se Sat May 8 17:33:53 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 23:33:53 +0200
Subject: [Bioperl-l] wiki offline?
Message-ID: <064068F0-FF78-4557-9356-54CB1DB1783B@sbc.su.se>

Hi,

The BioPerl website appears to be down, at least from my spot on the net - could someone please look into it?

Thanks,
Dave

From David.Messina at sbc.su.se Sat May 8 16:07:02 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 8 May 2010 22:07:02 +0200
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu>
Message-ID: <9A27A797-027E-445D-A8C3-6A7B6FBF4F13@sbc.su.se>

Thanks, Chris.

It took a few days for github to "notice" my @bioperl.org address and connect it to my commits. Since Lincoln added his @bioperl.org email to github a little later than I did, it may just be still trickling through the github pipes.

Dave

From florent.angly at gmail.com Sat May 8 07:34:15 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Sat, 08 May 2010 21:34:15 +1000
Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally
In-Reply-To: <4BE4EBAA.5010709@gmail.com>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com>
Message-ID: <4BE54C37.7020304@gmail.com>

Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. It looks like the Bio::SeqIO module tests could use it as well.

Cheers,

Florent

From David.Messina at sbc.su.se Sat May 8 18:40:22 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 9 May 2010 00:40:22 +0200
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu>
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu>
Message-ID:

Hi John,

Your blast report works fine for me with the following code taken from the Bio::SearchIO HOWTO:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Open the saved BLAST report (plain-text BLAST format).
my $in = Bio::SearchIO->new('-file'   => 'blastout',
                            '-format' => 'blast');

# One result per query sequence, then each hit and each HSP within it.
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print "Query=",       $result->query_name,
                  " Hit=",        $hit->name,
                  " Length=",     $hsp->length('total'),
                  " Percent_id=", $hsp->percent_identity, "\n";
        }
    }
}

## Here is the output:

Query=F Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100
Query=F Hit=ref|NC_005117.2|NC_005117 Length=18 Percent_id=100
Query=F Hit=ref|NC_005105.2|NC_005105 Length=18 Percent_id=100
Query=R Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100

Dave

From manchunjohn-ma at uiowa.edu Sat May 8 18:43:11 2010
From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John)
Date: Sat, 8 May 2010 17:43:11 -0500
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
In-Reply-To:
References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu>
Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu>

Hi Dave,

Yes, I tried writing a separate script to parse all those files, and they came out fine. The problem only happens when I run the entire target script; and if I replace the StandAloneBlast part with the standard RemoteBlast code, it's fine, too.

Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 -----Original Message----- From: Dave Messina [mailto:David.Messina at sbc.su.se] Sent: Saturday, May 08, 2010 5:40 PM To: Ma, Man Chun John Cc: BioPerl List Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast Hi John, Your blast report works fine for me with the following code taken from the Bio::SearchIO HOWTO: #!usr/bin/perl use strict; use warnings; use Bio::SearchIO; my $in = Bio::SearchIO->new('-file' => 'blastout', '-format' => 'blast'); while(my $result = $in->next_result) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp) { print "Query=", $result->query_name, " Hit=", $hit->name, " Length=", $hsp->length('total'), " Percent_id=", $hsp->percent_identity, "\n"; } } } ## Here is the output: Query=F Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100 Query=F Hit=ref|NC_005117.2|NC_005117 Length=18 Percent_id=100 Query=F Hit=ref|NC_005105.2|NC_005105 Length=18 Percent_id=100 Query=R Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100 Dave From David.Messina at sbc.su.se Sat May 8 18:58:41 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 9 May 2010 00:58:41 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu> 
Message-ID: <41281436-08D3-46F9-BDD0-A8D5306DB412@sbc.su.se> I cannot help you without seeing the code. It sounds like you've already tested the parsing part in a script by itself and that works. If you haven't already, you can test the running Blast part in its own script and see if that works. If both parts work separately, then there's something wrong with the way they have been put together. Dave From jason at bioperl.org Sat May 8 12:06:28 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 08 May 2010 09:06:28 -0700 Subject: [Bioperl-l] Bio::Align - alignment by position? In-Reply-To: <28478022.post@talk.nabble.com> References: <28478022.post@talk.nabble.com> Message-ID: <4BE58C04.8090901@bioperl.org> Not clear what you want to make. You want a new alignment that only contains the columns in your list or You want to extract each column in your list one by one? visitor555 wrote, On 5/6/10 11:51 AM: > Hi, > > I have a list alignment positions and I want to get each column them from > the alignment. If I slice the alignment the sequence with gaps in these > positions disappear. I can rotate on each seq and then split the sequence. > Is there better way to go over the alignment position by position? > > thanks ! > From jason at bioperl.org Sat May 8 12:12:26 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 08 May 2010 09:12:26 -0700 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE4EBAA.5010709@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> Message-ID: <4BE58D6A.9080601@bioperl.org> Unless necessary I don't know if adding yet another dependency is warranted here. I don't know how complicated the words will be but can't you just strip out the numbers and do this in a schwartzian transformation? 
#!/usr/bin/perl -w
use strict;
my @arr = qw(single1 contig10 101 contig2 3);
# Schwartzian transform: sort on the first run of digits found in each ID
my @sorted = map { $_->[1] }
             sort { $a->[0] <=> $b->[0] }
             map { [ /(\d+)/, $_ ] } @arr;
print join("\n", @sorted), "\n";

But I'm not sure how you want to sort 10 vs contig10 vs singlet10 reliably.

-jason

Florent Angly wrote, On 5/7/10 9:42 PM:
> Hi all,
>
> I am working on updating some of the Bio::Assembly::* modules right now.
> I need to sort a list of IDs. These IDs could be numbers, "words" or a
> mix of the two, for example: @arr = ('singlet1', 'contig10',
> 'contig2', '101', '3');
>
> I cannot sort them with the numerical sort: sort { $a <=> $b } @array
> This would generate warnings because some of the IDs (e.g. 'singlet1')
> are not numbers.
>
> I cannot sort them lexically: sort @array
> Lexical sorting would not take into account numbers properly and
> result in:
> 101 3 contig10 contig2 singlet1
>
> So, what I really need is natural sorting, which is not in any core
> function of Perl. I'd like to use the CPAN module Sort::Naturally for
> this purpose: nsort @arr
> The results would be what we expect, i.e.:
> 3 101 contig2 contig10 singlet1
>
> Can I add this module as an additional dependency of BioPerl? I
> imagine that some other modules might want to use this. On the
> assembly side, it would be used by the writing methods of
> Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around
> my problem that doesn't require any external module?
>
> Florent
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Sat May 8 19:47:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 May 2010 18:47:58 -0500
Subject: [Bioperl-l] New Bioperl dependency?
Sort::Naturally In-Reply-To: <4BE54C37.7020304@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> Message-ID: To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. chris On May 8, 2010, at 6:34 AM, Florent Angly wrote: > Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. > > It looks like the Bio::SeqIO modules tests could use it as well. > > Cheers, > > Florent > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat May 8 20:02:28 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 19:02:28 -0500 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> Message-ID: <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. chris On May 8, 2010, at 6:47 PM, Chris Fields wrote: > To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. 
None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. > > chris > > On May 8, 2010, at 6:34 AM, Florent Angly wrote: > >> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. >> >> It looks like the Bio::SeqIO modules tests could use it as well. >> >> Cheers, >> >> Florent >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sat May 8 19:30:48 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 8 May 2010 19:30:48 -0400 Subject: [Bioperl-l] GitHub migration Wednesday In-Reply-To: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> References: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> Message-ID: <9B5043D308B942AEB4F9AA199470812B@NewLife> Sail on, great Ship of State. ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Saturday, May 08, 2010 3:23 PM Subject: [Bioperl-l] GitHub migration Wednesday > Seems like we're all pretty much in agreement that this needs to happen sooner > than later. So, I'm scheduling the git/github migration aggressively, for > this Wednesday. Key steps: > > 1) Notify the list prior to locking the svn repo and/or making it read-only. > > 2) We need to set up post-commit hooks to forward commit messages on to > bioperl-guts and elsewhere. 
I have tried this out off github and so far it's > a little problematic (not working off bioperl-test, but working off my own > github commits). > > 3) The current bioperl github repos will all be replaced with their live > counterparts (branches and all), generated off the latest SVN via svn2git > (including metadata). I'll have to reinstate collaborators at that time, but > the author mapping should be the same as before (DEVACCOUNT at bioperl.org, where > DEVACCOUNT is one's user name on dev.open-bio.org). > > 4) Update the wiki pages as needed to point to the github repo instead of the > code.open-bio.org one. Also, I'm sure this will catch many devs not paying > attention to the list by surprise, so we'll need a developer migration page > set up. > > Anything else? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From manchunjohn-ma at uiowa.edu Sat May 8 17:59:08 2010 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Sat, 8 May 2010 16:59:08 -0500 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> Hi, I use bioperl-live 16950 with blast 2.2.23 I haven't been able to put together a simplier script with problem at this time, so I'd put the BLASTn outputs (in blast, blasttable and blastxml formats) here-- they look perfectly normal except that look like 2 separate output 
files appended together.

Cheers,

John MC Ma
Graduate Assistant
Kwitek Lab
Department of Internal Medicine
3125E MERF
375 Newton Road
Iowa City IA 52242

-----Original Message-----
From: Dave Messina [mailto:David.Messina at sbc.su.se]
Sent: Saturday, May 08, 2010 4:47 PM
To: Ma, Man Chun John
Cc: BioPerl List
Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast

There was a report last week of a possible problem with BLAST parsing introduced in the last few days. I don't know what the status of that is, but it's possible that it's related.

In any case, if you post your code and the blast report you're parsing, we might be able to diagnose the problem. Also, what version of BioPerl are you using?

Dave

On May 8, 2010, at 11:37 PM, Ma, Man Chun John wrote:
> Hi,
>
> And that's my problem here: I checked the BLAST output, and the two
> sequences did get aligned. It is just that SearchIO, in whatever flavour
> (I tried blast, blasttable and blastxml), didn't seem to go to the next
> result when next_result() is called. It knows there are two results,
> but I still get the first result on the second call.
>
> Cheers,
>
>
> John MC Ma
> Graduate Assistant
> Kwitek Lab
> Department of Internal Medicine
> 3125E MERF
> 375 Newton Road
> Iowa City IA 52242
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se]
> Sent: Saturday, May 08, 2010 4:33 PM
> To: Ma, Man Chun John
> Cc: BioPerl List
> Subject: Re: [Bioperl-l] Array Handling Differences between
> RemoteBlast and StandAloneBlast
>
> Hi John,
>
> Please remember to keep Cc'ing the mailing list so that everyone can
> participate in the discussion.
>
> If I understand your question correctly, yes, you can iterate through
> the blast results in a report called $blast_report using next_result.
>
> If you haven't already, you may want to look at the SearchIO HOWTO:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
> (although the BioPerl website appears to be temporarily offline, so
> check back a little later.)
>
>
> Dave
>
>
>
> On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote:
>
>> Hi,
>>
>> I did some more investigation and found that the issue is probably
>> in SearchIO rather than StandAloneBlast. In case I made a mistake:
>> if I pass a standard @array of Bio::Seq objects to StandAloneBlast
>> (blastn with SearchIO output), the result for each of the seqs in
>> the array can be accessed by $blast_report->next_result, right?
>>
>>
>> John MC Ma
>> Graduate Assistant
>> Kwitek Lab
>> Department of Internal Medicine
>> 3125E MERF
>> 375 Newton Road
>> Iowa City IA 52242
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Friday, May 07, 2010 5:34 PM
>> To: Ma, Man Chun John
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Array Handling Differences between
>> RemoteBlast and StandAloneBlast
>>
>> Hi John,
>>
>> You're right that passing parameters should work similarly for both
>> RemoteBlast and StandAloneBlast, but without seeing exactly the
>> parameter array you're passing, it's not possible to identify the
>> problem.
>>
>> Could you perhaps post a small, but complete test program that
>> demonstrates the problem?
>>
>>
>> Dave
>>
>>
>> PS - is this the actual code you ran?
>> >>> My >>> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_b >>> l >>> a >>> st_params); My >>> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >>> While (my $result=$blast_report->next_result()){ >>> [etc, etc for iteration] >>> } >> >> I'm guessing you were paraphrasing, but I ask because My, with a >> capital "M", will generate an error, you're calling >> Tools::Run::StandAloneBlast instead of >> Bio::Tools::Run::StandAloneBlast, and there's no method call to >> new(), > i.e. it should be: >> >> my $Stand_alone_blast_machine = >> Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); >> >> >> >> __________ Information from ESET NOD32 Antivirus, version of virus >> signature database 5095 (20100507) __________ >> >> The message was checked by ESET NOD32 Antivirus. >> >> http://www.eset.com >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: blasttable Type: application/octet-stream Size: 842 bytes Desc: blasttable URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blast.xml Type: text/xml Size: 7598 bytes Desc: blast.xml URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastout Type: application/octet-stream Size: 3576 bytes Desc: blastout URL: From florent.angly at gmail.com Sun May 9 01:12:03 2010 From: florent.angly at gmail.com (Florent Angly) Date: Sun, 09 May 2010 15:12:03 +1000 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE58D6A.9080601@bioperl.org> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE58D6A.9080601@bioperl.org> Message-ID: <4BE64423.1040104@gmail.com> Within one assembly file, contig IDs typically tend to follow one formatting convention. The two most popular ones are a numerical ID, or an alphanumeric ID, such as 'contig13'. The later case already requires natural sorting. 
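
For reference, the natural ordering discussed in this thread can be sketched in core Perl, without any CPAN module. This is only an illustration and not how Sort::Naturally is implemented: natural_cmp is a hypothetical helper, and it assumes IDs are made of alternating runs of digits and non-digits, comparing digit runs numerically and everything else as strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or Sort::Naturally):
# compare two IDs chunk by chunk, numerically when both chunks are digits,
# as plain strings otherwise.
sub natural_cmp {
    my ($x, $y) = @_;
    my @xc = $x =~ /(\d+|\D+)/g;    # e.g. 'contig10' -> ('contig', '10')
    my @yc = $y =~ /(\d+|\D+)/g;
    while (@xc && @yc) {
        my ($p, $q) = (shift @xc, shift @yc);
        my $r = ($p =~ /^\d/ && $q =~ /^\d/) ? ($p <=> $q) : ($p cmp $q);
        return $r if $r;
    }
    # All shared chunks are equal: the ID with fewer chunks sorts first
    return scalar(@xc) <=> scalar(@yc);
}

my @ids    = ('singlet1', 'contig10', 'contig2', '101', '3');
my @sorted = sort { natural_cmp($a, $b) } @ids;
print join("\n", @sorted), "\n";
```

For the example list from the original question this prints 3, 101, contig2, contig10, singlet1, one per line. Purely numeric IDs come before the words here only because digits precede letters in string comparison.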
There is no way to know in advance what format to expect; since the format is specified by the user, it could be arbitrarily complicated, although the IDs would probably still sort naturally.

I will follow Chris's recommendation of using Sort::Naturally as a recommended rather than required package. Users who don't have this dependency will have their IDs sorted lexically, which is a safe fallback.

Florent

On 09/05/10 02:12, Jason Stajich wrote:
> Unless necessary I don't know if adding yet another dependency is
> warranted here.
>
> I don't know how complicated the words will be but can't you just
> strip out the numbers and do this in a schwartzian transformation?
>
> #!/usr/bin/perl -w
> use strict;
> my @arr = qw(single1 contig10 101 contig2 3);
> my @sorted = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [
> /(\d+)/, $_] } @arr;
> print join("\n", @sorted),"\n";
>
> But I'm not sure how you want to sort
> 10 vs contig10 vs singlet10 reliably.
>
> -jason
>
> Florent Angly wrote, On 5/7/10 9:42 PM:
>> Hi all,
>>
>> I am working on updating some of the Bio::Assembly::* modules right now.
>> I need to sort a list of IDs. These IDs could be numbers, "words" or
>> a mix of the two, for example: @arr = ('singlet1',
>> 'contig10', 'contig2', '101', '3');
>>
>> I cannot sort them with the numerical sort: sort { $a <=> $b } @array
>> This would generate warnings because some of the IDs (e.g. 'singlet1')
>> are not numbers.
>>
>> I cannot sort them lexically: sort @array
>> Lexical sorting would not take into account numbers properly and
>> result in:
>> 101 3 contig10 contig2 singlet1
>>
>> So, what I really need is natural sorting, which is not in any core
>> function of Perl. I'd like to use the CPAN module Sort::Naturally for
>> this purpose: nsort @arr
>> The results would be what we expect, i.e.:
>> 3 101 contig2 contig10 singlet1
>>
>> Can I add this module as an additional dependency of BioPerl? I
>> imagine that some other modules might want to use this.
On the >> assembly side, it would be used by the writing methods of >> Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around >> my problem that doesn't require any external module? >> >> Florent >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Sun May 9 03:26:19 2010 From: florent.angly at gmail.com (Florent Angly) Date: Sun, 09 May 2010 17:26:19 +1000 Subject: [Bioperl-l] Read/write round-tripping Was: Re: New Bioperl dependency? Sort::Naturally In-Reply-To: <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> Message-ID: <4BE6639B.6060004@gmail.com> Chris, I've thought some more on the problem and I now agree with you that round-tripping at the object-level is more powerful. It has the problem that some objects are given IDs dynamically every time, which means that identical input files won't have an identical object. > is_deeply( $obj_out , $obj_in , 'deep compare' ); > not ok 1 - deep compare > # Failed test 'deep compare' > # at ./test_roundtrip.pl line 33. > # Structures begin differing at: > # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '56438592' > # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '54980512' > 1..1 > # Looks like you failed 1 test of 1. And when I re-run this again: > not ok 1 - deep compare > # Failed test 'deep compare' > # at ./test_roundtrip.pl line 33. > # Structures begin differing at: > # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '47763264' > # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '46305184' > 1..1 > # Looks like you failed 1 test of 1. Note how the value of _btree changes everytime. 
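
One way to keep an object-level round-trip test stable is to leave out the fields known to vary between runs before comparing. The sketch below is only an illustration under stated assumptions: strip_volatile is a hypothetical helper (not part of BioPerl or Test::More), the two data structures are made-up stand-ins that mimic the _btree difference shown above, and it assumes the volatile values can be identified by their hash key.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(reftype);
use Test::More tests => 1;

# Hypothetical helper: return an unblessed copy of a nested structure,
# leaving out every hash key listed in %skip (here: '_btree').
sub strip_volatile {
    my ($data, %skip) = @_;
    my $type = reftype($data) || '';
    if ($type eq 'HASH') {
        my %copy;
        for my $key (grep { !$skip{$_} } keys %$data) {
            $copy{$key} = strip_volatile($data->{$key}, %skip);
        }
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        return [ map { strip_volatile($_, %skip) } @$data ];
    }
    return $data;    # plain scalars and other reference types are returned as-is
}

# Made-up stand-ins for two parses of the same assembly file
my $obj_in  = { _contigs => { Contig35 => { _sfc => { _btree => \'54980512', _count => 3 } } } };
my $obj_out = { _contigs => { Contig35 => { _sfc => { _btree => \'56438592', _count => 3 } } } };

is_deeply( strip_volatile($obj_out, _btree => 1),
           strip_volatile($obj_in,  _btree => 1),
           'deep compare, ignoring _btree' );
```

The copy drops blessing, so this compares data only, not object classes; whether that is acceptable depends on what the round-trip test is meant to guarantee.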
Maybe using Test::Deep would be a good approach (http://search.cpan.org/~fdaly/Test-Deep-0.106/lib/Test/Deep.pod): > Where it becomes more interesting is in allowing you to do something > besides simple exact comparisons. With strings, the |eq| operator > checks that 2 strings are exactly equal but sometimes that's not what > you want. When you don't know exactly what the string should be but > you do know some things about how it should look, |eq| is no good and > you must use pattern matching instead. Test::Deep provides pattern > matching for complex data structures Florent On 09/05/10 10:02, Chris Fields wrote: > Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. > > chris > > On May 8, 2010, at 6:47 PM, Chris Fields wrote: > > >> To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. >> >> chris >> >> On May 8, 2010, at 6:34 AM, Florent Angly wrote: >> >> >>> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. >>> >>> It looks like the Bio::SeqIO modules tests could use it as well. 
>>> >>> Cheers, >>> >>> Florent >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From ibi2008006 at iiita.ac.in Sun May 9 10:46:28 2010 From: ibi2008006 at iiita.ac.in (roserp) Date: Sun, 9 May 2010 07:46:28 -0700 (PDT) Subject: [Bioperl-l] where to find standard substitution matrices Message-ID: <28503204.post@talk.nabble.com> hi , I want blosum62, blosum80 , pam30, and pam70 matrices. I am getting different values in different sites for these matrices. can anyone suggest some authenticated site for getting these ?? thanks in advance -- View this message in context: http://old.nabble.com/where-to-find-standard-substitution-matrices-tp28503204p28503204.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From razi.khaja at gmail.com Sun May 9 15:23:47 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Sun, 9 May 2010 15:23:47 -0400 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: Attached (blast.pm.diff) is a patch that fixes Heikki's problem. Can someone advise an appropriate way to have this patch applied, given that it is an amendment to a previous patch? Thanks Razi ---------- Forwarded message ---------- From: Heikki Lehvaslaiho Date: Wed, May 5, 2010 at 2:11 AM Subject: Re: [Bioperl-l] BLAST parsing broken To: Razi Khaja Hi Raja, Thanks for trying to fix this. I am attaching an example output file to this message. I just tested again that master from git repository fails to get query ID, but the previous version works. bala ~/src/bioperl-live> git checkout master Previous HEAD position was 5e278f5... 
Robson's patch for buggy blastpgp output Switched to branch 'master' When I started using the latest mpiBLAST code a few months ago I did compare the 0 output from it to standard NCBI blast and they were identical. Also, I've noticed a discrepancy between within bioperl blast parsing that I have not had time to work on. Would you be interested in having a look? I am creating output from mpiBLAST in 0 format and then converting it into tab-delimited 8 format. I am unable to get 100% similarity for all cases when I compare the conversion to the output straight from mpiBLAST in format 8. Sometimes the mismatch and gap values are off by one. I am attaching a script that does the conversion. It is the same one I was using when I noticed the problem above. I was going to put the code into bioperl but that got delayed when I noticed the discrepancies. Cheers, -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 4 May 2010 20:55, Razi Khaja wrote: > That is odd. Heikki, do you have a blast output file that produces this > error? > Could you attach the file and either send to the list or myself (if the > list > does not accept attachments). > Thanks, > Razi > > > On Mon, May 3, 2010 at 8:08 AM, Chris Fields > wrote: > > > Odd, I ran tests on that prior to commit. I'll work on fixing that (in > svn > > of course, until the migration is complete). > > > > chris > > > > On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > > > > > Chris, > > > > > > latest additions to Bio::SearchIO::blast.pm broke the parsing of > normal > > > blast output. $result->query_name returns now undef. > > > > > > (Using the anonymous git now). 
This change still works: > > > > > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > > Author: cjfields > > > Date: Sun Dec 20 04:39:58 2009 +0000 > > > > > > Robson's patch for buggy blastpgp output > > > > > > But this does not: > > > > > > commit 9a89c3434597104dd50553e3562983d78d14a544 > > > Author: cjfields > > > Date: Thu Apr 15 04:21:17 2010 +0000 > > > > > > [bug 3031] > > > > > > patches for catching algorithm ref, courtesy Razi Khaja. > > > > > > That makes it easy to find the diffs: > > > > > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > > > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > > > index 378023a..6f7eeeb 100644 > > > --- a/Bio/SearchIO/blast.pm > > > +++ b/Bio/SearchIO/blast.pm > > > @@ -209,6 +209,7 @@ BEGIN { > > > > > > 'BlastOutput_program' => 'RESULT-algorithm_name', > > > 'BlastOutput_version' => > 'RESULT-algorithm_version', > > > + 'BlastOutput_algorithm-reference' => > > 'RESULT-algorithm_reference', > > > 'BlastOutput_query-def' => 'RESULT-query_name', > > > 'BlastOutput_query-len' => 'RESULT-query_length', > > > 'BlastOutput_query-acc' => 'RESULT-query_accession', > > > @@ -504,6 +505,26 @@ sub next_result { > > > } > > > ); > > > } > > > + # parse the BLAST algorithm reference > > > + elsif(/^Reference:\s+(.*)$/) { > > > + # want to preserve newlines for the BLAST algorithm > > reference > > > + my $algorithm_reference = "$1\n"; > > > + $_ = $self->_readline; > > > + # while the current line, does not match an empty line, a > > RID:, > > > or a Database:, we are still looking at the > > > + # algorithm_reference, append it to what we parsed so far > > > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > > > + $algorithm_reference .= "$_"; > > > + $_ = $self->_readline; > > > + } > > > + # if we exited the while loop, we saw an empty line, a > RID:, > > or > > > a Database:, so push it back > > > + $self->_pushback($_); > > > + 
$self->element( > > > + { > > > + 'Name' => 'BlastOutput_algorithm-reference', > > > + 'Data' => $algorithm_reference > > > + } > > > + ); > > > + } > > > # added Windows workaround for bug 1985 > > > elsif (/^(Searching|Results from round)/) { > > > next unless $1 =~ /Results from round/; > > > > > > > > > I am not sure why reference parsing messes things up. Maybe it eats too > > many > > > lines from the result file. > > > > > > Yours, > > > > > > -Heikki > > > > > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > > > cell: +966 545 595 849 office: +966 2 808 2429 > > > > > > Computational Bioscience Research Centre (CBRC), Building #2, Office > > #4216 > > > 4700 King Abdullah University of Science and Technology (KAUST) > > > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: mpiblast.out Type: application/octet-stream Size: 34844 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastparser028.pl Type: application/x-perl Size: 2024 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: blast.pm.diff Type: text/x-patch Size: 994 bytes Desc: not available URL: From cjfields at illinois.edu Sun May 9 16:43:29 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 May 2010 15:43:29 -0500 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> If the patch is against main trunk it isn't a problem, otherwise the diff should be vs. that code. chris On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > Can someone advise an appropriate way to have this patch applied, given that > it is an amendment to a previous patch? > Thanks > Razi > > > ---------- Forwarded message ---------- > From: Heikki Lehvaslaiho > Date: Wed, May 5, 2010 at 2:11 AM > Subject: Re: [Bioperl-l] BLAST parsing broken > To: Razi Khaja > > > Hi Raja, > > Thanks for trying to fix this. > > I am attaching an example output file to this message. I just tested again > that master from git repository fails to get query ID, but the previous > version works. > > bala ~/src/bioperl-live> git checkout master > Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp > output > Switched to branch 'master' > > When I started using the latest mpiBLAST code a few months ago I did compare > the 0 output from it to standard NCBI blast and they were identical. > > > > > Also, I've noticed a discrepancy between within bioperl blast parsing that > I have not had time to work on. Would you be interested in having a look? > > I am creating output from mpiBLAST in 0 format and then converting it into > tab-delimited 8 format. I am unable to get 100% similarity for all cases > when I compare the conversion to the output straight from mpiBLAST in format > 8. Sometimes the mismatch and gap values are off by one. > > I am attaching a script that does the conversion. 
It is the same one I was > using when I noticed the problem above. I was going to put the code into > bioperl but that got delayed when I noticed the discrepancies. > > > Cheers, > > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > On 4 May 2010 20:55, Razi Khaja wrote: > >> That is odd. Heikki, do you have a blast output file that produces this >> error? >> Could you attach the file and either send to the list or myself (if the >> list >> does not accept attachments). >> Thanks, >> Razi >> >> >> On Mon, May 3, 2010 at 8:08 AM, Chris Fields >> wrote: >> >>> Odd, I ran tests on that prior to commit. I'll work on fixing that (in >> svn >>> of course, until the migration is complete). >>> >>> chris >>> >>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: >>> >>>> Chris, >>>> >>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of >> normal >>>> blast output. $result->query_name returns now undef. >>>> >>>> (Using the anonymous git now). This change still works: >>>> >>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>> Author: cjfields >>>> Date: Sun Dec 20 04:39:58 2009 +0000 >>>> >>>> Robson's patch for buggy blastpgp output >>>> >>>> But this does not: >>>> >>>> commit 9a89c3434597104dd50553e3562983d78d14a544 >>>> Author: cjfields >>>> Date: Thu Apr 15 04:21:17 2010 +0000 >>>> >>>> [bug 3031] >>>> >>>> patches for catching algorithm ref, courtesy Razi Khaja. 
>>>> >>>> That makes it easy to find the diffs: >>>> >>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm >>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm >>>> index 378023a..6f7eeeb 100644 >>>> --- a/Bio/SearchIO/blast.pm >>>> +++ b/Bio/SearchIO/blast.pm >>>> @@ -209,6 +209,7 @@ BEGIN { >>>> >>>> 'BlastOutput_program' => 'RESULT-algorithm_name', >>>> 'BlastOutput_version' => >> 'RESULT-algorithm_version', >>>> + 'BlastOutput_algorithm-reference' => >>> 'RESULT-algorithm_reference', >>>> 'BlastOutput_query-def' => 'RESULT-query_name', >>>> 'BlastOutput_query-len' => 'RESULT-query_length', >>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', >>>> @@ -504,6 +505,26 @@ sub next_result { >>>> } >>>> ); >>>> } >>>> + # parse the BLAST algorithm reference >>>> + elsif(/^Reference:\s+(.*)$/) { >>>> + # want to preserve newlines for the BLAST algorithm >>> reference >>>> + my $algorithm_reference = "$1\n"; >>>> + $_ = $self->_readline; >>>> + # while the current line, does not match an empty line, a >>> RID:, >>>> or a Database:, we are still looking at the >>>> + # algorithm_reference, append it to what we parsed so far >>>> + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { >>>> + $algorithm_reference .= "$_"; >>>> + $_ = $self->_readline; >>>> + } >>>> + # if we exited the while loop, we saw an empty line, a >> RID:, >>> or >>>> a Database:, so push it back >>>> + $self->_pushback($_); >>>> + $self->element( >>>> + { >>>> + 'Name' => 'BlastOutput_algorithm-reference', >>>> + 'Data' => $algorithm_reference >>>> + } >>>> + ); >>>> + } >>>> # added Windows workaround for bug 1985 >>>> elsif (/^(Searching|Results from round)/) { >>>> next unless $1 =~ /Results from round/; >>>> >>>> >>>> I am not sure why reference parsing messes things up. Maybe it eats too >>> many >>>> lines from the result file. 
>>>>
>>>> Yours,
>>>>
>>>> -Heikki
>>>>
>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>> cell: +966 545 595 849 office: +966 2 808 2429
>>>>
>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
>>>> 4700 King Abdullah University of Science and Technology (KAUST)
>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From razi.khaja at gmail.com Sun May 9 17:15:38 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Sun, 9 May 2010 17:15:38 -0400
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To: <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID:

Hi Chris,
The patch is against the main trunk. I checked out version 11326 of the
repository today.
Razi

On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote:

> If the patch is against main trunk it isn't a problem, otherwise the diff
> should be vs. that code.
>
> chris
>
> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
>
> > Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
> > Can someone advise an appropriate way to have this patch applied, given that
> > it is an amendment to a previous patch?
> > Thanks
> > Razi

From cjfields at illinois.edu Sun May 9 17:30:52 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 9 May 2010 16:30:52 -0500
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To:
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID:

Then something is wrong, as current trunk is at r16969. Where are you pulling your code from? Our only working anon. server is the sync'ed github one.

chris

On May 9, 2010, at 4:15 PM, Razi Khaja wrote:

> Hi Chris,
> The patch is against the main trunk. I checked out version 11326 of the
> repository today.
> Razi
>
>
> On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote:
>
>> If the patch is against main trunk it isn't a problem, otherwise the diff
>> should be vs. that code.
>>
>> chris
>>
>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
>>
>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
>>> Can someone advise an appropriate way to have this patch applied, given that
>>> it is an amendment to a previous patch?
>>> Thanks
>>> Razi

From razi.khaja at gmail.com Sun May 9 19:48:28 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Sun, 9 May 2010 19:48:28 -0400
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To:
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID:

I checked out bioperl-live from github:
svn checkout http://svn.github.com/bioperl/bioperl-live.git

I just checked it out again, a few seconds ago and by default I got revision
11326.
Razi

On Sun, May 9, 2010 at 5:30 PM, Chris Fields wrote:

> Then something is wrong, as current trunk is at r16969.
> Where are you pulling your code from? Our only working anon. server is the
> sync'ed github one.
>
> chris
>
> On May 9, 2010, at 4:15 PM, Razi Khaja wrote:
>
> > Hi Chris,
> > The patch is against the main trunk. I checked out version 11326 of the
> > repository today.
> > Razi
> >
> >
> > On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote:
> >
> >> If the patch is against main trunk it isn't a problem, otherwise the diff
> >> should be vs. that code.
> >>
> >> chris
> >>
> >> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
> >>
> >>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
> >>> Can someone advise an appropriate way to have this patch applied, given that
> >>> it is an amendment to a previous patch?
> >>> Thanks
> >>> Razi

From cjfields at illinois.edu Sun May 9 20:39:33 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 9 May 2010 19:39:33 -0500
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To:
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID: <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu>

Ok, that's fine. It may be something off with revision numbers when using svn with github (git doesn't have incremental revisions, but a SHA). Committed the patch to dev svn, in r16970.

chris

On May 9, 2010, at 6:48 PM, Razi Khaja wrote:

> I checked out bioperl-live from github:
> svn checkout http://svn.github.com/bioperl/bioperl-live.git
>
> I just checked it out again, a few seconds ago and by default I got revision
> 11326.
> Razi
>
>
> On Sun, May 9, 2010 at 5:30 PM, Chris Fields wrote:
>
>> Then something is wrong, as current trunk is at r16969. Where are you
>> pulling your code from? Our only working anon. server is the sync'ed github
>> one.
>>
>> chris
>>
>> On May 9, 2010, at 4:15 PM, Razi Khaja wrote:
>>
>>> Hi Chris,
>>> The patch is against the main trunk. I checked out version 11326 of the
>>> repository today.
>>> Razi
>>>
>>>
>>> On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote:
>>>
>>>> If the patch is against main trunk it isn't a problem, otherwise the diff
>>>> should be vs. that code.
>>>>
>>>> chris
>>>>
>>>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
>>>>
>>>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
>>>>> Can someone advise an appropriate way to have this patch applied, given that
>>>>> it is an amendment to a previous patch?
>>>>> Thanks
>>>>> Razi

From cmb433 at nyu.edu Sun May 9 22:22:52 2010
From: cmb433 at nyu.edu (bergeycm)
Date: Sun, 9 May 2010 19:22:52 -0700 (PDT)
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
Message-ID: <28506482.post@talk.nabble.com>

Hi all,

I'm attempting to query GenBank for all sequences'
lengths for a given taxon. I'm using get_Stream_by_query(), but only to grab the species, length, and accession. The genus of interest has almost 500,000 GB entries, though, and my code hangs up at odd points in the info-gathering loop. (Often after only 300 or 400 iterations.) The problem is that $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back undefined.

I've tried wrapping the next_seq portion of the code in an eval block, but to no avail. Is there a way to split a query into a bunch of small streams that aren't too much to ask? Or is there a way to pick up a dropped SeqIO stream? I think the connection is timing out and the stream is being lost. Any advice is greatly appreciated, as I'm fairly new to BioPerl.

- bergeycm

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Get general things ready to go for querying GenBank
my %options;
$options{'-maxids'} = '500000';        # There are presently 460,184 sequences
$options{'-db'}     = 'nucleotide';
$options{'-query'}  = "Pongo [ORGN]";  # Orangutans

my $query_obj = Bio::DB::Query::GenBank->new(%options);
my $total = $query_obj->count;

my $gb_obj = Bio::DB::GenBank->new();
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# Restrict info to just what I'll be using. No sequence necessary.
my $builder = $stream_obj->sequence_builder();
$builder->want_none();
$builder->add_wanted_slot('species','length','accession');

my $c = 0;

for (1 .. $total) {
    eval {
        my $seq_obj = $stream_obj->next_seq;
        my $flavor = $seq_obj->species;
        print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t",
            $seq_obj->length, "\t", $seq_obj->accession, "\n";
    };

    if ($@) {
        # $@ holds the error caught by eval; "\n" needs double quotes
        print $@, "\n";
    }

    # Pause for a little over a third of a second
    select(undef, undef, undef, 0.35);

    $c++;
}

--
View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
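On splitting the query into smaller streams: what follows is only a minimal sketch, assuming the ID list is retrieved once with ids() on the query object and then fetched in small batches with get_Stream_by_id(). The batch size of 200, the one-second pause, the reduced -maxids, the --run switch, and the helper names batches() and report_batch() are illustrative assumptions, not anything prescribed by BioPerl or NCBI. The inner loop tests the return value of next_seq, which is undefined at the end of a stream, instead of counting up to a fixed total.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size (pure Perl, no network).
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be at least 1\n" if $size < 1;
    my @out;
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Fetch one batch and print species, length and accession for each record.
# Hypothetical helper; $gb is expected to be a Bio::DB::GenBank object.
sub report_batch {
    my ($gb, $ids) = @_;
    my $stream = $gb->get_Stream_by_id($ids);
    # next_seq returns undef when the stream is exhausted, so loop on it
    while (my $seq = $stream->next_seq) {
        my $species = $seq->species;
        print join("\t",
            ($species ? $species->scientific_name : 'unknown'),
            $seq->length,
            $seq->accession_number), "\n";
    }
}

# Only touch the network when asked to (assumed --run switch).
if (@ARGV && $ARGV[0] eq '--run') {
    require Bio::DB::GenBank;
    require Bio::DB::Query::GenBank;

    my $query = Bio::DB::Query::GenBank->new(
        -db     => 'nucleotide',
        -query  => 'Pongo [ORGN]',
        -maxids => 2000,            # deliberately small for a trial run
    );
    my $gb = Bio::DB::GenBank->new();

    for my $batch (batches(200, $query->ids)) {
        # A failed batch is reported and skipped rather than ending the run
        eval { report_batch($gb, $batch); 1 }
            or warn "batch starting at $batch->[0] failed: $@";
        sleep 1;                    # be gentle with the remote server
    }
}
```

A batch that fails can be retried on its own, which is harder to arrange when everything arrives through one long-lived stream.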
From robert.bradbury at gmail.com Mon May 10 01:38:09 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 10 May 2010 01:38:09 -0400
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
In-Reply-To: <28506482.post@talk.nabble.com>
References: <28506482.post@talk.nabble.com>
Message-ID:

I don't know whether this is related or not. But the last time I tried to fetch a moderately large genome (NS_000198 for *Podospora anserina*) it failed [1]. It takes a *very* long time and eventually springs an "Out of Memory" error. This is on a Pentium IV Prescott which only has a 4GB address space (configured for 3GB for user programs), and after running a long strace on the perl process it seemed that what was happening was that it was never properly returning and merging memory from the sequence chunks which were being fetched. The final program address was brk(0xafd8c000) or 2,950,217,728, which is probably the maximum amount of data space a user program can have considering that one needs room for the stack. After that the mmap2() calls started failing with ENOMEM.

If Bio::DB::Query::GenBank is intelligent enough to fetch only the requested fields you should be ok. But if it fetches the entire GenBank record and simply throws away the sequence information, and you are running into large sequences (say a big chunk of a chromosome), and this ends up hitting the memory/swap space limits on your machine, that could be a problem.

If the program is running for a long time I'd be inclined to check my system logs to see if one is running out of memory/swap. You can also watch the process using ps to determine if the VSZ grows continuously.

I think I mentioned this before on the BioPerl list but never had a clear understanding of what was going on and may not have filed a bug report. I think I eventually worked around it, perhaps by fetching the offending (large) sequence using wget or a browser.

Robert

1.
Given that NS_000198 is only ~7MB (4.6 million actual bases) the BioPerl memory management has to be really poor in merging/reusing if the fetch uses ~3GB.

From bhakti.dwivedi at gmail.com Mon May 10 11:22:41 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Mon, 10 May 2010 11:22:41 -0400
Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface
Message-ID:

Does anyone know why the blast results vary for a query sequence when the search is conducted using a web-based interface versus a command-line interface?

For example, my web-based blast top hits do not match the top hits of the command-line blast (blastcl3). I am using the default settings in both. I am not sure why the results are different. Even if the hit is there, the e-value, bit score etc. are different for the same hsp regions identified within the hit. Is there a difference in the blast algorithm? Or is it the database?

Thanks!

From cjfields at illinois.edu Mon May 10 12:28:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 11:28:15 -0500
Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface
In-Reply-To:
References:
Message-ID: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>

The default web-based parameters differ from those used by blastcl3, so if you are using the defaults for both, the results may differ somewhat.

chris

From cjfields at illinois.edu Mon May 10 12:31:15 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 11:31:15 -0500
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
In-Reply-To:
References: <28506482.post@talk.nabble.com>
Message-ID:

On May 10, 2010, at 12:38 AM, Robert Bradbury wrote:

> I don't know whether this is related or not. But the last time I tried to
> fetch a moderately large genome (NS_000198 for *Podospora anserina*) it
> failed [1]. It takes a *very* long time and eventually springs an "Out of
> Memory" error. This is on a Pentium IV Prescott which only has a 4GB
> address space (configured for 3GB for user programs) and after running a
> long strace on the perl process it seemed that what was happening was that
> it was never properly returning and merging memory from the sequence chunks
> which were being fetched. The final program address was brk(0xafd8c000) or
> 2,950,217,728 which is probably the maximum amount of data space a user
> program can have considering that one needs room for the stack. After that
> the mmap2() calls started failing with ENOMEM.

That's odd. What OS?

> If Bio::DB::Query::GenBank is intelligent enough to only fetch just the
> requested fields you should be ok. But if it fetches the entire GenBank
> record and simply throws away the sequence information and you are running
> into large sequences (say a big chunk of a chromosome) and this ends up
> hitting the memory/swap space limits on your machine that could be a
> problem.

Yes, that may happen, as (at the moment) we push everything into memory; there are no lazy or DB-linked Seq instances, at least not yet. Very large sequences take a lot of time (object instantiation) and a lot of memory.
To tell the truth, that seems to be the default of most toolkits, but we have recently talked about possible ways to deal with it, just need the tuits for it (as with anything).

The other alternative is to pull the sequences down locally as a raw text file. This can still be done within BioPerl, just using Bio::DB::EUtilities:

use Bio::DB::EUtilities;

my $id = 'NS_000198';
my $in = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                  -db      => 'nuccore',
                                  -email   => 'cjfields at bioperl.org',
                                  -rettype => 'gbwithparts',
                                  -id      => $id);
# save the raw GenBank record to a file named after the ID
$in->get_Response(-file => "$id.gb");

> If the program is running for a long time I'd be inclined to check my system
> logs to see if one is running out of memory/swap. You can also watch the
> process using ps to determine if the VSZ grows continuously.
>
> I think I mentioned this before on the BioPerl list but never had a clear
> understanding of what was going on and may not have filed a bug report. I
> think I eventually worked around it, perhaps by fetching the offending
> (large) sequence using wget or a browser.

You can still file a bug on it; it does help with keeping track (just reporting it here doesn't help much, it gets lost in the shuffle).

> Robert
>
> 1. Given that NS_000198 is only ~7MB (4.6 million actual bases) the BioPerl
> memory management has to be really poor in merging/reusing if the fetch uses
> ~3GB.

BioPerl stores everything in memory, but I've worked with 4.6Mbp genomes quite a bit on my MB Pro. However, the default mode for Bio::DB::GenBank is to pull down everything using 'gbwithparts'. The file is much larger when fetched that way (sequence is ~34Mbp, file is ~51 MB). Maybe that's the problem?

If you can, please file a bug report, along with the relevant information. That helps us determine the best course of action.
chris

From cjfields at illinois.edu Mon May 10 12:32:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 11:32:43 -0500
Subject: [Bioperl-l] Read/write round-tripping Was: Re: New Bioperl dependency? Sort::Naturally
In-Reply-To: <4BE6639B.6060004@gmail.com>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> <4BE6639B.6060004@gmail.com>
Message-ID: <4B47AB3F-3190-4ACC-8235-8F5D6DBE7DC6@illinois.edu>

If there is dynamic ID assignment I would assume you can't compare them between runs, so using is_deeply() won't work as advertised: we already know the ID will change between runs anyway, so it's a self-fulfilling prophecy. Also, is_deeply() here is inspecting the SF::Collection blessed hash directly (the _btree is a tied DB_File hash); not sure that's what you want either.

So at this point I would have to ask myself:

1) Is the dynamic ID assignment a bug (e.g. should we be using a fixed ID of some sort)? If not, we can't expect these to match across runs, so is_deeply won't work.

2) Would it make more sense to explicitly inspect the handled objects (SF::Collection) directly via method calls? For instance, if I want to see whether a set of features falls within a region, is that reproducible between runs?

Either way, I'm not sure what using Test::Deep would gain you, as it's still meant to inspect complex data structures, just with a bit more sugar than Test::More and is_deeply(). Per #2 above, I would be more explicit in inspecting the SF::Collection:

my $collection = $contig->get_features_collection;
# check that IDs in SF::Collection conform to a regex using like()
# inspect other things about the collection...

chris

On May 9, 2010, at 2:26 AM, Florent Angly wrote:

> Chris,
>
> I've thought some more on the problem and I now agree with you that round-tripping at the object-level is more powerful.
>
> It has the problem that some objects are given IDs dynamically every time, which means that identical input files won't have an identical object.
>
>> is_deeply( $obj_out , $obj_in , 'deep compare' );
>
>> not ok 1 - deep compare
>> # Failed test 'deep compare'
>> # at ./test_roundtrip.pl line 33.
>> # Structures begin differing at:
>> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '56438592'
>> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '54980512'
>> 1..1
>> # Looks like you failed 1 test of 1.
>
>
> And when I re-run this again:
>
>> not ok 1 - deep compare
>> # Failed test 'deep compare'
>> # at ./test_roundtrip.pl line 33.
>> # Structures begin differing at:
>> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '47763264'
>> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '46305184'
>> 1..1
>> # Looks like you failed 1 test of 1.
>
> Note how the value of _btree changes every time.
>
> Maybe using Test::Deep would be a good approach (http://search.cpan.org/~fdaly/Test-Deep-0.106/lib/Test/Deep.pod):
>> Where it becomes more interesting is in allowing you to do something besides simple exact comparisons. With strings, the |eq| operator checks that 2 strings are exactly equal but sometimes that's not what you want. When you don't know exactly what the string should be but you do know some things about how it should look, |eq| is no good and you must use pattern matching instead. Test::Deep provides pattern matching for complex data structures
>
> Florent
>
>
>
>
> On 09/05/10 10:02, Chris Fields wrote:
>> Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority.
>>
>> chris
>>
>> On May 8, 2010, at 6:47 PM, Chris Fields wrote:
>>
>>
>>> To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input.
None of the SeqIO modules make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority.
>>>
>>> chris
>>>
>>> On May 8, 2010, at 6:34 AM, Florent Angly wrote:
>>>
>>>
>>>> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files.
>>>>
>>>> It looks like the Bio::SeqIO modules tests could use it as well.
>>>>
>>>> Cheers,
>>>>
>>>> Florent
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon May 10 12:58:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 11:58:07 -0500
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
In-Reply-To: <28506482.post@talk.nabble.com>
References: <28506482.post@talk.nabble.com>
Message-ID: <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu>

500000 sequences is way too many to request, even in a loop. Under most circumstances this is breaking NCBI's eutils policies:

http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements

so don't be too surprised this is failing (this would be around 1000 queries of 500 sequences each).

You could try pulling down the raw sequence via batch entrez or using Bio::DB::EUtilities (which should die if an error occurs).
chris

From cjfields at illinois.edu Mon May 10 13:07:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 12:07:00 -0500
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
In-Reply-To: <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu>
References: <28506482.post@talk.nabble.com> <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu>
Message-ID: <58E399D4-A884-4DC1-A5C6-8B0CBDDB173A@illinois.edu>

(addendum added, sent too early)

On May 10, 2010, at 11:58 AM, Chris Fields wrote:

> 500000 sequences is way too many to request, even in a loop. Under most circumstances this is breaking NCBI's eutils policies:
>
> http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements
>
> so don't be too surprised this is failing (this would be around 1000 queries of 500 sequences each).
>
> You could try pulling down the raw sequence via batch entrez or using Bio::DB::EUtilities (which should die if an error occurs).

But you may still run into issues with eutils at some point, particularly if running this at peak times.

From jun.yin at ucd.ie Mon May 10 13:14:36 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Mon, 10 May 2010 18:14:36 +0100
Subject: [Bioperl-l] Bio::Align - alignment by position?
In-Reply-To:
References:
Message-ID: <003701caf064$441c4660$cc54d320$%yin@ucd.ie>

Hi,

When you use $aln->slice(), there is a third optional parameter to keep gap-only columns in the newly created slice, e.g.

$aln2 = $aln->slice(20, 30, 1);

By defining the third parameter, you can keep gap-only subsequences.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

__________ Information from ESET Smart Security, version of virus signature database 5099 (20100509) __________

The message was checked by ESET Smart Security.
http://www.eset.com

From bhakti.dwivedi at gmail.com Mon May 10 14:35:37 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Mon, 10 May 2010 14:35:37 -0400
Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface
In-Reply-To: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>
References: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>
Message-ID:

Thanks Chris! I changed a few parameter values in blastcl3 and now the results are the same. Is there any particular reason the defaults are set differently in the web-based and command-line blast searches?

Bhakti

From cjfields at illinois.edu Mon May 10 15:47:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 14:47:56 -0500
Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface
In-Reply-To:
References: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>
Message-ID:

You would need to ask NCBI that.

chris

From dimitark at bii.a-star.edu.sg Mon May 10 22:03:51 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Tue, 11 May 2010 10:03:51 +0800
Subject: [Bioperl-l] StandAloneFasta and Too many open files
Message-ID: <4BE8BB07.3040407@bii.a-star.edu.sg>

Hi guys,
yesterday I got the following error:

'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380'

from the following code:
------------
my $ssout="my_seq_out.txt";
print "SS:$tquery:\n:$tseq:\n";
my @sargs=(
    'q' => '',
    'E' => '1',
    'w' => '100',
    'O' => "$ssout",
    'program' => "ssearch36",
);
my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs);
$fac_ss->library($tmpseq);
my @sreport=$fac_ss->run($tqtmp);

foreach my $sr (@sreport){
    while(my $result=$sr->next_result){
        while(my $hit=$result->next_hit){
            while(my $hsp=$hit->next_hsp){
                my $iden=$hsp->frac_identical;
                $rv3=$iden;
                # print "IDEN:$iden:$rv1\n";
            }
        }
    }
}
--------------------
I am using that code over several thousand HSPs: for each one I get the sequence and then run 'ssearch36' with it against another sequence. I was digging around the StandAloneFasta module but couldn't work out where the problem is. There should be many opened filehandles somewhere, but I do not know where. I checked the module but couldn't find such filehandles. Maybe the problem is in the base modules.
I also checked my script for filehandles left open and found none. I only found that I can actually close SeqIO streams with '$stream->close', which I didn't see in the web documentation.
So something positive out of this :) So I closed all my SeqIO streams and I still had the same problem.
Next I commented out the above code and rewrote my script into the following:
--------------
my $ssout="my_seq_out.txt";
my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq > $ssout");
# a non-zero return means the command failed; the exit status is in $?
system(@sargs) == 0 or die "system @sargs failed: $?";

my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta');
while(my $result=$sreport->next_result){
    # print Dumper($result);
    while(my $hit=$result->next_hit){
        while(my $hsp=$hit->next_hsp){
            my $iden=$hsp->frac_identical;
            $rv3=$iden;
            # print "IDEN:$iden:$rv1\n";
        }
    }
}
---------------
Fortunately this code got rid of the 'Too many open files' error. So the problem was indeed coming from the module or the modules behind it.

I have also read that one can change the limit on how many files can be opened on the system, but I didn't want to mess with that for now because I do not know what the implications could be.

Ok, that is it. I just wanted to share my experience and report the problem.
Cheers
Dimitar

--
Dimitar Kenanov
Postdoctoral research fellow
Protein Sequence Analysis Group
Bioinformatics Institute
A*STAR, Singapore
tel: +65 6478 8514

From cjfields at illinois.edu Mon May 10 23:04:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 22:04:12 -0500
Subject: [Bioperl-l] StandAloneFasta and Too many open files
In-Reply-To: <4BE8BB07.3040407@bii.a-star.edu.sg>
References: <4BE8BB07.3040407@bii.a-star.edu.sg>
Message-ID: <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu>

Seems this is hitting the system ulimit somehow, but it's not immediately apparent how that's happening unless you are caching the IO objects somehow. Can you file this as a bug, maybe with a fuller test script? Might give us something to check against.

chris

From cjfields at illinois.edu Mon May 10 23:57:18 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 22:57:18 -0500
Subject: [Bioperl-l] StandAloneFasta and Too many open files
In-Reply-To: <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu>
References: <4BE8BB07.3040407@bii.a-star.edu.sg> <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu>
Message-ID: <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu>

Addendum to that last post.
Dimitar, I think Peter had answered this before, might indicate the problem is actually using the 'O' option in output. We can look at possibly just capturing STDOUT instead, but we may not support the use of 'O' if it's as buggy as indicated.
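A minimal sketch of the "capture STDOUT" idea mentioned above; this is not the module's actual code. It assumes a Unix-like system where list-form pipe opens are available. `capture_stdout` is a hypothetical helper name, and the child command is only a stand-in for an ssearch36 call. The point is that each run uses a lexical filehandle that is explicitly closed, so repeated runs should not accumulate open handles.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run a command, return its
# STDOUT lines, and close the pipe before returning.
sub capture_stdout {
    my @cmd = @_;

    # List-form pipe open bypasses the shell; $fh is a lexical handle.
    open(my $fh, '-|', @cmd) or die "Cannot run @cmd: $!";
    my @lines = <$fh>;

    # close() on a pipe waits for the child and sets $? to its exit status.
    close($fh) or die "Command '@cmd' failed with status $?";
    return @lines;
}

# Stand-in for many search runs: the current perl ($^X) prints one line.
my $total = 0;
for my $i (1 .. 50) {
    my @out = capture_stdout($^X, '-e', 'print "hit $ARGV[0]\n"', $i);
    $total += @out;
}
print "collected $total lines\n";
```

In a BioPerl setting the open handle could presumably be passed to Bio::SearchIO->new(-fh => $fh, -format => 'fasta') and closed once the report has been parsed, as the module already does with its FASTARUN handle.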
http://groups.google.com/group/bioperl-l/msg/25c17748d1ac6ef4 chris From dimitark at bii.a-star.edu.sg Tue May 11 00:24:13 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Tue, 11 May 2010 12:24:13 +0800 Subject: [Bioperl-l] StandAloneFasta and Too many open files In-Reply-To: <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu> References: <4BE8BB07.3040407@bii.a-star.edu.sg> <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu> Message-ID: <4BE8DBED.2000209@bii.a-star.edu.sg> Hi Chris, thank you for the information. I checked it out. I wrote you and the list about that as well. To you on 16.04.2010 and to the list on 23.04.2010. There i explained that i modified the module. Now i pass it the '0' option but this option is not passed to the actual program executed by system. I just add my desired output with "> $output" to the parameter line passed to system. In the email mentioned above i attached the modified version of the module. I was digging again a bit about the module. I found that - line(359): ----------- unless( $outfile ) { open(FASTARUN, "$para |") || $self->throw($@);#original $object=Bio::SearchIO->new(-fh=>\*FASTARUN, #original -format=>"fasta");#original } else { ------------ And here another one when the 'O' is used - line(371): --------- $object = Bio::SearchIO->new(-file=>$self->O, -format=>"fasta"); ---------- May be the problem is here. Because i didnt see anywhere a 'close' for these filehandles. I can test and tell if i was right. Cheers Dimitar On 05/11/2010 11:57 AM, Chris Fields wrote: > Addendum to that last post. 
> > http://groups.google.com/group/bioperl-l/msg/25c17748d1ac6ef4 > > chris > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From heikki.lehvaslaiho at gmail.com Tue May 11 01:40:14 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Tue, 11 May 2010 08:40:14 +0300 Subject: [Bioperl-l] Github possibilities Message-ID: FYI http://chem-bla-ics.blogspot.com/2010/05/github-simplifies-code-review-and.html -Heikki From heikki.lehvaslaiho at gmail.com Tue May 11 01:43:42 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Tue, 11 May 2010 08:43:42 +0300 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu> Message-ID: Thanks Razi and Chris, Blast parsing works again beautifully. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 10 May 2010 03:39, Chris Fields wrote: > Ok, that's fine. It may be something off with revision numbers when using > svn with github (git doesn't have incremental revisions, but a SHA). > Committed the patch to dev svn, in r16970. > > chris > > On May 9, 2010, at 6:48 PM, Razi Khaja wrote: > > > I checked out bioperl-live from github: > > svn checkout http://svn.github.com/bioperl/bioperl-live.git > > > > I just checked it out again, a few seconds ago and by default I got > revision > > 11326. > > Razi > > > > > > On Sun, May 9, 2010 at 5:30 PM, Chris Fields > wrote: > > > >> Then something is wrong, as current trunk is at r16969. 
Where are you > >> pulling your code from? Our only working anon. server is the sync'ed > github > >> one. > >> > >> chris > >> > >> On May 9, 2010, at 4:15 PM, Razi Khaja wrote: > >> > >>> Hi Chris, > >>> The patch is against the main trunk. I checked out version 11326 of > the > >>> repository today. > >>> Razi > >>> > >>> > >>> On Sun, May 9, 2010 at 4:43 PM, Chris Fields > >> wrote: > >>> > >>>> If the patch is against main trunk it isn't a problem, otherwise the > >> diff > >>>> should be vs. that code. > >>>> > >>>> chris > >>>> > >>>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > >>>> > >>>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > >>>>> Can someone advise an appropriate way to have this patch applied, > given > >>>> that > >>>>> it is an amendment to a previous patch? > >>>>> Thanks > >>>>> Razi > >>>>> > >>>>> > >>>>> ---------- Forwarded message ---------- > >>>>> From: Heikki Lehvaslaiho > >>>>> Date: Wed, May 5, 2010 at 2:11 AM > >>>>> Subject: Re: [Bioperl-l] BLAST parsing broken > >>>>> To: Razi Khaja > >>>>> > >>>>> > >>>>> Hi Raja, > >>>>> > >>>>> Thanks for trying to fix this. > >>>>> > >>>>> I am attaching an example output file to this message. I just tested > >>>> again > >>>>> that master from git repository fails to get query ID, but the > previous > >>>>> version works. > >>>>> > >>>>> bala ~/src/bioperl-live> git checkout master > >>>>> Previous HEAD position was 5e278f5... Robson's patch for buggy > blastpgp > >>>>> output > >>>>> Switched to branch 'master' > >>>>> > >>>>> When I started using the latest mpiBLAST code a few months ago I did > >>>> compare > >>>>> the 0 output from it to standard NCBI blast and they were identical. > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> Also, I've noticed a discrepancy between within bioperl blast > parsing > >>>> that > >>>>> I have not had time to work on. Would you be interested in having a > >> look? 
> >>>>> > >>>>> I am creating output from mpiBLAST in 0 format and then converting it > >>>> into > >>>>> tab-delimited 8 format. I am unable to get 100% similarity for all > >> cases > >>>>> when I compare the conversion to the output straight from mpiBLAST in > >>>> format > >>>>> 8. Sometimes the mismatch and gap values are off by one. > >>>>> > >>>>> I am attaching a script that does the conversion. It is the same one > I > >>>> was > >>>>> using when I noticed the problem above. I was going to put the code > >> into > >>>>> bioperl but that got delayed when I noticed the discrepancies. > >>>>> > >>>>> > >>>>> Cheers, > >>>>> > >>>>> > >>>>> -Heikki > >>>>> > >>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>>> > >>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office > >>>> #4216 > >>>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>>> > >>>>> > >>>>> > >>>>> On 4 May 2010 20:55, Razi Khaja wrote: > >>>>> > >>>>>> That is odd. Heikki, do you have a blast output file that produces > >> this > >>>>>> error? > >>>>>> Could you attach the file and either send to the list or myself (if > >> the > >>>>>> list > >>>>>> does not accept attachments). > >>>>>> Thanks, > >>>>>> Razi > >>>>>> > >>>>>> > >>>>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields > > >>>>>> wrote: > >>>>>> > >>>>>>> Odd, I ran tests on that prior to commit. I'll work on fixing that > >> (in > >>>>>> svn > >>>>>>> of course, until the migration is complete). > >>>>>>> > >>>>>>> chris > >>>>>>> > >>>>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > >>>>>>> > >>>>>>>> Chris, > >>>>>>>> > >>>>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of > >>>>>> normal > >>>>>>>> blast output. $result->query_name returns now undef. > >>>>>>>> > >>>>>>>> (Using the anonymous git now). 
This change still works: > >>>>>>>> > >>>>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>>>> Author: cjfields > >>>>>>>> Date: Sun Dec 20 04:39:58 2009 +0000 > >>>>>>>> > >>>>>>>> Robson's patch for buggy blastpgp output > >>>>>>>> > >>>>>>>> But this does not: > >>>>>>>> > >>>>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544 > >>>>>>>> Author: cjfields > >>>>>>>> Date: Thu Apr 15 04:21:17 2010 +0000 > >>>>>>>> > >>>>>>>> [bug 3031] > >>>>>>>> > >>>>>>>> patches for catching algorithm ref, courtesy Razi Khaja. > >>>>>>>> > >>>>>>>> That makes it easy to find the diffs: > >>>>>>>> > >>>>>>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > >>>>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > >>>>>>>> index 378023a..6f7eeeb 100644 > >>>>>>>> --- a/Bio/SearchIO/blast.pm > >>>>>>>> +++ b/Bio/SearchIO/blast.pm > >>>>>>>> @@ -209,6 +209,7 @@ BEGIN { > >>>>>>>> > >>>>>>>> 'BlastOutput_program' => 'RESULT-algorithm_name', > >>>>>>>> 'BlastOutput_version' => > >>>>>> 'RESULT-algorithm_version', > >>>>>>>> + 'BlastOutput_algorithm-reference' => > >>>>>>> 'RESULT-algorithm_reference', > >>>>>>>> 'BlastOutput_query-def' => 'RESULT-query_name', > >>>>>>>> 'BlastOutput_query-len' => 'RESULT-query_length', > >>>>>>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', > >>>>>>>> @@ -504,6 +505,26 @@ sub next_result { > >>>>>>>> } > >>>>>>>> ); > >>>>>>>> } > >>>>>>>> + # parse the BLAST algorithm reference > >>>>>>>> + elsif(/^Reference:\s+(.*)$/) { > >>>>>>>> + # want to preserve newlines for the BLAST algorithm > >>>>>>> reference > >>>>>>>> + my $algorithm_reference = "$1\n"; > >>>>>>>> + $_ = $self->_readline; > >>>>>>>> + # while the current line, does not match an empty > line, > >> a > >>>>>>> RID:, > >>>>>>>> or a Database:, we are still looking at the > >>>>>>>> + # algorithm_reference, append it to what we parsed so > >> far > >>>>>>>> + while($_ !~ /^$/ && 
$_ !~ /^RID:/ && $_ !~ > >> /^Database:/) > >>>> { > >>>>>>>> + $algorithm_reference .= "$_"; > >>>>>>>> + $_ = $self->_readline; > >>>>>>>> + } > >>>>>>>> + # if we exited the while loop, we saw an empty line, > a > >>>>>> RID:, > >>>>>>> or > >>>>>>>> a Database:, so push it back > >>>>>>>> + $self->_pushback($_); > >>>>>>>> + $self->element( > >>>>>>>> + { > >>>>>>>> + 'Name' => 'BlastOutput_algorithm-reference', > >>>>>>>> + 'Data' => $algorithm_reference > >>>>>>>> + } > >>>>>>>> + ); > >>>>>>>> + } > >>>>>>>> # added Windows workaround for bug 1985 > >>>>>>>> elsif (/^(Searching|Results from round)/) { > >>>>>>>> next unless $1 =~ /Results from round/; > >>>>>>>> > >>>>>>>> > >>>>>>>> I am not sure why reference parsing messes things up. Maybe it > eats > >>>> too > >>>>>>> many > >>>>>>>> lines from the result file. > >>>>>>>> > >>>>>>>> Yours, > >>>>>>>> > >>>>>>>> -Heikki > >>>>>>>> > >>>>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>>>>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>>>>>> > >>>>>>>> Computational Bioscience Research Centre (CBRC), Building #2, > Office > >>>>>>> #4216 > >>>>>>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>>>>>> _______________________________________________ > >>>>>>>> Bioperl-l mailing list > >>>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>> >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at 
lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cmb433 at nyu.edu Sun May 9 19:40:48 2010 From: cmb433 at nyu.edu (bergeycm) Date: Sun, 9 May 2010 16:40:48 -0700 (PDT) Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely Message-ID: <28506482.post@talk.nabble.com> Hi all, I'm attempting to query GenBank for all sequences' lengths for a given taxon. I'm using get_Stream_by_query(), but only to grab the species, length, and accession. The genus of interest has almost 500,000 GB entries, though, and my code hangs up at odd points in the info-gathering loop. (Often after only 300 or 400 iterations.) The problem is that $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back undefined. I've tried wrapping the next_seq portion of the code in an eval block, but to no avail. Is there a way to split a query into a bunch of small streams that aren't too much to ask? Or is there a way to pick up a dropped SeqIO stream? I think the connection is timing out and the stream is being lost. Any advice is greatly appreciated, as I'm fairly new to BioPerl. 
- bergeycm

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Get general things ready to go for querying GenBank
my %options;
$options{'-maxids'} = '500000';        # There are presently 460,184 sequences
$options{'-db'}     = 'nucleotide';
$options{'-query'}  = "Pongo [ORGN]";  # Orangutans

my $query_obj  = Bio::DB::Query::GenBank->new(%options);
my $total      = $query_obj->count;
my $gb_obj     = Bio::DB::GenBank->new();
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# Restrict info to just what I'll be using. No sequence necessary.
my $builder = $stream_obj->sequence_builder();
$builder->want_none();
$builder->add_wanted_slot('species','length','accession');

my $c = 0;
for (1 .. $total) {
    eval {
        my $seq_obj = $stream_obj->next_seq;
        my $flavor  = $seq_obj->species;
        print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t",
              $seq_obj->length, "\t", $seq_obj->accession, "\n";
    };
    # Report the exception caught by eval ($@, not $!), with a real newline
    if ($@) { print $@, "\n"; }
    # Pause for a little over a third of a second
    select(undef, undef, undef, 0.35);
    $c++;
}

From sudeep.mehrotra at mail.mcgill.ca Tue May 11 09:40:07 2010 From: sudeep.mehrotra at mail.mcgill.ca (Sudeep Mehrotra) Date: Tue, 11 May 2010 09:40:07 -0400 Subject: [Bioperl-l] [Fwd: Re: Modules in Bio:Tree] Message-ID: <4BE95E37.3060702@mail.mcgill.ca>

Hello Jason,

Your suggestion worked. Thanks.

I have the same tree in two formats (NEXUS and NEWICK). I want to obtain a "clade list"; in other words, is there a way to obtain the leaves which are members of a clade? For example, part of the NEXUS file has the following entries:

other entries .......
655 Deinococcus_geothermalis,
656 Deinococcus_radiodurans,
657 Thermus_thermophilus,
658 Thermus_sp. ;
other entries........
(((((655,656)[])[])[],(((657,658)[])[])[])[])[])[])[]);

From the tree I can observe that 657 and 658 are members of one subclade, 655 and 656 are members of another subclade, and both of these belong to one clade. I want to get this membership information. I tried looking for a module in Bio::Tree but was not able to find any. In the Bio::NEXUS package there is a method "walk" which I thought would work for me, but it does not. Also, the Bio::NEXUS package is just not working for me: from the documentation, the input file they are using is different from what I have. Is there any way I can get the membership information as shown earlier?

Cheers
--
Sudeep Mehrotra (Ph.D. Candidate)
McGill University and Genome Quebec Innovation Center

-------------- next part -------------- An embedded message was scrubbed... From: Jason Stajich Subject: Re: Modules in Bio:Tree Date: Wed, 5 May 2010 18:45:41 -0400 Size: 5420 URL:

From amackey at virginia.edu Tue May 11 17:26:50 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Tue, 11 May 2010 17:26:50 -0400 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug Message-ID:

Hi Zerui (and others),

I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, specifically in this code:

lines:
1170: (-start => int ($loc->start / 3 ) +1,
1171: -end => int ($loc->end / 3 ) +1,

both of those lines should look like: int (($loc->start - 1) / 3) + 1

otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2)

There is also a problem when mapping exon coordinates that are outside/after the CDS region (instead of getting undefined locations, you continue to get peptide coordinates, but they are invalid, larger than the protein length).

Dennis and fringy -- this may affect the SNPtab.pl script I wrote for you, as it uses this module to calculate codons for SNPs.

-Aaron

P.S. a script that demonstrates the problem:

use Bio::Coordinate::GeneMapper;

my $mapper =
    Bio::Coordinate::GeneMapper
    ->new( -in    => "chr",
           -out   => "propeptide",
           -exons => [ Bio::Location::Simple
                       ->new( -start => 101,
                              -end   => 109 ),
                       Bio::Location::Simple
                       ->new( -start => 201,
                              -end   => 221 ),
                     ],
           -cds   => Bio::Location::Simple
                     ->new(-start => 101, -end => 209),
         );

print join("\t", "chr", "aa"), "\n";
for my $pos (99..111,199..211) {
    my $res = $mapper->map(
        Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => 1));
    my $start = $res->start; $start = "NA" unless defined $start;
    my $end = $res->end; $end = "NA" unless defined $end;
    print join("\t", $pos, $start), "\n";
}

From cjfields at illinois.edu Tue May 11 18:31:17 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 May 2010 17:31:17 -0500 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: References: Message-ID: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>

Aaron,

Do we want to write this up as a set of tests to add to the bioperl test suite? We can probably add it after the github migration tomorrow.

chris

On May 11, 2010, at 4:26 PM, Aaron Mackey wrote: > Hi Zerui (and others), > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, > specifically in this code: > > lines: > 1170: (-start => int ($loc->start / 3 ) +1, > 1171: -end => int ($loc->end / 3 ) +1, > > both of those lines should look like: int (($loc->start - 1) / 3) + 1 > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide > positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2) > > There is also a problem when mapping exon coordinates that are outside/after > the CDS region (instead of getting undefined locations, you continue to get > peptide coordinates, but they are invalid, larger than the protein length). > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for you, > as it uses this module to calculate codons for SNPs.
> > -Aaron > > P.S. a script the demonstrates the problem: > > use Bio::Coordinate::GeneMapper; > > my $mapper = > Bio::Coordinate::GeneMapper > ->new( -in => "chr", > -out => "propeptide", > -exons => [ Bio::Location::Simple > ->new( -start => 101, > -end => 109 ), > Bio::Location::Simple > ->new( -start => 201, > -end => 221 ), > ], > -cds => Bio::Location::Simple > ->new(-start => 101, -end => 209), > ); > > > print join("\t", "chr", "aa"), "\n"; > for my $pos (99..111,199..211) { > my $res = $mapper->map( > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => 1)); > my $start = $res->start; $start = "NA" unless defined $start; > my $end = $res->end; $end = "NA" unless defined $end; > print join("\t", $pos, $start), "\n"; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From amackey at virginia.edu Tue May 11 18:40:11 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Tue, 11 May 2010 18:40:11 -0400 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: Hi Chris, I was hoping Heikki might take up the cause and investigate further -- let's give him a chance to respond. But it's a frightening bug if it's really been that way for all this time ... -Aaron On Tue, May 11, 2010 at 6:31 PM, Chris Fields wrote: > Aaron, > > Do we want to write this up as a set of tests to add to the bioperl test > suite? We can probably add it after the github migration tomorrow. 
> > chris > > On May 11, 2010, at 4:26 PM, Aaron Mackey wrote: > > > Hi Zerui (and others), > > > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, > > specifically in this code: > > > > lines: > > 1170: (-start => int ($loc->start / 3 ) +1, > > 1171: -end => int ($loc->end / 3 ) +1, > > > > both of those lines should look like: int (($loc->start - 1) / 3) + 1 > > > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide > > positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2) > > > > There is also a problem when mapping exon coordinates that are > outside/after > > the CDS region (instead of getting undefined locations, you continue to > get > > peptide coordinates, but they are invalid, larger than the protein > length). > > > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for > you, > > as it uses this module to calculate codons for SNPs. > > > > -Aaron > > > > P.S. a script the demonstrates the problem: > > > > use Bio::Coordinate::GeneMapper; > > > > my $mapper = > > Bio::Coordinate::GeneMapper > > ->new( -in => "chr", > > -out => "propeptide", > > -exons => [ Bio::Location::Simple > > ->new( -start => 101, > > -end => 109 ), > > Bio::Location::Simple > > ->new( -start => 201, > > -end => 221 ), > > ], > > -cds => Bio::Location::Simple > > ->new(-start => 101, -end => 209), > > ); > > > > > > print join("\t", "chr", "aa"), "\n"; > > for my $pos (99..111,199..211) { > > my $res = $mapper->map( > > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => > 1)); > > my $start = $res->start; $start = "NA" unless defined $start; > > my $end = $res->end; $end = "NA" unless defined $end; > > print join("\t", $pos, $start), "\n"; > > } > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed May 12 00:15:54 2010 From: cjfields at 
illinois.edu (Chris Fields) Date: Tue, 11 May 2010 23:15:54 -0500 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow Message-ID: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> Just a friendly reminder that we'll freeze the dev subversion repository tomorrow prior to migration to github. The migration will take about an hour, during which all bioperl github repos will be replaced with the full repos, and devs added. The test repos will be removed around that time (Heikki, will that be a problem?). chris From heikki.lehvaslaiho at gmail.com Wed May 12 00:23:07 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Wed, 12 May 2010 07:23:07 +0300 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> Message-ID: No problem at all. Go ahead. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 12 May 2010 07:15, Chris Fields wrote: > Just a friendly reminder that we'll freeze the dev subversion repository > tomorrow prior to migration to github. The migration will take about an > hour, during which all bioperl github repos will be replaced with the full > repos, and devs added. The test repos will be removed around that time > (Heikki, will that be a problem?). > > chris From heikki.lehvaslaiho at gmail.com Wed May 12 06:23:03 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Wed, 12 May 2010 13:23:03 +0300 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: Outch. I'll definitely have a look. Strange that none of the tests have picked this up... 
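The off-by-one Aaron describes in this thread can be checked with plain arithmetic, without BioPerl. The following is a minimal sketch assuming 1-based CDS positions; the two function names are made up for illustration and are not part of Bio::Coordinate::GeneMapper. It compares the formula quoted from lines 1170-1171 with the proposed correction.

```perl
use strict;
use warnings;

# Formula as quoted from the module: int(pos / 3) + 1
sub cds_to_peptide_quoted {
    my ($pos) = @_;
    return int($pos / 3) + 1;
}

# Proposed correction: shift to 0-based before dividing, then back to 1-based
sub cds_to_peptide_proposed {
    my ($pos) = @_;
    return int(($pos - 1) / 3) + 1;
}

print join("\t", "cds", "quoted", "proposed"), "\n";
for my $pos (1 .. 6) {
    print join("\t", $pos,
               cds_to_peptide_quoted($pos),
               cds_to_peptide_proposed($pos)), "\n";
}
```

For CDS positions 1 to 6 the quoted formula gives 1, 1, 2, 2, 2, 3 and the proposed one gives 1, 1, 1, 2, 2, 2, matching the numbers in Aaron's report. A check of this kind could serve as a starting point for the regression test Chris suggested.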
-Heikki

On 12 May 2010 01:40, Aaron Mackey wrote:

> Hi Chris,
>
> I was hoping Heikki might take up the cause and investigate further --
> let's give him a chance to respond. But it's a frightening bug if it's
> really been that way for all this time ...
>
> -Aaron
>
> On Tue, May 11, 2010 at 6:31 PM, Chris Fields wrote:
>
> > Aaron,
> >
> > Do we want to write this up as a set of tests to add to the bioperl
> > test suite? We can probably add it after the github migration tomorrow.
> >
> > chris
> >
> > On May 11, 2010, at 4:26 PM, Aaron Mackey wrote:
> >
> > > Hi Zerui (and others),
> > >
> > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper,
> > > specifically in this code:
> > >
> > > lines:
> > > 1170: (-start => int ($loc->start / 3 ) +1,
> > > 1171: -end => int ($loc->end / 3 ) +1,
> > >
> > > both of those lines should look like: int (($loc->start - 1) / 3) + 1
> > >
> > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect
> > > peptide positions 1, 1, 2, 2, 2, 3 (when you instead want
> > > 1, 1, 1, 2, 2, 2)
> > >
> > > There is also a problem when mapping exon coordinates that are
> > > outside/after the CDS region (instead of getting undefined locations,
> > > you continue to get peptide coordinates, but they are invalid, larger
> > > than the protein length).
> > >
> > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for
> > > you, as it uses this module to calculate codons for SNPs.
> > >
> > > -Aaron
> > >
> > > P.S. a script that demonstrates the problem:
> > >
> > > use Bio::Coordinate::GeneMapper;
> > > use Bio::Location::Simple;
> > >
> > > # two exons on the chromosome; the CDS ends inside the second exon
> > > my $mapper =
> > >   Bio::Coordinate::GeneMapper
> > >     ->new( -in    => "chr",
> > >            -out   => "propeptide",
> > >            -exons => [ Bio::Location::Simple
> > >                          ->new( -start => 101,
> > >                                 -end   => 109 ),
> > >                        Bio::Location::Simple
> > >                          ->new( -start => 201,
> > >                                 -end   => 221 ),
> > >                      ],
> > >            -cds   => Bio::Location::Simple
> > >                        ->new(-start => 101, -end => 209),
> > >          );
> > >
> > > # map single chromosome positions around both exons to the peptide
> > > print join("\t", "chr", "aa"), "\n";
> > > for my $pos (99..111,199..211) {
> > >   my $res = $mapper->map(
> > >     Bio::Location::Simple->new(-start => $pos, -end => $pos,
> > >                                -seq_id => 1));
> > >   my $start = $res->start; $start = "NA" unless defined $start;
> > >   my $end = $res->end; $end = "NA" unless defined $end;
> > >   print join("\t", $pos, $start), "\n";
> > > }
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed May 12 12:24:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 12 May 2010 11:24:49 -0500
Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow
In-Reply-To: <4BEAD562.1010702@cornell.edu>
References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu>
Message-ID: <97B3DF77-C657-4E7C-8298-529F474E1FA5@illinois.edu>

Yup, haven't started the migration yet (I'm taking down some crontab
scripts used for prior github updates, nightly builds). Then I'll announce
before freezing the repo.

chris

On May 12, 2010, at 11:20 AM, Robert Buels wrote:

> The SVN repository is not frozen yet, driveby_bot just saw 16984 go into
> svn from Thomas Sharpton.
>
> R

From rmb32 at cornell.edu Wed May 12 12:20:50 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 12 May 2010 09:20:50 -0700
Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow
In-Reply-To:
References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu>
Message-ID: <4BEAD562.1010702@cornell.edu>

The SVN repository is not frozen yet, driveby_bot just saw 16984 go into
svn from Thomas Sharpton.

R

From cjfields at illinois.edu Wed May 12 12:43:42 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 12 May 2010 11:43:42 -0500
Subject: [Bioperl-l] dev.open-bio.org SVN is now read-only
Message-ID:

Just like the subject says, switched the repo to a read only status. I'm
starting the github migration now.

chris

From thomas.sharpton at gmail.com Wed May 12 12:45:22 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Wed, 12 May 2010 09:45:22 -0700
Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow
In-Reply-To: <4BEAD562.1010702@cornell.edu>
References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu>
Message-ID:

Sorry if I screwed things up - updated before checking this email thread.

-T

From cjfields at illinois.edu Wed May 12 12:47:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 12 May 2010 11:47:36 -0500
Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow
In-Reply-To:
References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu>
Message-ID: <08E7C628-D914-43C0-AB3D-E8FC41A144DC@illinois.edu>

No problem, just froze the repo and rsynced to my local machine, so your
commit made it just under the wire.

chris

From maizemu at gmail.com Wed May 12 13:12:28 2010
From: maizemu at gmail.com (Christopher Bottoms)
Date: Wed, 12 May 2010 12:12:28 -0500
Subject: [Bioperl-l] Citing CPAN modules in scientific publications
Message-ID:

Dear BioPerlers,

I am working on a publication which would be impossible without the use of
several CPAN modules. I appreciate the work authors and maintainers have
put into these modules and would like to acknowledge them by citing their
work.

I was thinking of a format such as

  Author(s), Maintainer(s) *Module::Name*
  [http://search.cpan.org/dist/Module-Name]

A reference for File::Slurp would appear thus:

  Uri Guttman, Dave Rolsky *File::Slurp*
  [http://search.cpan.org/dist/File-Slurp]

I guess that I could instead mention authors in an acknowledgment section.
I noticed a large acknowledgment section in the BioPerl paper
(http://genome.cshlp.org/content/12/10/1611.full).

Thanks for your time,
Christopher Bottoms (molecules)

From greg at ebi.ac.uk Wed May 12 14:16:53 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Wed, 12 May 2010 19:16:53 +0100
Subject: [Bioperl-l] BioPerl for indexing quality score files
Message-ID:

Hi all,

I'm wondering if anyone has tried using BioPerl to index sequence quality
score files? The files I'm looking at tend to look like Fasta files, but
with numbers (between 0 and 99) and spaces instead of sequence strings.
Something like:
---
>chr1
0 20 20 20 50 99 99 99 99 30 30 20 20 10 10 0 0 0 0
---
(An example for Chimpanzee can be found here, as the file
'panTro2.quals.fa.gz':
http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ )

I'm currently using a home-brewed file indexing system to access subsets of
these quality scores, but it's kind of slow and (probably) buggy. I'd much
rather use something like Bio::DB::Fasta, but (without having actually
tried it) I expect it wouldn't be too happy with these not-quite-fasta
format quality files.

Has anyone run into a similar situation and found a solution using Bioperl
(or something else)?

I'd be happy to hack around a bit to get this to work, if necessary; if
anyone could provide pointers on where to start, I'd be much obliged.

Cheers,
Greg

PS - it's great to see the GitHub migration moving along so swiftly! I'll
be *much* more likely to start bug-hunting and patch-submitting with the
code living there now. :)

From greg at ebi.ac.uk Wed May 12 14:26:26 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Wed, 12 May 2010 19:26:26 +0100
Subject: [Bioperl-l] BioPerl for indexing quality score files
In-Reply-To:
References:
Message-ID:

Ok, I need to shame myself with a huge "RTFM" for this one --

http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/DB/Qual.pm

Sorry for the spam. Still happy about the GitHub, though!

greg

From cjfields at illinois.edu Wed May 12 14:48:53 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 12 May 2010 13:48:53 -0500
Subject: [Bioperl-l] GitHub migration complete
Message-ID:

All,

The migration to github is now essentially complete, minus a few small
house-keeping details. Please let me know if there are problems with
checkouts.

I've added collaborators to almost all repositories; unfortunately, GitHub
decided to remove 'copy permissions' for adding collaborators just
recently, so we'll have to manually add each one to each repo until that is
resolved (from what I hear, should be soon). In the meantime, if you are a
bioperl developer and aren't listed as a github collaborator please sign up
for a github account, add SSH keys, and let me know your github user name.
I'll add you to bioperl-live and any other repos you want (please let me
know which ones!).

I'll be doing a few last-minute house-cleaning bits (adding post-receive
hooks, setting up a mirror, etc), but it shouldn't interfere with
checkouts. Let me know how it goes!

chris

From David.Messina at sbc.su.se Wed May 12 15:59:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 12 May 2010 21:59:14 +0200
Subject: [Bioperl-l] GitHub migration complete
In-Reply-To:
References:
Message-ID:

Thanks, Chris! Clone and commit are working here.

Dave

From Kevin.M.Brown at asu.edu Wed May 12 16:06:38 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 12 May 2010 13:06:38 -0700
Subject: [Bioperl-l] Citing CPAN modules in scientific publications
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu>

Wouldn't the format of the citation actually be dictated by the publication
the paper was going to be in? E.g. the APA guide sets the format to be:

  Jones, D. F. (2002). The Mental Measurement Tester (Version 3.2)
  [Computer software]. Fort Lauderdale, FL: Nova Southeastern University.
  Retrieved July 22, 2007. Available from http://www.buros.com/

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University

From jay at jays.net Wed May 12 16:35:27 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 12 May 2010 15:35:27 -0500
Subject: [Bioperl-l] GitHub migration complete
In-Reply-To:
References:
Message-ID: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net>

On May 12, 2010, at 1:48 PM, Chris Fields wrote:
> The migration to github is now essentially complete, minus a few small
> house-keeping details. Please let me know if there are problems with
> checkouts.

You mean clones? ;)

Thanks Chris!! This is *awesome*. I'm really glad we're in git now and very
much appreciate all your work on this.

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

From cjfields at illinois.edu Wed May 12 17:34:39 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 12 May 2010 16:34:39 -0500
Subject: [Bioperl-l] GitHub migration complete
In-Reply-To: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net>
References: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net>
Message-ID:

On May 12, 2010, at 3:35 PM, Jay Hannah wrote:
> You mean clones? ;)

Yes, that was svn slipping in there...

chris

From maj at fortinbras.us Wed May 12 21:44:09 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 12 May 2010 21:44:09 -0400
Subject: [Bioperl-l] GitHub migration complete
In-Reply-To:
References:
Message-ID: <77C82E975CC24860AA16EE537E270FBD@NewLife>

awesome job, Chris- MAJ
(what's git again? Oh never mind...)

From maizemu at gmail.com Wed May 12 23:27:47 2010
From: maizemu at gmail.com (Christopher Bottoms)
Date: Wed, 12 May 2010 22:27:47 -0500
Subject: [Bioperl-l] Citing CPAN modules in scientific publications
In-Reply-To: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu>
Message-ID:

Thanks. I was also wondering about listing the maintainer. I'm guessing
not, since the maintainer can add herself (or himself) to the list of
authors if she feels that she has contributed enough to warrant it.

From heikki.lehvaslaiho at gmail.com Thu May 13 02:11:40 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Thu, 13 May 2010 09:11:40 +0300
Subject: [Bioperl-l] GitHub migration complete
In-Reply-To: <77C82E975CC24860AA16EE537E270FBD@NewLife>
References: <77C82E975CC24860AA16EE537E270FBD@NewLife>
Message-ID:

It works. Bliss.

Worth mentioning now on the list that the latest instructions are in
http://www.bioperl.org/wiki/Using_Git

I've recommitted the two changes I did on the experimental repo.

I had a small problem when editing the README text file: git was not
showing differences between the original file and my edits. It kept saying
that

bala ~/src/bioperl-live> git diff README
diff --git a/README b/README
index 03685a8..8e20592 100644
Binary files a/README and b/README differ

The reason, of course, was that a hard-to-detect binary character had
slipped into my edit. Just so that you know when this happens to you...
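For anyone who hits the same "Binary files ... differ" message on a text file, one way to locate the offending byte is to scan the file for control characters. The following is a minimal sketch, not something from the thread: the subroutine name find_control_bytes and the script name in the usage comment are made up, and the assumption that git classifies a file as binary because of a NUL or similar control byte is my understanding of its heuristic rather than a documented guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [line, column, byte value] for
# every control byte other than tab, LF and CR in the given string.
# Line and column numbers are 1-based.
sub find_control_bytes {
    my ($content) = @_;
    my @hits;
    my $line_no = 0;
    for my $line ( split /\n/, $content, -1 ) {
        $line_no++;
        while ( $line =~ /([\x00-\x08\x0B\x0C\x0E-\x1F\x7F])/g ) {
            # pos() is the offset just past the match, i.e. the
            # 1-based column of the single matched byte
            push @hits, [ $line_no, pos($line), ord($1) ];
        }
    }
    return @hits;
}

# Usage (script name is illustrative): perl find_control_bytes.pl README
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # slurp the whole file
    my $content = <$fh>;
    close $fh;
    printf "%s: line %d, column %d, byte 0x%02X\n", $path, @$_
        for find_control_bytes($content);
}
```

Reading the file in raw mode matters here, since any decoding layer could hide or alter the byte being looked for.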
-Heikki

From heikki.lehvaslaiho at gmail.com Thu May 13 02:20:51 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Thu, 13 May 2010 09:20:51 +0300
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
In-Reply-To:
References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>
Message-ID:

Just a thumbs up. Aaron's fix works. The problem seems to be limited to
where he spotted it. I am working on refreshing my memory of how the code
works - it has been quite a few years since I wrote it - and will commit
better tests.

As for getting values outside the defined region, that is a feature rather
than a bug. The idea was to be able to ask what the new coordinate would be
if the feature extended beyond the known limits. This is the capability of
Bio::Coordinate::ExtrapolatingPair that is used here. That class also has a
method, strict, that can be used to prevent extrapolating, but the code to
access that has not been written into GeneMapper. I'll see if I can get it
to work.

-Heikki

From remi.planel at free.fr Thu May 13 05:08:58 2010
From: remi.planel at free.fr (Remi)
Date: Thu, 13 May 2010 11:08:58 +0200
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast
Message-ID: <4BEBC1AA.2020908@free.fr>

Hi all,

I'm using Bio::Tools::Run::StandAloneBlastPlus and trying to run a remote
blast using this code:

  my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
      -db_name => 'nr',
      -remote  => '1',
  );

  my $result = $fac->blastp(
      -query   => 'P12996.fasta',
      -outfile => 'out.bls',
  );

but I got an error message: "BLAST Database error: Protein BLAST database
'./nr' does not exist in the NCBI servers".

But if I directly modify the value of $fac->{'_db_path'}, like:

  $fac->{'_db_path'} = 'nr';

it's working.
Is that a bug or am I missing something?

Thanks,
Rémi

From maj at fortinbras.us Thu May 13 07:17:55 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 13 May 2010 07:17:55 -0400
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast
In-Reply-To: <4BEBC1AA.2020908@free.fr>
References: <4BEBC1AA.2020908@free.fr>
Message-ID: <1A1631149DEF4B9080E5D4D5851F4587@NewLife>

Hi Rémi,

Looks like a bug -- can you report it via http://bugzilla.bioperl.org?
Just enter what you've written here -- I appreciate it.

Mark

From David.Messina at sbc.su.se Wed May 12 16:10:36 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 12 May 2010 22:10:36 +0200
Subject: [Bioperl-l] Ohloh update
Message-ID: <32ED5B44-061D-4634-9E5C-72E313E1A58C@sbc.su.se>

Hi everyone,

The Ohloh account probably needs to be changed to point to our Github repo.
I'd be happy to do it if someone adds me on there. Otherwise, could one of
the admins check into that when they get a chance?

Also, I notice it hasn't registered any commits since March 15th --
hopefully the repo change will wake it up or we may need to contact one of
their admins again.

Can anyone think of other external sites pointing to BioPerl which need
updating, too?

Dave

From jay at jays.net Thu May 13 08:42:41 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 13 May 2010 07:42:41 -0500
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To: <201005130328.o4D3S8Fs011865@portal.open-bio.org>
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org>
Message-ID:

------- Comment #3 from cjfields at bioperl.org 2010-05-12 23:28 EST -------
> Ouch, that's a bit nasty. Taking advantage of git move and doing this on a
> topic branch (topic/bug_3077) on github.

I plan on cleaning up the 'jhannah' branch (renaming it 'topic/bug_2515',
asking people for their input, merging to master).

I plan on cleaning up the 'yapc10hackathon' branch. I can't remember what
Robert and I left in there after YAPC last year.

Should most of the other branches be deleted? If a branch hasn't been
changed in more than a year and no one intends to jump into it in the
coming year what purpose does it serve? Old tags can hang out forever, but
shouldn't our branch list be tidy? (Specifically I would argue that old
release number tags should hang out forever, but I don't see the point in
any other ancient tags continuing to exist if their purpose isn't
documented anywhere.)

Are we serious about emulating this branching model?

http://nvie.com/git-model

If so then we need to create a 'develop' branch and only the release
manager should touch 'master' and yahoos like me should be branching off of
'develop' instead, right?

Counter argument: Since 'master' is the default branch and we want to
encourage doc patches and typo corrections from the world making trivial
contributions as easy as possible for everyone, I would think that using
'master' as the daily headstream would be better.
So 'topic/bug_####' for each non-trivial Bugzilla ticket, and release managers can work their magic in 'release-#-#' branches. (Release branches old enough that there's no way we're going to patch them any more are deleted, and only the tag remains). Thoughts? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah P.S. My "up to date" OS X 10.6.3 machines both had git 1.5.3.1 on them. Upgrading to git 1.7.1 makes branch checkouts simpler. jhannah at minijaysnet~/src/bioperl-live$ git branch -a * master remotes/origin/HEAD -> origin/master remotes/origin/TRY_featureio_refactor remotes/origin/TRY_gff_refactor remotes/origin/TRY_locatableseq_refactor remotes/origin/anydbm-branch remotes/origin/bioperl remotes/origin/bioperl-branch-1-5-1 remotes/origin/bioperl-live remotes/origin/branch-06 remotes/origin/branch-07 remotes/origin/branch-07-ensembl-120 remotes/origin/branch-1-0-0 remotes/origin/branch-1-2 remotes/origin/branch-1-2-collection remotes/origin/branch-1-4 remotes/origin/branch-1-5-2 remotes/origin/branch-1-6 remotes/origin/branch-ensembl-m1 remotes/origin/branch-experimental remotes/origin/featann_rollback remotes/origin/internal-branch-pre-delete-06-tag remotes/origin/jhannah remotes/origin/lightweight_feature_branch remotes/origin/master remotes/origin/ontology-cache remotes/origin/release-0-04-bug remotes/origin/restriction-refactor remotes/origin/stable-0-05 remotes/origin/stable-0-05-new remotes/origin/steve_chervitz remotes/origin/topic/bug_3077 remotes/origin/yapc10hackathon jhannah at minijaysnet~/src/bioperl-live$ git tag after-05-06-merge after-05-06-merge-2 after004 before-05-to-06-merge before-05-to-06-trunk bioperl-06-1 bioperl-061-pre1 bioperl-1-0-0 bioperl-1-0-alpha bioperl-1-0-alpha2-rc bioperl-1-2-1-rc1 bioperl-1-6-0_001 bioperl-1-6-0_002 bioperl-1-6-0_003 bioperl-1-6-0_004 bioperl-1-6-0_005 bioperl-1-6-0_006 bioperl-1-6-RC1 bioperl-1-6-RC2 bioperl-1-6-RC2_15306 bioperl-1-6-RC3 bioperl-1-6-RC3_15392 bioperl-1-6-RC4 bioperl-devel-1-1-1 
bioperl-devel-1-3-01 bioperl-devel-1-3-02 bioperl-devel-1-3-03 bioperl-devel-1-3-04 bioperl-release-1-0-0 bioperl-release-1-0-1 bioperl-release-1-0-2 bioperl-release-1-1-0 bioperl-release-1-2-0 bioperl-release-1-2-1 bioperl-release-1-2-2 bioperl-release-1-2-3 bioperl-release-1-4-0 bioperl-release-1-5-0 bioperl-release-1-5-0-rc1 bioperl-release-1-5-0-rc2 bioperl-release-1-5-1 bioperl-release-1-5-1-rc4 bioperl-release-1-5-2 bioperl-release-1-5-2-patch1 bioperl-release-1-5-2-patch2 bioperl-release-1-6 bioperl-release-1-6-1 bioperl-run-release-1-2-0 for_gmod_0_003 gbrowse_1_65 join-0-04-to-0-05 lightweight_feature ontology-fix1 ontology-overhaul-end ontology-overhaul-start prerelease-06 release-0-04-1 release-0-04-2 release-0-04-3 release-0-04-4 release-0-05 release-0-05-1 release-0-7-0 release-0-7-1 release-0-7-2 release-0-9-0 release-0-9-2 release-0-9-3 release-06 release-06-2 release-1_01 release-ensembl-06 snapshot-at-head-of-07-branch start tag-ensembl-stable-061 From cjfields at illinois.edu Thu May 13 09:49:19 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 08:49:19 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> Message-ID: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> On May 13, 2010, at 7:42 AM, Jay Hannah wrote: > ------- Comment #3 from cjfields at bioperl.org 2010-05-12 23:28 EST ------- >> Ouch, that's a bit nasty. Taking advantage of git move and doing this on a >> topic branch (topic/bug_3077) on github. > > I plan on cleaning up the 'jhannah' branch (renaming it 'topic/bug_2515', asking people for their input, merging to master). > > I plan on cleaning up the 'yapc10hackathon' branch. I can't remember what Robert and I left in there after YAPC last year. > > Should most of the other branches be deleted? 
If a branch hasn't been changed in more than a year and no one intends to jump into it in the coming year what purpose does it serve? Old tags can hang out forever, but shouldn't our branch list be tidy? (Specifically I would argue that old release number tags should hang out forever, but I don't see the point in any other ancient tags continuing to exist if their purpose isn't documented anywhere.) I would say err on the safe side and keep the ones we're unsure of, but a cleanup would be nice. We could adopt what Moose has done and move branches we're unsure of to something like 'attic'. > Are we serious about emulating this branching model? > > http://nvie.com/git-model > > If so then we need to create a 'develop' branch and only the release manager should touch 'master' and yahoos like me should be branching off of 'develop' instead, right? > > Counter argument: Since 'master' is the default branch and we want to encourage doc patches and typo corrections from the world making trivial contributions as easy as possible for everyone, I would think that using 'master' as the daily headstream would be better. So 'topic/bug_####' for each non-trivial Bugzilla ticket, and release managers can work their magic in 'release-#-#' branches. (Release branches old enough that there's no way we're going to patch them any more are deleted, and only the tag remains). ... > Thoughts? > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > P.S. My "up to date" OS X 10.6.3 machines both had git 1.5.3.1 on them. Upgrading to git 1.7.1 makes branch checkouts simpler. Moose has a 'stable' branch that release managers (the cabal) pull into from 'master' for releases. It's just a matter of semantics, what name we use for active development branches and what to use for stable releases; for us, the 'develop' and 'master' from that link could be (respectively) 'master' and 'stable'. 
'hotfixes' would be bug fixes, and 'feature branches' would be just that, new features to be added. As for bug fixes, it would be much nicer to have most changes beyond very simple ones (including all bug fixes) relegated to branches that can be merged in. This sequesters any changes to the branch, where they can be tested prior to a merge. Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. chris From jay at jays.net Thu May 13 10:38:20 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 09:38:20 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: So, like this? Flow diagram: http://biodoc.ist.unomaha.edu/~jhannah/tmp/branches.png master (git and github default) Trivial changes committed directly here. 
topic/bug_#### One branch per non-trivial Bugzilla ticket topic/jhannah_crazy_idea Branches for unstable/unfinished work stable Release manager pulls from master to stable periodically (all tests are passing, etc.) release-#-#-# Pulled from stable, pushed to CPAN attic/* Any branch with no activity for 1 year I like it. > Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? I'm fine with attic/ and just leaving stuff in there until 2050. Then we should probably delete them. :) My understanding is that by default commits that have no pointers to them (branches or tags or subsequent commits) are subject to cleanup/prune. I think this means that if someone, 10 years ago, committed 3 times to the branch "jhannah_crazy_idea" and that branch is deleted, then those 3 commits may be removed (gone forever) by git cleanup/prune. This is a feature or a crime against humanity depending on who you ask. It can be disabled in a normal repo, I don't know about github. > Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. As I collect clues I'll be brain dumping everything I think I know onto the wiki. This is a crazy busy week for me though. 
:( Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Thu May 13 11:00:05 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 10:00:05 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: On May 13, 2010, at 8:49 AM, Chris Fields wrote: > Saying that, we could adopt a workflow policy that allows deletion of any merged branch. Right. Except for release-* branches, which are never merged anywhere. A release is a branch while it's being prepared and tweaked. Once perfect, it is tagged and pushed to CPAN. At that point the branch can be deleted since we can never push that release number to CPAN again (even if we wanted to). The tag remains forever. Or am I mistaken? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From shalabh.sharma7 at gmail.com Thu May 13 11:07:26 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 13 May 2010 11:07:26 -0400 Subject: [Bioperl-l] parsing blast report with long description Message-ID: Hi All, I need some help parsing BLAST output. I have an in-house database that contains sequences with really long descriptions:

>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV

So my blast report looks like this:

.....
.....
>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821
 6887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
 Length = 213
 Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix adjust.
 Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%)
.....
.....

(Note that the tag "TI_1000008216887" is split across two lines.) I am using SearchIO to parse this report. What I am doing is parsing the description field again to get all the tags, like this:

....
....
my $desc = $hit->description;
my @f = split('/', $desc);
for (my $i = 0; $i < scalar @f; $i++) {
    print OUT "$f[$i]\t";
}
.....
.....

I am getting a perfectly parsed report, except that the field with TI_1000008216887 has a space in it: "TI_100000821 6887". I would really appreciate it if anyone could help me out. Thanks Shalabh Sharma From joshpk105 at gmail.com Thu May 13 10:42:28 2010 From: joshpk105 at gmail.com (Katz) Date: Thu, 13 May 2010 07:42:28 -0700 (PDT) Subject: [Bioperl-l] RemoteBlast Message-ID: <54674635-db43-413c-8c96-0d214f1b978d@l31g2000yqm.googlegroups.com> Is there any way to differentiate between the three different NCBI blastn variants? Right now I'm using RemoteBlast as follows:

Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastn',
    -data       => 'nr',
    -expect     => '1e-5',
    -readmethod => 'SearchIO',
);

then blasting my files. However, this automatically uses megablast and I need to use regular blastn.
Thx, Josh From hlapp at drycafe.net Thu May 13 11:43:47 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 11:43:47 -0400 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> On May 13, 2010, at 9:49 AM, Chris Fields wrote: > Re: deletion of branches, I'm only really in support of deleting > feature branches that have been merged back to 'master' or another > branch (e.g. only removed using 'git branch -d foo'). I agree. > Older subversion release branches don't tend to fall into that > category, in that we had merged or cherry-picked changes from svn > trunk to them, not vice versa; they were never merged back to > trunk. Deletion in this case would be somewhat history-revising, > correct? I wouldn't call it history-revising. I also think it's OK to delete release branches that are no longer supported, iff we have a tag for the release itself. That's different from counting inactivity. A branch may lie dormant for a year or longer until someone has time to pick it back up again - I don't see the harm in keeping those around. > Saying that, we could adopt a workflow policy that allows deletion > of any merged branch. All this suggests coming up with a good > 'Contributing' document. That would be highly useful. I'll also voice a word of caution here though - I find it kind of ironic that the switch to git, which is supposed to make contribution *easier*, very often leads subsequently to complex commit/pull/push/branching workflows being instituted for projects that take pages and pages to document, a lot of time to ingest, and discipline to follow - it seems to be very easy and tempting to go overboard with this. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Thu May 13 12:01:05 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 11:01:05 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: On May 13, 2010, at 10:43 AM, Hilmar Lapp wrote: > On May 13, 2010, at 9:49 AM, Chris Fields wrote: >> Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. > > That would be highly useful. I'll also voice a word of caution here though - I find it kind of ironic that the switch to git, which is supposed to make contribution *easier*, very often leads subsequently to complex commit/pull/push/branching workflows being instituted for projects that take pages and pages to document, a lot of time to ingest, and discipline to follow - it seems to be very easy and tempting to go overboard with this. I'm happy to comply with whatever the policy is. If that policy is "everything trivial in master, non-trivial in topic/FOO, release manager will figure out everything else" that's fine with me. A branch cleanup would be nice. Or I'll just close my eyes. :) I'm embarrassed that I left unfinished business in branches in 2009. I'm fishing for a consensus on a contribution policy. 
Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From heikki.lehvaslaiho at gmail.com Thu May 13 12:48:14 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Thu, 13 May 2010 19:48:14 +0300 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: I second Hilmar. Let's try to keep this simple. While for most people just beginning to use git this discussion seems confusing and the structures complex, things really are pretty simple. I expect most of the branches to live only in developers copies of the repo. They are created when work starts on the new bug or a feature, merged to master when work is done, and removed immediately or soon after that. Most of the work is done in the master and only the release managers touch the stable and release branches. See Jay's flow diagram. Work flow for this is (while calling 'git status' all the time): git branch $new git checkout $new # work git commit git commit ... git checkout master git merge $new git push ... git branch -d $new -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 13 May 2010 18:43, Hilmar Lapp wrote: > > On May 13, 2010, at 9:49 AM, Chris Fields wrote: > > Re: deletion of branches, I'm only really in support of deleting feature >> branches that have been merged back to 'master' or another branch (e.g. only >> removed using 'git branch -d foo'). >> > > I agree. 
> > > Older subversion release branches don't tend to fall into that category, >> in that we had merged or cherry-picked changes from svn trunk to them, not >> vice versa; they were never merged back to trunk. Deletion in this case >> would be somewhat history-revising, correct? >> > > I wouldn't call it history-revising. I also think it's OK to delete release > branches that are no longer supported, iff we have a tag for the release > itself. > > That's different from counting inactivity. A branch may lie dormant for a > year or longer until someone has time to pick it back up again - I don't see > the harm in keeping those around. > > > Saying that, we could adopt a workflow policy that allows deletion of any >> merged branch. All this suggests coming up with a good 'Contributing' >> document. >> > > That would be highly useful. I'll also voice a word of caution here though > - I find it kind of ironic that the switch to git, which is supposed to make > contribution *easier*, very often leads subsequently to complex > commit/pull/push/branching workflows being instituted for projects that take > pages and pages to document, a lot of time to ingest, and discipline to > follow - it seems to be very easy and tempting to go overboard with this. 
> > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu May 13 17:41:35 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 16:41:35 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: On May 13, 2010, at 11:48 AM, Heikki Lehvaslaiho wrote: > I second Hilmar. Let's try to keep this simple. > > While for most people just beginning to use git this discussion seems > confusing and the structures complex, things really are pretty simple. > > I expect most of the branches to live only in developers copies of the repo. > They are created when work starts on the new bug or a feature, merged to > master when work is done, and removed immediately or soon after that. Most > of the work is done in the master and only the release managers touch the > stable and release branches. See Jay's flow diagram. Right, many branches will occur locally. And I'm not suggesting that we strictly follow a particular pattern; I would rather not enforce that upon devs who already have a productive pattern set. I think this would act more as a suggested method of development, something that has been demonstrated to work well for other large projects (and something I'll be following). What I would really like to promote is using branches for making code changes, even ones that are only a few commits or so (and even if they are only local ones not pushed to github). Branches are cheap. 
> Work flow for this is (while calling 'git status' all the time): > > git branch $new > git checkout $new > # work > git commit > git commit > ... > git checkout master > git merge $new > git push > ... > git branch -d $new > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia Yes, that's essentially the basic workflow, maybe with a preliminary 'git pull' to sync to the latest. chris > On 13 May 2010 18:43, Hilmar Lapp wrote: > >> >> On May 13, 2010, at 9:49 AM, Chris Fields wrote: >> >> Re: deletion of branches, I'm only really in support of deleting feature >>> branches that have been merged back to 'master' or another branch (e.g. only >>> removed using 'git branch -d foo'). >>> >> >> I agree. >> >> >> Older subversion release branches don't tend to fall into that category, >>> in that we had merged or cherry-picked changes from svn trunk to them, not >>> vice versa; they were never merged back to trunk. Deletion in this case >>> would be somewhat history-revising, correct? >>> >> >> I wouldn't call it history-revising. I also think it's OK to delete release >> branches that are no longer supported, iff we have a tag for the release >> itself. >> >> That's different from counting inactivity. A branch may lie dormant for a >> year or longer until someone has time to pick it back up again - I don't see >> the harm in keeping those around. >> >> >> Saying that, we could adopt a workflow policy that allows deletion of any >>> merged branch. All this suggests coming up with a good 'Contributing' >>> document. >>> >> >> That would be highly useful. 
I'll also voice a word of caution here though >> - I find it kind of ironic that the switch to git, which is supposed to make >> contribution *easier*, very often leads subsequently to complex >> commit/pull/push/branching workflows being instituted for projects that take >> pages and pages to document, a lot of time to ingest, and discipline to >> follow - it seems to be very easy and tempting to go overboard with this. >> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu May 13 17:56:11 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 14:56:11 -0700 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: <4BEC757B.5030407@cornell.edu> OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes. I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that. 
Rob From jay at jays.net Thu May 13 18:00:21 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 17:00:21 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <4BEC757B.5030407@cornell.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu> Message-ID: <7BA7535D-AE97-4827-8B86-91C24842BAED@jays.net> On May 13, 2010, at 4:56 PM, Robert Buels wrote: > OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes. > > I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that. master++ Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From rmb32 at cornell.edu Thu May 13 18:13:52 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 15:13:52 -0700 Subject: [Bioperl-l] move ancient branches to attic Message-ID: <4BEC79A0.5000505@cornell.edu> To clean up branches, I propose deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. Note that there are still tags for all the old releases, so those won't be lost. Thoughts? Rob From jay at jays.net Thu May 13 18:22:30 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 17:22:30 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC79A0.5000505@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> Message-ID: On May 13, 2010, at 5:13 PM, Robert Buels wrote: > To clean up branches, I propose deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. > > Note that there are still tags for all the old releases, so those won't be lost. Sounds generous to me.
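
A minimal sketch of how the two cutoffs proposed above could be applied to a list of branch heads. The function name `classify_branches` and the `delete`/`attic`/`keep` labels are made up for illustration and are not part of any BioPerl tooling; it only prints a suggestion per branch and changes nothing in the repository.

```shell
# classify_branches DELETE_BEFORE ATTIC_BEFORE
# Reads "YYYY-MM-DD branch-name" lines on stdin and prints a proposed
# action for each branch. ISO dates compare correctly as plain strings,
# so the comparison is forced to be a string comparison.
classify_branches() {
    awk -v drop="$1" -v attic="$2" '
        ($1 "") < (drop "")  { print "delete", $2; next }
        ($1 "") < (attic "") { print "attic", $2; next }
                             { print "keep", $2 }
    '
}

# Possible use against the remote branches (assumes a clone with an
# "origin" remote; this only prints, it does not delete or move anything):
#   git for-each-ref --format="%(committerdate:short) %(refname:short)" \
#       refs/remotes/origin | classify_branches 2006-01-01 2009-01-01
```

A branch whose head falls exactly on a cutoff date is treated as not older than it, which matches the "older than Jan 1" wording.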
proceed++ Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From hlapp at drycafe.net Thu May 13 18:46:00 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 18:46:00 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC79A0.5000505@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> Message-ID: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Why? What is the gain from deleting branches that you don't know whether they are dead or not? -hilmar On May 13, 2010, at 6:13 PM, Robert Buels wrote: > To clean up branches, I propose deleting branches (merged or not) > whose head is older than Jan 1, 2006, and moving branches to attic/ > whose head is older than Jan 1, 2009. > > Note that there are still tags for all the old releases, so those > won't be lost. > > Thoughts? > > Rob -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From rmb32 at cornell.edu Thu May 13 19:05:06 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 16:05:06 -0700 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <4BEC85A2.50401@cornell.edu> The gain is to avoid having useless things hanging around. Every time somebody has to read through a list of 50 branches to find the maybe 5 that are useful, it's time lost. In other words, it's the same gain that you get from cleaning off your desk, so that you can see where you put things. Rob Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether > they are dead or not?
> > -hilmar > > On May 13, 2010, at 6:13 PM, Robert Buels wrote: > >> To clean up branches, I propose deleting branches (merged or not) >> whose head is older than Jan 1, 2006, and moving branches to attic/ >> whose head is older than Jan 1, 2009. >> >> Note that there are still tags for all the old releases, so those >> won't be lost. >> >> Thoughts? >> >> Rob > From cjfields at illinois.edu Thu May 13 19:07:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 18:07:31 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <4BEC757B.5030407@cornell.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu> Message-ID: 'master'. That's more in line with other repos. chris On May 13, 2010, at 4:56 PM, Robert Buels wrote: > OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes. > > I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that.
> > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu May 13 20:27:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:27:22 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC85A2.50401@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> Message-ID: <77C06787-B381-43AA-8F5A-74331866C495@illinois.edu> Let's go through and check which branches are specifically merged back to trunk and delete those first, then list the ones that aren't or we're unsure of. If needed we can move those to an 'attic', like Moose. chris On May 13, 2010, at 6:05 PM, Robert Buels wrote: > The gain is to avoid having useless things hanging around. Every time somebody has to read through a list of 50 branches to find the maybe 5 that are useful, it's time lost. > > In other word, it's the same gain that you get from cleaning off your desk, so that you can see where you put things. > > Rob > > > Hilmar Lapp wrote: >> Why? What is the gain from deleting branches that you don't know whether they are dead or not? >> -hilmar >> On May 13, 2010, at 6:13 PM, Robert Buels wrote: >>> To clean up branches, I propose to deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. >>> >>> Note that there are still tags for all the old releases, so those won't be lost. >>> >>> Thoughts? 
>>> >>> Rob >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu May 13 20:28:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:28:30 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: <6757E1DD-5712-4894-8EAF-52F5F902D348@illinois.edu> On May 13, 2010, at 9:38 AM, Jay Hannah wrote: > So, like this? > > Flow diagram: > http://biodoc.ist.unomaha.edu/~jhannah/tmp/branches.png > > master > (git and github default) Trivial changes committed directly here. > topic/bug_#### > One branch per non-trivial Bugzilla ticket > topic/jhannah_crazy_idea > Branches for unstable/unfinished work > stable > Release manager pulls from master to stable periodically (all tests are passing, etc.) > release-#-#-# > Pulled from stable, pushed to CPAN > attic/* > Any branch with no activity for 1 year > > I like it. Yes, something along those lines. >> Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? > > I'm fine with attic/ and just leaving stuff in there until 2050. Then we should probably delete them. 
:) > > My understanding is that by default commits that have no pointers to them (branches or tags or subsequent commits) are subject to cleanup/prune. I think this means that if someone, 10 years ago, committed 3 times to the branch "jhannah_crazy_idea" and that branch is deleted, then those 3 commits may be removed (gone forever) by git cleanup/prune. > > This is a feature or a crime against humanity depending on who you ask. It can be disabled in a normal repo, I don't know about github. I don't think this is disabled in github (e.g. one can still delete branches). Duke Leto suggested the only real way to prevent history revising commits would be to do a pre-commit hook, which is not supported right now in github. >> Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. > > As I collect clues I'll be brain dumping everything I think I know onto the wiki. This is a crazy busy week for me though. :( > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah No problem. chris From cjfields at illinois.edu Thu May 13 20:41:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:41:57 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> It would be nice to at least designate them as outdated in some respect, and organize them along those lines. 
chris On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether they are dead or not? > > -hilmar > > On May 13, 2010, at 6:13 PM, Robert Buels wrote: > >> To clean up branches, I propose deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. >> >> Note that there are still tags for all the old releases, so those won't be lost. >> >> Thoughts? >> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu May 13 20:55:01 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 20:55:01 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> Message-ID: On May 13, 2010, at 8:41 PM, Chris Fields wrote: > It would be nice to at least designate them as outdated in some > respect, and organize them along those lines. I agree.
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Thu May 13 21:04:02 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 21:04:02 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC85A2.50401@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> Message-ID: <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net> On May 13, 2010, at 7:05 PM, Robert Buels wrote: > The gain is to avoid having useless things hanging around. Every > time somebody has to read through a list of 50 branches to find the > maybe 5 that are useful, it's time lost. > > In other words, it's the same gain that you get from cleaning off > your desk, so that you can see where you put things. Hold on - that's not a good comparison, is it? First off, this being git, the "main" repo is not your desk. You can have your desk and wipe it clean of all branches and tags that have ever existed, without affecting, or imposing this on, anyone else. Second, why would you *want* to look through all those branches? This being git, you create branches all the time and merge them back, on your own repo, right? Where in this workflow are you browsing through the 50 branches of the "main" repo all the time? Third, and maybe I'm just too old, but moving to git because branching and having your own clone exactly the way you want it is so easy, only to subsequently delete most of the branches on the "main" repo for primarily aesthetic reasons just doesn't make much sense to me, honestly.
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From heikki.lehvaslaiho at gmail.com Fri May 14 06:41:22 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Fri, 14 May 2010 13:41:22 +0300 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu> Message-ID: Yep. master. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 14 May 2010 02:07, Chris Fields wrote: > 'master'. That's more in line with other repos. > > chris > > On May 13, 2010, at 4:56 PM, Robert Buels wrote: > > > OK then, decision time, which is the main devel branch, 'master' or > 'develop'? I need to merge in a few small bugfixes. > > > > I vote for 'master', since it's slightly simpler for new devs, with > releases being constructed in branches off of that.
> > > > Rob > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From heikki.lehvaslaiho at gmail.com Fri May 14 06:45:50 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Fri, 14 May 2010 13:45:50 +0300 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net> Message-ID: Rob, If you think this is important, do a survey and create a nice wiki page explaining these branches to everyone. Then we can discuss if some of them are best deleted. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 14 May 2010 04:04, Hilmar Lapp wrote: > > On May 13, 2010, at 7:05 PM, Robert Buels wrote: > > The gain is to avoid having useless things hanging around. Every time >> somebody has to read through a list of 50 branches to find the maybe 5 that >> are useful, it's time lost. >> >> In other words, it's the same gain that you get from cleaning off your >> desk, so that you can see where you put things. >> > > > Hold on - that's not a good comparison, is it? First off, this being git, > the "main" repo is not your desk. You can have your desk and wipe it clean > of all branches and tags that have ever existed, without affecting, or > imposing this on, anyone else. > > Second, why would you *want* to look through all those branches? This being > git, you create branches all the time and merge them back, on your own repo, > right?
Where in this workflow are you browsing through the 50 branches of > the "main" repo all the time? > > Third, and maybe I'm just too old, but moving to git because branching and > having your own clone exactly the way you want it is so easy, only to > subsequently delete most of the branches on the "main" repo for primarily > aesthetic reasons just doesn't make much sense to me, honestly. > > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jay at jays.net Fri May 14 09:32:04 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 14 May 2010 08:32:04 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether they are dead or not? If our branch list was clean they wouldn't dupe up when I go to merge in other people's contributions. You don't find large lists of probably dead things annoying? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

jhannah at cplreynoldslpt:~/src/bioperl-live$ git remote add vinanna git://github.com/vinanna/bioperl-live.git
jhannah at cplreynoldslpt:~/src/bioperl-live$ git fetch vinanna
remote: Counting objects: 18, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 10 (delta 8), reused 0 (delta 0)
Unpacking objects: 100% (10/10), done.
From git://github.com/vinanna/bioperl-live
 * [new branch] TRY_featureio_refactor -> vinanna/TRY_featureio_refactor
 * [new branch] TRY_gff_refactor -> vinanna/TRY_gff_refactor
 * [new branch] TRY_locatableseq_refactor -> vinanna/TRY_locatableseq_refactor
 * [new branch] anydbm-branch -> vinanna/anydbm-branch
 * [new branch] bioperl -> vinanna/bioperl
 * [new branch] bioperl-branch-1-5-1 -> vinanna/bioperl-branch-1-5-1
 * [new branch] bioperl-live -> vinanna/bioperl-live
 * [new branch] branch-06 -> vinanna/branch-06
 * [new branch] branch-07 -> vinanna/branch-07
 * [new branch] branch-07-ensembl-120 -> vinanna/branch-07-ensembl-120
 * [new branch] branch-1-0-0 -> vinanna/branch-1-0-0
 * [new branch] branch-1-2 -> vinanna/branch-1-2
 * [new branch] branch-1-2-collection -> vinanna/branch-1-2-collection
 * [new branch] branch-1-4 -> vinanna/branch-1-4
 * [new branch] branch-1-5-2 -> vinanna/branch-1-5-2
 * [new branch] branch-1-6 -> vinanna/branch-1-6
 * [new branch] branch-ensembl-m1 -> vinanna/branch-ensembl-m1
 * [new branch] branch-experimental -> vinanna/branch-experimental
 * [new branch] featann_rollback -> vinanna/featann_rollback
 * [new branch] internal-branch-pre-delete-06-tag -> vinanna/internal-branch-pre-delete-06-tag
 * [new branch] lightweight_feature_branch -> vinanna/lightweight_feature_branch
 * [new branch] master -> vinanna/master
 * [new branch] ontology-cache -> vinanna/ontology-cache
 * [new branch] release-0-04-bug -> vinanna/release-0-04-bug
 * [new branch] restriction-refactor -> vinanna/restriction-refactor
 * [new branch] stable-0-05 -> vinanna/stable-0-05
 * [new branch] stable-0-05-new -> vinanna/stable-0-05-new
 * [new branch] steve_chervitz -> vinanna/steve_chervitz
 * [new branch] topic/bug_2515 -> vinanna/topic/bug_2515
jhannah at cplreynoldslpt:~/src/bioperl-live$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/TRY_featureio_refactor
  remotes/origin/TRY_gff_refactor
  remotes/origin/TRY_locatableseq_refactor
  remotes/origin/anydbm-branch
  remotes/origin/bioperl
  remotes/origin/bioperl-branch-1-5-1
  remotes/origin/bioperl-live
  remotes/origin/branch-06
  remotes/origin/branch-07
  remotes/origin/branch-07-ensembl-120
  remotes/origin/branch-1-0-0
  remotes/origin/branch-1-2
  remotes/origin/branch-1-2-collection
  remotes/origin/branch-1-4
  remotes/origin/branch-1-5-2
  remotes/origin/branch-1-6
  remotes/origin/branch-ensembl-m1
  remotes/origin/branch-experimental
  remotes/origin/featann_rollback
  remotes/origin/internal-branch-pre-delete-06-tag
  remotes/origin/jhannah
  remotes/origin/lightweight_feature_branch
  remotes/origin/master
  remotes/origin/ontology-cache
  remotes/origin/release-0-04-bug
  remotes/origin/restriction-refactor
  remotes/origin/stable-0-05
  remotes/origin/stable-0-05-new
  remotes/origin/steve_chervitz
  remotes/origin/topic/bug_2515
  remotes/origin/yapc10hackathon
  remotes/vinanna/TRY_featureio_refactor
  remotes/vinanna/TRY_gff_refactor
  remotes/vinanna/TRY_locatableseq_refactor
  remotes/vinanna/anydbm-branch
  remotes/vinanna/bioperl
  remotes/vinanna/bioperl-branch-1-5-1
  remotes/vinanna/bioperl-live
  remotes/vinanna/branch-06
  remotes/vinanna/branch-07
  remotes/vinanna/branch-07-ensembl-120
  remotes/vinanna/branch-1-0-0
  remotes/vinanna/branch-1-2
  remotes/vinanna/branch-1-2-collection
  remotes/vinanna/branch-1-4
  remotes/vinanna/branch-1-5-2
  remotes/vinanna/branch-1-6
  remotes/vinanna/branch-ensembl-m1
  remotes/vinanna/branch-experimental
  remotes/vinanna/featann_rollback
  remotes/vinanna/internal-branch-pre-delete-06-tag
  remotes/vinanna/lightweight_feature_branch
  remotes/vinanna/master
  remotes/vinanna/ontology-cache
  remotes/vinanna/release-0-04-bug
  remotes/vinanna/restriction-refactor
  remotes/vinanna/stable-0-05
  remotes/vinanna/stable-0-05-new
  remotes/vinanna/steve_chervitz
  remotes/vinanna/topic/bug_2515

From cjfields at illinois.edu Fri May 14 09:47:05 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 08:47:05 -0500 Subject: [Bioperl-l] move ancient branches to attic
In-Reply-To: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> Message-ID: <2309AD4D-9FEA-4463-A4FD-519F0FCA2639@illinois.edu> To me, this is more a problem with the way forks currently work in github, via automatically dup-ing all branches vs allowing a single branch ('master', for instance). In fairness, that makes sense if they're implementing this the way I think, in order to conserve space. There are other small issues on github that should be worked out, for instance the automatic addition of all collabs with pull requests, since these go to bioperl-guts now. At least, I got a dup email from the last pull request. Some fixes are supposedly being planned for group-like accounts, just don't know when they'll appear. But I think the overall benefits of github outweigh some of the bumps in the road we're seeing. chris On May 14, 2010, at 8:32 AM, Jay Hannah wrote: > On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: >> Why? What is the gain from deleting branches that you don't know whether they are dead or not? > > If our branch list was clean they wouldn't dupe up when I go to merge in other people's contributions. > > You don't find large lists of probably dead things annoying? > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > > > jhannah at cplreynoldslpt:~/src/bioperl-live$ git remote add vinanna git://github.com/vinanna/bioperl-live.gitjhannah at cplreynoldslpt:~/src/bioperl-live$ git fetch vinanna > remote: Counting objects: 18, done. > remote: Compressing objects: 100% (9/9), done. > remote: Total 10 (delta 8), reused 0 (delta 0) > Unpacking objects: 100% (10/10), done. 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Fri May 14 09:56:48 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 14 May 2010 09:56:48 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> Message-ID: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> On May 14, 2010, at 9:32 AM, Jay Hannah wrote: > You don't find large lists of probably dead things annoying? Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. As an analogy, Google Mail keeps all your dead email (email you delete). Forever. Not because they think most of what you delete you shouldn't have deleted, but because it costs so little, and can be so efficiently managed for the few things that you do decide to recover a year later that it's not worth for you as a user to spend any brain cycles on which emails you should physically delete and which you should only "archive". Likewise, I don't see the gain that outweighs the brain cycles and careful consideration that would have to go into deciding which branches to delete, which ones to move into an "attic", and which ones to keep around. If you don't want to see them, simply clone and wipe them away. 
Life can be so easy :-) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Fri May 14 10:20:22 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 14 May 2010 09:20:22 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> Message-ID: <0C1AE8D4-70F5-427E-9429-B59156587E19@jays.net> On May 14, 2010, at 8:56 AM, Hilmar Lapp wrote: > On May 14, 2010, at 9:32 AM, Jay Hannah wrote: >> You don't find large lists of probably dead things annoying? > > Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. > > As an analogy, Google Mail keeps all your dead email (email you delete). Forever. OK. So our policy is that our branch list is an ever-growing pile of probably-dead things that we all ignore. A couple of them might be alive and useful at any given moment in time, but only if whoever created them is still around and cares and happens to remember what the point was. Understood. 
Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Fri May 14 11:34:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 10:34:41 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> Message-ID: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> On May 14, 2010, at 8:56 AM, Hilmar Lapp wrote: > > On May 14, 2010, at 9:32 AM, Jay Hannah wrote: > >> You don't find large lists of probably dead things annoying? > > > Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. > > As an analogy, Google Mail keeps all your dead email (email you delete). Forever. Not because they think most of what you delete you shouldn't have deleted, but because it costs so little, and can be so efficiently managed for the few things that you do decide to recover a year later that it's not worth for you as a user to spend any brain cycles on which emails you should physically delete and which you should only "archive". > > Likewise, I don't see the gain that outweighs the brain cycles and careful consideration that would have to go into deciding which branches to delete, which ones to move into an "attic", and which ones to keep around. If you don't want to see them, simply clone and wipe them away. Life can be so easy :-) > > -hilmar I tend to fall in the middle here, in that it would be nice to clean out feature branches that have been merged back in and relegate all older branches to an attic. Moving branches is as easy as 'git branch -m foo attic/foo'. 
I'm not in favor of removing branches that haven't been merged back, unless they're deemed unnecessary by the core devs. re: removing feature branches, this is something we have talked about doing in the past on svn, but is a bit trickier at the moment as the git repo doesn't currently indicate if/when specific svn branches were merged to HEAD. We still have read-only access to our svn repo to determine that if needed. So far, though, I haven't seen much in the way of indicating what some regard as 'feature' (removable) vs 'attic' (old but retained). That discussion needs to happen on list. chris From hlapp at drycafe.net Fri May 14 12:56:54 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 14 May 2010 12:56:54 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> Message-ID: <69D4619C-F21E-4FAE-B56F-C2F3B323EFD6@drycafe.net> On May 14, 2010, at 11:34 AM, Chris Fields wrote: > it would be nice to clean out feature branches that have been merged > back in Agreed, if the case is clear. > and relegate all older branches to an attic. Moving branches is as > easy as 'git branch -m foo attic/foo'. That's easy enough too and doesn't lose anything, hence no need to spend time on making sure it might not be a mistake. > I'm not in favor of removing branches that haven't been merged > back, unless they're deemed unnecessary by the core devs. Agreed, except I would remove the conditional. I'd rather spend that time on coding ... 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From subodhs at iastate.edu Fri May 14 12:24:21 2010 From: subodhs at iastate.edu (Srivastava, Subodh K [AGRON]) Date: Fri, 14 May 2010 11:24:21 -0500 Subject: [Bioperl-l] running perl script Message-ID: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> hi, I am running a perl script and getting an error like: Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. How do I set the path for this? The other related scripts are working in the same directory. I am running perl, v5.8.8 built for x86_64-linux-thread-multi thank you subodh ************************************* G-302 Agronomy Hall Iowa State University Ames, IA -50010 From rmb32 at cornell.edu Fri May 14 14:38:10 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 11:38:10 -0700 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> Message-ID: <4BED9892.5070408@cornell.edu> At the PDX hackathon last night, I was talking about this problem with a git expert, and he gave me a little tutorial on how git thinks about and keeps branches and tags.
Each of these things is just a special case of a 'ref', which is just a reference to the end of some piece of the commit graph. If you run

    git ls-remote http://github.com/bioperl/bioperl-live.git

you can see all the refs we currently have in our bioperl-live repo, which are all in either /refs/heads (which are our branches), or /refs/tags (our tags).

Now, it turns out you can have arbitrary things in here in addition to heads and tags. I copied one of the old branches to /refs/archives/branch-ensembl-m1 to demonstrate this. Now, it doesn't show up in normal workflow listings, but it's not deleted. If somebody wanted to resurrect it, they could move or copy it into /refs/heads (where it would show up as an active branch again).

To copy a branch into archives/,

    git push origin origin/:refs/archives/

To *move* a branch into archives/,

    git push origin origin/:refs/archives/ \
                    :refs/heads/

The second refspec in that push has nothing on the left side of the colon, which pushes a 'null' to refs/heads/, which deletes it. You can put an arbitrary number of these refspecs in each push invocation.

So, there's a good mechanism for archiving our old branches.
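
The archive-by-ref idea described here can be tried out safely in a scratch repository. What follows is only a minimal sketch, not part of the original post: it assumes the branch name belongs after `origin/` and after `refs/archives/` (as in the `refs/archives/branch-ensembl-m1` example), and the names `BRANCH`, `old-branch`, and the temporary `origin.git` are made up for illustration.

```shell
#!/bin/sh
set -e

# Scratch setup (hypothetical): a bare "origin" and a working clone of it.
work=$(mktemp -d)
git init --quiet --bare "$work/origin.git"
git clone --quiet "$work/origin.git" "$work/clone"
cd "$work/clone"

# Create one commit and publish it as a branch named "old-branch".
git -c user.name="Example" -c user.email="example@example.org" \
    commit --quiet --allow-empty -m "initial commit"
git push --quiet origin HEAD:refs/heads/old-branch
git fetch --quiet origin      # make sure origin/old-branch exists locally

BRANCH=old-branch             # assumed branch name, for illustration only

# Move the branch: the first refspec copies it to refs/archives/, the
# second (empty left side of the colon) deletes it from refs/heads/.
git push --quiet origin "origin/$BRANCH:refs/archives/$BRANCH" \
                        ":refs/heads/$BRANCH"

# The ref is gone from refs/heads but still listed under refs/archives.
git ls-remote origin
```

To bring an archived branch back locally, something like `git fetch origin refs/archives/old-branch:refs/heads/old-branch` should work, since fetch accepts the same source:destination refspec form.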
Rob From pat.boutet at gmail.com Fri May 14 15:14:36 2010 From: pat.boutet at gmail.com (Patrick Boutet) Date: Fri, 14 May 2010 13:14:36 -0600 Subject: [Bioperl-l] running perl script In-Reply-To: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> Message-ID: On Fri, May 14, 2010 at 10:24 AM, Srivastava, Subodh K [AGRON] < subodhs at iastate.edu> wrote: > hi, > I am running a perl script and getting an error like: > > Can't locate Bio/Perl.pm in @INC (@INC contains: > /home/subodhs/SHORE_map/SHOREmap_release_1.1 > /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl > /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl > /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at > /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. > > How do I set the path for this? > The other related scripts are working in the same directory. > > I am running perl, v5.8.8 built for x86_64-linux-thread-multi > > thank you > subodh > ************************************* > G-302 > Agronomy Hall > Iowa State University > Ames, IA -50010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > I'm still new at this, but I'll try to be helpful. First, where is BioPerl installed: system-wide or local to your home directory? Do you have root access? What type of shell are you using? I ask because it seems you might have to set your shell's PERL5LIB variable so that perl checks the directory where BioPerl is installed.
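
For a Bourne-style shell such as bash, setting PERL5LIB might look like the sketch below. The directory `$HOME/src/bioperl-live` and the variable name `BIOPERL_DIR` are hypothetical and not taken from this thread; substitute whichever directory actually contains `Bio/Perl.pm`.

```shell
#!/bin/sh
# Hypothetical install location: the directory that contains Bio/Perl.pm.
BIOPERL_DIR="${BIOPERL_DIR:-$HOME/src/bioperl-live}"

# Prepend it to PERL5LIB, keeping whatever was already set.
PERL5LIB="$BIOPERL_DIR${PERL5LIB:+:$PERL5LIB}"
export PERL5LIB

# Check whether perl can now load the module named in the error message.
if perl -MBio::Perl -e 1 2>/dev/null; then
    echo "Bio::Perl found"
else
    echo "Bio::Perl still not found; check BIOPERL_DIR"
fi
```

Under csh or tcsh the equivalent would be a `setenv PERL5LIB ...` line instead of `export`. The setting only lasts for the current shell session unless it is added to a startup file.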
Patrick Boutet From cjfields at illinois.edu Fri May 14 15:23:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 14:23:31 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BED9892.5070408@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> Message-ID: On May 14, 2010, at 1:38 PM, Robert Buels wrote: > At the PDX hackathon last night, I was talking about this problem with a git expert, and he gave me a little tutorial on how git thinks about and keep branches and tags. > > Each of these things is just a special case of a 'ref', which is just a reference to the end of some piece of the commit graph. If you run > > git ls-remote http://github.com/bioperl/bioperl-live.git > > you can see all the refs we currently have in our bioperl-live repo, which are all in either /refs/heads (which are our branches), or /refs/tags (our tags). > > Now, it turns out you can have arbitrary things in here in addition to heads and tags. I copied one of the old branches to /refs/archives/branch-ensembl-m1 to demonstrate this. Now, it doesn't show up in normal workflow listings, but it's not deleted. If somebody wanted to resurrect it, they could move or copy it into /refs/heads (where it would show up as as an active branch again). > > To copy a branch into archives/, > > git push origin origin/:refs/archives/ > > To *move* a branch into archives/ > > git push origin origin/:refs/archives/ \ > :refs/heads/ > > The first part of that second part of that push has nothing on the left side of the colon, which pushes a 'null' to refs/heads/, which deletes it. You can have an arbitrary number of these kinds of commands in each push invocation. > > So, there's a good mechanism for archiving our old branches. 
> > Rob That's a nice alternative to an attic, and less visible. On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. chris From rmb32 at cornell.edu Fri May 14 18:56:49 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 15:56:49 -0700 Subject: [Bioperl-l] BioPerl for indexing quality score files In-Reply-To: References: Message-ID: <4BEDD531.8050502@cornell.edu> Gregory Jordan wrote: > Ok, I need to shame myself with a huge "RTFM" for this one -- We still like you, Greg. Come hang out in #bioperl, where we can make fun of you properly. ;-) Rob From rmb32 at cornell.edu Fri May 14 19:01:50 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 16:01:50 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> Message-ID: <4BEDD65E.9070702@cornell.edu> Chris Fields wrote: > On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. OK, here are all our current branches, I will go through them in order of last-modified date. 
1998-12-11 bioperl 1999-02-19 release-0-04-bug 1999-04-13 bioperl-live 1999-04-13 stable-0-05 2000-01-27 branch-ensembl-m1 2000-02-07 internal-branch-pre-delete-06-tag 2000-03-22 stable-0-05-new 2001-02-19 branch-06 2001-11-14 branch-07-ensembl-120 2001-12-28 steve_chervitz 2002-01-16 branch-07 2002-10-22 branch-1-0-0 2003-07-07 branch-1-2-collection 2003-10-13 branch-1-2 2004-10-20 ontology-cache 2005-04-14 branch-1-4 2006-01-11 bioperl-branch-1-5-1 2006-08-14 branch-experimental 2007-02-14 branch-1-5-2 2007-08-28 featann_rollback 2007-11-07 lightweight_feature_branch Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. 2009-06-17 restriction-refactor Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f 2009-07-16 topic/bug_2515 proposal: keep, jhannah "working" ;-) 2009-08-13 TRY_gff_refactor proposal: delete, git claims it is merged 2009-08-13 TRY_locatableseq_refactor proposal: delete, git claims it is merged 2009-09-29 branch-1-6 keep, 1.6 maint branch i think. 2009-10-14 anydbm-branch keep, MAJ working. MAJ, maybe you should move this to topic/ ? 2010-01-31 TRY_featureio_refactor keep, but looks dead. cjfields, maybe you want to delete it? 2010-05-12 topic/bug_3077 delete, git claims it is merged. Please review, and I'll do the work if people agree. 
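For anyone who wants to repeat the "git claims it is merged" check before deleting anything, here is a rough sketch on a scratch repository. The branch names trunk, topic/merged and topic/unmerged are made up for the demo; against the real repository the command would more likely be "git branch -r --merged origin/master".

```shell
#!/bin/sh
set -e
# Identity for the demo commits; values are placeholders.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git checkout -q -b trunk
echo one > a.txt; git add a.txt; git commit -q -m "initial"

# A topic branch that is merged back into trunk.
git checkout -q -b topic/merged
echo two > b.txt; git add b.txt; git commit -q -m "merged work"
git checkout -q trunk
git merge -q --no-ff -m "merge topic/merged" topic/merged

# A topic branch whose work never reached trunk.
git checkout -q -b topic/unmerged
echo three > c.txt; git add c.txt; git commit -q -m "unmerged work"
git checkout -q trunk

# A branch counts as merged when its tip is reachable from trunk.
merged=$(git branch --merged trunk | sed 's/^[* ] *//' | tr '\n' ' ')
unmerged=$(git branch --no-merged trunk | sed 's/^[* ] *//' | tr '\n' ' ')
echo "merged: $merged"
echo "not merged: $unmerged"
```

The check only looks at git's commit graph, so a branch that was merged in svn before the conversion, without the merge being recorded, would still be reported as not merged.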
Rob From jason at bioperl.org Fri May 14 19:54:30 2010 From: jason at bioperl.org (Jason Stajich) Date: Fri, 14 May 2010 16:54:30 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDD65E.9070702@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> Message-ID: <4BEDE2B6.3010307@bioperl.org> lightweight_feature_branch was my test built with a feature type that is based on arrays instead of hashes got 25+% speedup I believe - have to go back to the archives to see what I claimed was speedup... =) I think that Bio::SeqFeature::Slim might be at least one speedup by Lincoln for Gbrowse that addresses some of the speed problem, though I think it still isn't array-based for data storage. -j Robert Buels wrote, On 5/14/10 4:01 PM: > Chris Fields wrote: >> On a related note, going through, it appears the git conversion >> didn't track merges back to trunk. For instance, I know the >> featann_rollback was merged to trunk but it's not showing up. I know >> svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came >> into play), so it may be hard to actually find true merges w/o that. > > OK, here are all our current branches, I will go through them in order > of last-modified date. 
> > 1998-12-11 bioperl > 1999-02-19 release-0-04-bug > 1999-04-13 bioperl-live > 1999-04-13 stable-0-05 > 2000-01-27 branch-ensembl-m1 > 2000-02-07 internal-branch-pre-delete-06-tag > 2000-03-22 stable-0-05-new > 2001-02-19 branch-06 > 2001-11-14 branch-07-ensembl-120 > 2001-12-28 steve_chervitz > 2002-01-16 branch-07 > 2002-10-22 branch-1-0-0 > 2003-07-07 branch-1-2-collection > 2003-10-13 branch-1-2 > 2004-10-20 ontology-cache > 2005-04-14 branch-1-4 > 2006-01-11 bioperl-branch-1-5-1 > 2006-08-14 branch-experimental > 2007-02-14 branch-1-5-2 > 2007-08-28 featann_rollback > 2007-11-07 lightweight_feature_branch > > Proposal: move the above to refs/archive and not worry any further > about them. Maybe we can throw them out in 2020. > > 2009-06-17 restriction-refactor > > Proposal: delete, looks like it was merged in > a2cb40e6c9c7da4f776dbb72a0266f54320fa37f > > 2009-07-16 topic/bug_2515 > proposal: keep, jhannah "working" ;-) > > 2009-08-13 TRY_gff_refactor > proposal: delete, git claims it is merged > > 2009-08-13 TRY_locatableseq_refactor > proposal: delete, git claims it is merged > > 2009-09-29 branch-1-6 > keep, 1.6 maint branch i think. > > 2009-10-14 anydbm-branch > keep, MAJ working. MAJ, maybe you should move this to topic/ ? > > 2010-01-31 TRY_featureio_refactor > keep, but looks dead. cjfields, maybe you want to delete it? > > 2010-05-12 topic/bug_3077 > delete, git claims it is merged. > > Please review, and I'll do the work if people agree. 
> > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri May 14 23:41:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 22:41:18 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDD65E.9070702@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> Message-ID: <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> On May 14, 2010, at 6:01 PM, Robert Buels wrote: > Chris Fields wrote: >> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. > > OK, here are all our current branches, I will go through them in order of last-modified date. > > 1998-12-11 bioperl > 1999-02-19 release-0-04-bug > 1999-04-13 bioperl-live > 1999-04-13 stable-0-05 > 2000-01-27 branch-ensembl-m1 > 2000-02-07 internal-branch-pre-delete-06-tag > 2000-03-22 stable-0-05-new > 2001-02-19 branch-06 > 2001-11-14 branch-07-ensembl-120 > 2001-12-28 steve_chervitz > 2002-01-16 branch-07 > 2002-10-22 branch-1-0-0 > 2003-07-07 branch-1-2-collection > 2003-10-13 branch-1-2 > 2004-10-20 ontology-cache > 2005-04-14 branch-1-4 > 2006-01-11 bioperl-branch-1-5-1 > 2006-08-14 branch-experimental > 2007-02-14 branch-1-5-2 > 2007-08-28 featann_rollback > 2007-11-07 lightweight_feature_branch > > Proposal: move the above to refs/archive and not worry any further about them. 
Maybe we can throw them out in 2020. Just as long as we know they are there. Rob, can you document the archive setup on the wiki so we don't forget it? I deleted the featann_rollback branch. That was a feature branch (no pun intended) to roll back overloading and a host of other changes introduced to bioperl just before the 1.5 release. It was merged a few years ago in svn. > 2009-06-17 restriction-refactor > > Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f This may have been Mark's refactoring, so yes, delete. > 2009-08-13 TRY_gff_refactor > proposal: delete, git claims it is merged > > 2009-08-13 TRY_locatableseq_refactor > proposal: delete, git claims it is merged I deleted these. The primary goal of TRY_gff_refactor was to work in GFF3 support, but that may rely on FeatureIO, so it will have to be done in stages. At some point, if we do a larger-scale refactoring of GFF for GFF3 compatibility, we can make another branch. TRY_locatableseq_refactor will be obsoleted once GSoC starts. > 2009-09-29 branch-1-6 > keep, 1.6 maint branch i think. Yes. I will probably work on another set of merges to 1.6 soon to bring it up to speed, maybe for one last 1.6 release. > 2009-10-14 anydbm-branch > keep, MAJ working. MAJ, maybe you should move this to topic/ ? > > 2010-01-31 TRY_featureio_refactor > keep, but looks dead. cjfields, maybe you want to delete it? Yes. I've deleted this, as FeatureIO is on its own. > 2010-05-12 topic/bug_3077 > delete, git claims it is merged. That's already deleted. Maybe it needs to be pruned locally? > Please review, and I'll do the work if people agree. > > Rob Good start!
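On the local pruning point: a clone keeps its remote-tracking copy of a branch after the branch is deleted on the server, until it is pruned. A small sketch on a scratch setup, where the bare repository stands in for the shared remote and topic/bug_demo is a hypothetical branch name:

```shell
#!/bin/sh
set -e
# Identity for the demo commit; values are placeholders.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/clone" 2>/dev/null
cd "$work/clone"

# Publish a topic branch and make sure the tracking ref exists locally.
git checkout -q -b topic/bug_demo
echo fix > fix.txt; git add fix.txt; git commit -q -m "fix"
git push -q origin topic/bug_demo
git fetch -q origin

# Simulate somebody else deleting the branch on the server.
git --git-dir="$work/remote.git" update-ref -d refs/heads/topic/bug_demo

# The clone still lists the stale remote-tracking branch.
before=$(git branch -r | tr -d ' ')

# Drop tracking refs that no longer have a counterpart on the remote.
git remote prune origin >/dev/null
after=$(git branch -r | tr -d ' ')
echo "before=[$before] after=[$after]"
```

"git fetch --prune origin" should have the same effect as part of a normal fetch.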
chris From cjfields at illinois.edu Fri May 14 23:45:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 22:45:07 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDE2B6.3010307@bioperl.org> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <4BEDE2B6.3010307@bioperl.org> Message-ID: <34DFCB4E-2048-4A62-AE9C-06CBF900D38A@illinois.edu> This was moved into bioperl-dev at some point: http://github.com/bioperl/bioperl-dev/tree/master/Bio/SeqFeature/ Might be obsolete as well. chris On May 14, 2010, at 6:54 PM, Jason Stajich wrote: > lightweight_feature_branch was my test built with a feature type that is based on arrays instead of hashes got 25+% speedup I believe - have to go back to the archives to see what I claimed was speedup... =) > > I think that Bio::SeqFeature::Slim might be at least one speedup by Lincoln for Gbrowse that addresses some of the speed problem, though I think it still isn't array-based for data storage. > > -j > > Robert Buels wrote, On 5/14/10 4:01 PM: >> Chris Fields wrote: >>> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. >> >> OK, here are all our current branches, I will go through them in order of last-modified date. 
>> >> 1998-12-11 bioperl >> 1999-02-19 release-0-04-bug >> 1999-04-13 bioperl-live >> 1999-04-13 stable-0-05 >> 2000-01-27 branch-ensembl-m1 >> 2000-02-07 internal-branch-pre-delete-06-tag >> 2000-03-22 stable-0-05-new >> 2001-02-19 branch-06 >> 2001-11-14 branch-07-ensembl-120 >> 2001-12-28 steve_chervitz >> 2002-01-16 branch-07 >> 2002-10-22 branch-1-0-0 >> 2003-07-07 branch-1-2-collection >> 2003-10-13 branch-1-2 >> 2004-10-20 ontology-cache >> 2005-04-14 branch-1-4 >> 2006-01-11 bioperl-branch-1-5-1 >> 2006-08-14 branch-experimental >> 2007-02-14 branch-1-5-2 >> 2007-08-28 featann_rollback >> 2007-11-07 lightweight_feature_branch >> >> Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. >> >> 2009-06-17 restriction-refactor >> >> Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f >> >> 2009-07-16 topic/bug_2515 >> proposal: keep, jhannah "working" ;-) >> >> 2009-08-13 TRY_gff_refactor >> proposal: delete, git claims it is merged >> >> 2009-08-13 TRY_locatableseq_refactor >> proposal: delete, git claims it is merged >> >> 2009-09-29 branch-1-6 >> keep, 1.6 maint branch i think. >> >> 2009-10-14 anydbm-branch >> keep, MAJ working. MAJ, maybe you should move this to topic/ ? >> >> 2010-01-31 TRY_featureio_refactor >> keep, but looks dead. cjfields, maybe you want to delete it? >> >> 2010-05-12 topic/bug_3077 >> delete, git claims it is merged. >> >> Please review, and I'll do the work if people agree. 
>> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Sat May 15 10:27:48 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 09:27:48 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) Message-ID: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> I wrote some tests and merged and deleted branch topic/bug_2515. Bio::SeqIO::gbxml is now in master. Thanks to Ryan Golhar for the contribution! Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah bioperl-live$ perl -I. t/SeqIO/gbxml.t 1..14 ok 1 - use Bio::SeqIO::gbxml; ok 2 - The object isa Bio::SeqIO ok 3 - molecule ok 4 - alphabet ok 5 - primary_id ok 6 - display_id ok 7 - version ok 8 - is_circular ok 9 - description ok 10 - sequence ok 11 - classification ok 12 - feat - clone_lib ok 13 - feat - db_xref ok 14 - feat - lab_host From jay at jays.net Sat May 15 10:57:54 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 09:57:54 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> Message-ID: On May 15, 2010, at 9:34 AM, Chris Fields wrote: > Can you add something to the Changes file for this? You can make a new section for bug fixes or new features at the top, and we can worry about versions later. > > I'll add in the recent bug fix I made as well. Pushed. Feel free to discard any of that you don't like. 
HTH, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Sat May 15 11:46:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 May 2010 10:46:16 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> Message-ID: <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> Thanks Jay. I'll add a bit in myself for bug 3077. Not sure if we'll pursue another point release yet, but it would be nice to get changes out prior to any major structural reorganization. chris On May 15, 2010, at 9:57 AM, Jay Hannah wrote: > On May 15, 2010, at 9:34 AM, Chris Fields wrote: >> Can you add something to the Changes file for this? You can make a new section for bug fixes or new features at the top, and we can worry about versions later. >> >> I'll add in the recent bug fix I made as well. > > Pushed. Feel free to discard any of that you don't like. HTH, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > From jay at jays.net Sat May 15 14:08:35 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 13:08:35 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> Message-ID: On May 15, 2010, at 10:46 AM, Chris Fields wrote: > Thanks Jay. I'll add a bit in myself for bug 3077. Not sure if we'll pursue another point release yet, but it would be nice to get changes out prior to any major structural reorganization. Is there a list whose completion will mark the push of 1.6.2 to CPAN? 
The Changes file says this now: Bugs to be addressed: http://bugzilla.open-bio.org specific bugs intended for the next CPAN release series highlighted in BUGS But I don't understand what 'highlighted in BUGS' means. I also don't know what a 'point release' is. :) Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From David.Messina at sbc.su.se Sat May 15 15:34:58 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 May 2010 21:34:58 +0200 Subject: [Bioperl-l] parsing blast report with long description In-Reply-To: References: Message-ID: Shalabh, Could you please file a bug report on this at bugzilla.open-bio.org? Please include a description (pasting this email will do) and most importantly a test script and sample blast output file which reproduces the problem. We will need those in order to be able to diagnose and fix the problem. Thanks! Dave On May 13, 2010, at 5:07 PM, shalabh sharma wrote: > Hi All, > I need some help in parsing blast output. > I have a inhouse database that contain sequences with really long > description. > >> SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open > Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - > 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV > > So my blast report looks like this: > > ..... > ..... >> SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821 > 6887/Open Ocean/Galapagos Islands/134 miles NE of > Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 > m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > Length = 213 > > Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix > adjust. > Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%) > ..... > ..... > > (note that the tag "TI_1000008216887" is splitting in two lines). 
> > I am using SeqIO to parse this report. What i am doing is parsing the > description field again to get all the tags. like > .... > .... > my $desc = $hit->description; > my @f = split('/',$desc); > for(my $i = 0;$i < scalar > @f;$i++){ print OUT "$f[$i]\t";} > ..... > ..... > > > *I am getting the perfect parsed report but the field with TI_1000008216887 > has a space **TI_100000821 6887 *. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun May 16 11:14:25 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 10:14:25 -0500 Subject: [Bioperl-l] GenomeeTools Message-ID: Anyone used GenomeTools? I'm thinking of setting up some C bindings to it. It has a C-based GFF3 parser, among other goodies. http://genometools.org/index.html chris From cjfields at illinois.edu Sun May 16 12:16:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 11:16:11 -0500 Subject: [Bioperl-l] Bio-FeatureIO Message-ID: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> All, Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. 
chris From jay at jays.net Sun May 16 13:32:57 2010 From: jay at jays.net (Jay Hannah) Date: Sun, 16 May 2010 12:32:57 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 11:16 AM, Chris Fields wrote: > Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. I'm curious about how this works in terms of git storage. Does this mean that the separate Bio-FeatureIO repo will have the entire history of BioPerl inside it? (Making git clones of Bio-FeatureIO 189MB?) In the recent past I have attempted pulling certain files across git repos before, and ended up with the full history of repo1 inside repo2. I'm unclear if this is just how life is, or if I did it wrong. You could, of course, always just cp text files in, but then you lose the history of those files. Is there some way to get all the history of a handful of files from massive repo1 into tiny repo2 without making repo1 massive? I don't know if any of these considerations are important for the eventual de-monolithification of BioPerl, I was just generally curious. git does that to me. 
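One possible answer to the history question, offered as a sketch rather than a recommendation: git filter-branch with --subdirectory-filter rewrites a clone so that only the commits touching one directory remain, with that directory as the new root. The repository layout and file names below are invented for the demo, and the squelch variable only matters on newer git versions that print a warning before running filter-branch.

```shell
#!/bin/sh
set -e
# Identity for the demo commits; values are placeholders.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

# A "big" repository with one subdirectory of interest and unrelated history.
work=$(mktemp -d)
git init -q "$work/big"
cd "$work/big"
git checkout -q -b trunk
mkdir -p Bio/FeatureIO
echo a > Bio/FeatureIO/gff.pm
echo b > README
git add . && git commit -q -m "initial"
echo c >> README
git commit -q -am "unrelated change"
echo d >> Bio/FeatureIO/gff.pm
git commit -q -am "featureio change"

# Work on a clone so the original repository is left untouched.
git clone -q "$work/big" "$work/small"
cd "$work/small"

# Keep only the history of Bio/FeatureIO and make it the top-level directory.
FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --subdirectory-filter Bio/FeatureIO >/dev/null 2>&1

# Only the two commits that touched the subdirectory should survive.
count=$(git rev-list --count HEAD)
echo "commits kept: $count"
ls
```

The rewritten clone still holds the old objects locally (filter-branch keeps a backup under refs/original/), but pushing the result to a new, empty repository should transfer only what is reachable from the pushed branch.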
:) Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Sun May 16 14:18:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 13:18:24 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 12:32 PM, Jay Hannah wrote: > On May 16, 2010, at 11:16 AM, Chris Fields wrote: >> Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. > > I'm curious about how this works in terms of git storage. > > Does this mean that the separate Bio-FeatureIO repo will have the entire history of BioPerl inside it? (Making git clones of Bio-FeatureIO 189MB?) > > In the recent past I have attempted pulling certain files across git repos before, and ended up with the full history of repo1 inside repo2. I'm unclear if this is just how life is, or if I did it wrong. > > You could, of course, always just cp text files in, but then you lose the history of those files. > > Is there some way to get all the history of a handful of files from massive repo1 into tiny repo2 without making repo1 massive? > > I don't know if any of these considerations are important for the eventual de-monolithification of BioPerl, I was just generally curious. git does that to me. 
:) > > Thanks, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah I'm just planning on having something to the effect of 'Bio-FeatureIO is a set of modules developed by author X that once was part of bioperl-live, but was removed at point XYZ to significantly refactor the code,' then point back to bioperl-live if anyone is interested in software archaeology. Not sure we would need to go beyond that. chris From jay at jays.net Sun May 16 14:47:42 2010 From: jay at jays.net (Jay Hannah) Date: Sun, 16 May 2010 13:47:42 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 1:18 PM, Chris Fields wrote: > I'm just planning on having something to the effect of 'Bio-FeatureIO is a set of modules developed by author X that once was part of bioperl-live, but was removed at point XYZ to significantly refactor the code,' then point back to bioperl-live if anyone is interested in software archaeology. Not sure we would need to go beyond that. Gotcha. That certainly solves the problem. :) So maybe in 2020 we'll be pushing 30 independent github repos to PAUSE all citing the bioperl-live repo for historical digging prior to their emancipation. To jhannah in the year 2020: You are NOT too old for dirt bikes. Keep riding! :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From fs5 at sanger.ac.uk Mon May 17 04:38:18 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 17 May 2010 09:38:18 +0100 Subject: [Bioperl-l] parsing blast report with long description In-Reply-To: References: Message-ID: <1274085498.5288.30.camel@deskpro15336.dynamic.sanger.ac.uk> I think you should try to avoid those long IDs anyway, especially because you have spaces in there too and this may cause problems further down the line as many programs will use a pattern like />(\S+)/ as the identifier. 
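To illustrate the identifier point, here is a rough shell equivalent of that pattern applied to a shortened version of the header from this thread. Everything after the first space is treated as description, not as part of the ID. The sed expression is my approximation of the Perl regex, not code from any particular program.

```shell
#!/bin/sh
# Header from the thread, shortened for the demo.
header='>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/Open Ocean/Galapagos Islands'

# Approximation of the Perl pattern />(\S+)/: capture the non-whitespace run after ">".
id=$(printf '%s\n' "$header" | sed -n 's/^>\([^[:space:]]*\).*/\1/p')
echo "$id"
```

With this header the captured ID is just SMPL_IDI_1105131728043, so the slash-separated tags end up in the free-text description that BLAST is free to wrap across lines.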
I would build a small database for your files and use unique database identifiers in your FASTA files. That will make it easier in the future to collect, for example, all sequences from a certain region etc. If you want to avoid that, you could have two files: a FASTA file using numbers as IDs and a file where you map those numbers to sample descriptions, i.e. a simple flat-file database. Frank On Thu, 2010-05-13 at 11:07 -0400, shalabh sharma wrote: > Hi All, > I need some help in parsing blast output. > I have a inhouse database that contain sequences with really long > description. > > >SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open > Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - > 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV > > So my blast report looks like this: > > ..... > ..... > >SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821 > 6887/Open Ocean/Galapagos Islands/134 miles NE of > Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 > m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > Length = 213 > > Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix > adjust. > Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%) > ..... > ..... > > (note that the tag "TI_1000008216887" is splitting in two lines). > > I am using SeqIO to parse this report. What i am doing is parsing the > description field again to get all the tags. like > .... > .... > my $desc = $hit->description; > my @f = split('/',$desc); > for(my $i = 0;$i < scalar > @f;$i++){ print OUT "$f[$i]\t";} > ..... > ..... > > > *I am getting the perfect parsed report but the field with TI_1000008216887 > has a space **TI_100000821 6887 *. > > I would really appreciate if anyone can help me out.
> > Thanks > Shalabh Sharma -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From fs5 at sanger.ac.uk Mon May 17 04:41:51 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 17 May 2010 09:41:51 +0100 Subject: [Bioperl-l] running perl script In-Reply-To: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> Message-ID: <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> Why are you requiring "Bio::Perl"? You would normally use something specific in the BioPerl bundle, like Bio::Seq or whatever. Can you show some of your script? Frank On Fri, 2010-05-14 at 11:24 -0500, Srivastava, Subodh K [AGRON] wrote: > hi, > I am running a perl script and getting error like: > > Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. > > How to set the path for this? > the other related scripts are working in same directory.
> > I am running; perl, v5.8.8 built for x86_64-linux-thread-multi > > thank you > subodh > ************************************* > G-302 > Agronomy Hall > Iowa State University > Ames, IA -50010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Mon May 17 08:26:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 07:26:20 -0500 Subject: [Bioperl-l] running perl script In-Reply-To: <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <63D0BEDA-27F7-48AB-ABE8-1F39B09B349A@illinois.edu> Frank, Bio::Perl is the generic user module for very simple tasks. See here: http://github.com/bioperl/bioperl-live/blob/master/Bio/Perl.pm Subodh, you need to make sure the modules are in your perl library path. See the following link, under 'INSTALLING BIOPERL IN A PERSONAL MODULE AREA': http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix chris On May 17, 2010, at 3:41 AM, Frank Schwach wrote: > why are you requiring "Bio::Perl"? You would normally use somethink > specific in the BioPerl bundle, like Bio::Seq or whatever. Can you show > some of your script? 
> Frank > > > On Fri, 2010-05-14 at 11:24 -0500, Srivastava, Subodh K [AGRON] wrote: >> hi, >> I am running a perl script and getting error like: >> >> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. >> >> How to set the path for this? >> the other related scripts are working in same directory. >> >> I am running; perl, v5.8.8 built for x86_64-linux-thread-multi >> >> thank you >> subodh >> ************************************* >> G-302 >> Agronomy Hall >> Iowa State University >> Ames, IA -50010 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ross at cuhk.edu.hk Mon May 17 08:42:35 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 17 May 2010 20:42:35 +0800 Subject: [Bioperl-l] extracting genbank content Message-ID: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Dear all, When there are more than one genbank records in a file, except by splitting the file into separate records, what can I do to transverse the records? 
$obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank"); $seqobj=$obj->next_seq(); Do I just use another $obj->next_seq() so it will point to another record? Thanks for your advice. From amackey at virginia.edu Mon May 17 09:51:31 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Mon, 17 May 2010 09:51:31 -0400 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: On Thu, May 13, 2010 at 2:20 AM, Heikki Lehvaslaiho < heikki.lehvaslaiho at gmail.com> wrote: > > As for getting values outside the defined region, that is a feature rather > than a bug. The idea was to be able to ask what would the new coordinate be > if the feature extended beyond the known limits. This is the capability of > Bio::Coordinate::ExtrapolatingPair that is used here. That class also has a > method strict that can be used to prevent extrapolating, but the code to > access that has not been written into GeneMapper. I'll see if I can get it > to work. > > I had this same thought/expectation, but that in fact is not what's going on. There is no place in the GeneMapper code where the CDS end coordinate is being used, only the begin coordinate. The implicit assumption is that the CDS ends at the last exon. From the perspective of the translate/revtranslate methods, an extrapolating pair does not make sense (at least to me) -- just as a CDS coordinate is undefined within an intron, so too would I expect a CDS coordinate to be undefined in a UTR or intragenic region. Alternatively, it would be nice (in general) to be able to check whether the provided mapping is an extrapolation or not.
-Aaron From David.Messina at sbc.su.se Mon May 17 09:56:35 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 17 May 2010 15:56:35 +0200 Subject: [Bioperl-l] extracting genbank content In-Reply-To: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: Hi Ross, > Do I just use another $obj->next_seq() so it will point to another record? Yes. The common approach is to use a while loop: my $obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank"); while(my $seqobj = $obj->next_seq) { # do stuff with $seqobj } For more details, see the SeqIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SeqIO Dave From cjfields at illinois.edu Mon May 17 12:36:37 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 11:36:37 -0500 Subject: [Bioperl-l] extracting genbank content In-Reply-To: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: <9952EA98-248E-41B8-9816-A3A01EC6ADFE@illinois.edu> Depends on what you need to do. If you are just interested in pulling out certain bits of data from each record, using SeqIO is a good option. But if you want to access the records as a flat database (not iteration, but indexed for fast access), use Bio::Index::GenBank or Bio::DB::Flat to make a simple flat file database and access them by ID. chris On May 17, 2010, at 7:42 AM, Ross KK Leung wrote: > Dear all, > > > > When there are more than one genbank records in a file, except by splitting > the file into separate records, what can I do to transverse the records? > > > > $obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank"); > > > $seqobj=$obj->next_seq(); > > > > Do I just use another $obj->next_seq() so it will point to another record? > > > > Thanks for your advice. 
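[Editorial sketch] The indexed approach mentioned above could look roughly like the following. This is a minimal sketch, not code from the thread: the helper name fetch_genbank_record, the file names and the command-line handling are made up for illustration, and it assumes a BioPerl installation in which Bio::Index::GenBank provides new(), make_index() and fetch() as described in its documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Bio::Index::GenBank;

# Hypothetical helper: build (or update) an index over a multi-record
# GenBank file, then fetch one record by ID without iterating over the
# whole file.
sub fetch_genbank_record {
    my ($gbfile, $index_file, $id) = @_;

    my $inx = Bio::Index::GenBank->new(
        -filename   => $index_file,
        -write_flag => 1,            # open the index for writing
    );

    # Use an absolute path so the index stays usable from any directory.
    $inx->make_index(File::Spec->rel2abs($gbfile));

    # Returns a Bio::Seq object, or undef if the ID is not in the index.
    return $inx->fetch($id);
}

# Assumed usage: script.pl records.gb records.idx SOME_ID
if (@ARGV == 3) {
    my ($gbfile, $index_file, $id) = @ARGV;
    my $seq = fetch_genbank_record($gbfile, $index_file, $id);
    if ($seq) {
        print $seq->display_id, "\t", $seq->length, "\n";
    }
    else {
        warn "No record found for '$id'\n";
    }
}
```

For a single pass over every record, the while loop over next_seq shown earlier in the thread is simpler; an index like this only pays off when records are looked up repeatedly by ID.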
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Mon May 17 12:50:21 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 May 2010 09:50:21 -0700 Subject: [Bioperl-l] GenomeeTools In-Reply-To: References: Message-ID: <4BF173CD.8020600@cornell.edu> I haven't used GenomeTools but I've used GenomeThreader, one of Gordon's other tools. Rob Chris Fields wrote: > Anyone used GenomeTools? I'm thinking of setting up some C bindings to it. It has a C-based GFF3 parser, among other goodies. > > http://genometools.org/index.html > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Mon May 17 20:15:13 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 May 2010 17:15:13 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> Message-ID: <4BF1DC11.6030402@cornell.edu> OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches Rob Chris Fields wrote: > On May 14, 2010, at 6:01 PM, Robert Buels wrote: > >> Chris Fields wrote: >>> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. 
I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. >> OK, here are all our current branches, I will go through them in order of last-modified date. >> >> 1998-12-11 bioperl >> 1999-02-19 release-0-04-bug >> 1999-04-13 bioperl-live >> 1999-04-13 stable-0-05 >> 2000-01-27 branch-ensembl-m1 >> 2000-02-07 internal-branch-pre-delete-06-tag >> 2000-03-22 stable-0-05-new >> 2001-02-19 branch-06 >> 2001-11-14 branch-07-ensembl-120 >> 2001-12-28 steve_chervitz >> 2002-01-16 branch-07 >> 2002-10-22 branch-1-0-0 >> 2003-07-07 branch-1-2-collection >> 2003-10-13 branch-1-2 >> 2004-10-20 ontology-cache >> 2005-04-14 branch-1-4 >> 2006-01-11 bioperl-branch-1-5-1 >> 2006-08-14 branch-experimental >> 2007-02-14 branch-1-5-2 >> 2007-08-28 featann_rollback >> 2007-11-07 lightweight_feature_branch >> >> Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. > > Just as long as we know they are there. Rob, can you document the archive set up on the wiki so we don't forget it? > > I deleted the featann_rollback branch. That was a feature branch (no pun intended) to rollback overloading and a host of other changes introduced to bioperl just before the 1.5 release. It was merged a few years ago in svn. > >> 2009-06-17 restriction-refactor >> >> Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f > > This may have been Mark's refactoring, so yes, delete. > >> 2009-08-13 TRY_gff_refactor >> proposal: delete, git claims it is merged >> >> 2009-08-13 TRY_locatableseq_refactor >> proposal: delete, git claims it is merged > > I deleted these. The primary goal of TRY_gff_refactor was to work in GFF3 work, but that may rely on FeatureIO so will have to be done in stages. At some point, if we do a larger scale refactoring of GFF for GFF3 compat we can make another branch. 
TRY_locatableseq_refactor will be obsoleted once GSoC starts. > >> 2009-09-29 branch-1-6 >> keep, 1.6 maint branch i think. > > Yes. I will probably work on another set of merges from to 1.6 soon to bring it up to speed, maybe for one last 1.6 release. > >> 2009-10-14 anydbm-branch >> keep, MAJ working. MAJ, maybe you should move this to topic/ ? >> >> 2010-01-31 TRY_featureio_refactor >> keep, but looks dead. cjfields, maybe you want to delete it? > > Yes. I've deleted this, as FeatureIO is on it's own. > >> 2010-05-12 topic/bug_3077 >> delete, git claims it is merged. > > That's already deleted. Maybe needs to be pruned locally? > >> Please review, and I'll do the work if people agree. >> >> Rob > > Good start! > > chris > > From jay at jays.net Mon May 17 20:35:33 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 17 May 2010 19:35:33 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BF1DC11.6030402@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> Message-ID: <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> On May 17, 2010, at 7:15 PM, Robert Buels wrote: > OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches Thank you!! git pull --prune and suddenly I feel clean again! :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From amackey at virginia.edu Mon May 17 20:42:17 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Mon, 17 May 2010 20:42:17 -0400 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. 
In-Reply-To: <20100518001029.CD8644229D@smtp1.rs.github.com> References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: I probably missed some prior discussion of this, but any chance that the new commit messages can actually include the (unified, possibly truncated-for-length) diff of the changes? My own 2 cents is that community-wide visual skims of the diffs provide a valuable spot-check for typo's and other think-o's. Plus it gives me an indication of how major the change was. A corollary -- might there be an RSS feed by which I could subscribe to such diffs, rather than get emails about them? Since the emails are sent from "noreply", I already have to step out of the normal email flow to respond to a diff, might as well go whole hog and remove them from my email consciousness entirely, and place them with the other various information streams in my RSS reader. Thanks, -Aaron On Mon, May 17, 2010 at 8:10 PM, wrote: > Branch: refs/archives/heads/branch-1-0-0 > Home: http://github.com/bioperl/bioperl-live > > Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 > > http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 > Author: sac > Date: 2002-10-22 (Tue, 22 Oct 2002) > > Changed paths: > M Bio/SearchIO/Writer/HitTableWriter.pm > > Log Message: > ----------- > Added frame to the column map. > > svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > From jay at jays.net Mon May 17 21:10:56 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 17 May 2010 20:10:56 -0500 Subject: [Bioperl-l] 319a6e: Added frame to the column map. 
In-Reply-To: References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > I probably missed some prior discussion of this, but any chance that the new > commit messages can actually include the (unified, possibly > truncated-for-length) diff of the changes? I'm 5 years behind the cool-kids curve on this stuff. :) I just discovered SVN::Notify for $work[0]. By default it kicks out really pretty color HTML diffs of every change. I assume there's an equivalent for git? You could always click to github. It's color HTML diffs are very pretty. That commit for example: http://github.com/bioperl/bioperl-live/commit/319a6e Plus all the other github shiny -- comment specific lines of the commit, or the commit itself, etc. Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Mon May 17 21:35:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 20:35:21 -0500 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. In-Reply-To: References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu> Aaron, We can do either, though setting up diffs will take a bit more work (will have to set up a post-receive URL to a CGI script to process this). RSS is quite a bit easier: http://github.com/bioperl/bioperl-live/commits/master.atom Replace 'bioperl-live' with any of the other repos for repo-specific RSS commits. The links go to the commits where you can also make in-line notes/comments by clicking in the diff code, or simple comments at the bottom. Those comments are then passed on to bioperl-guts-l for everyone to see. 
Example here: http://github.com/bioperl/bioperl-live/commit/c86c048c96786f8517ae1ad1fc5e5823eecf52c3 and the relevant bioperl-guts-l posts: http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031259.html http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031260.html chris On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > I probably missed some prior discussion of this, but any chance that the new > commit messages can actually include the (unified, possibly > truncated-for-length) diff of the changes? > > My own 2 cents is that community-wide visual skims of the diffs provide a > valuable spot-check for typo's and other think-o's. Plus it gives me an > indication of how major the change was. > > A corollary -- might there be an RSS feed by which I could subscribe to such > diffs, rather than get emails about them? Since the emails are sent from > "noreply", I already have to step out of the normal email flow to respond to > a diff, might as well go whole hog and remove them from my email > consciousness entirely, and place them with the other various information > streams in my RSS reader. > > Thanks, > > -Aaron > > On Mon, May 17, 2010 at 8:10 PM, wrote: > >> Branch: refs/archives/heads/branch-1-0-0 >> Home: http://github.com/bioperl/bioperl-live >> >> Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 >> >> http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 >> Author: sac >> Date: 2002-10-22 (Tue, 22 Oct 2002) >> >> Changed paths: >> M Bio/SearchIO/Writer/HitTableWriter.pm >> >> Log Message: >> ----------- >> Added frame to the column map. 
>> >> svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 >> >> >> _______________________________________________ >> Bioperl-guts-l mailing list >> Bioperl-guts-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Tue May 18 03:16:52 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 00:16:52 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> Message-ID: <4BF23EE4.6020704@cornell.edu> We may want to do the same for our tags as well. Our github download page is fairly disastrous. See: http://github.com/bioperl/bioperl-live/downloads It's not clear that a similar date-cutoff policy would work for tags. Pretty much all of these things were before my time, I don't know what most of them are. Does someone with more history than me have some thoughts as to what should stay on that download page? The rest of the tags could be archived. Rob Jay Hannah wrote: > On May 17, 2010, at 7:15 PM, Robert Buels wrote: >> OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches > > Thank you!! git pull --prune and suddenly I feel clean again! 
:) > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > From bpcwhite at gmail.com Tue May 18 05:49:29 2010 From: bpcwhite at gmail.com (Bryan White) Date: Tue, 18 May 2010 02:49:29 -0700 (PDT) Subject: [Bioperl-l] distance Message-ID: Hello, I am trying to create a simple program to show me the distance between taxa on a given tree. However, I am having trouble getting the bioperl code to work. Here is the code that I am using: -------- #! /usr/bin/perl use strict; use warnings; use Bio::Tree::Draw::Cladogram; use Bio::TreeIO; #use Bio::TreeFunctionsI; my $node1 = 'homo_sapiens'; my $node2 = 'murinae'; my $input = new Bio::TreeIO('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt'); my $tree = $input->next_tree; my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); my $distance = $tree->distance(-nodes => \@nodes); #print $distance; -------- And here is the error message I receive: ------------- EXCEPTION ------------- MSG: Must provide 2 nodes STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/ Bio/Tree/TreeFunctionsI.pm:811 STACK toplevel ./phylo.pl:19 ------------------------------------- It seems that the nodes are not being read into the @nodes variable. Any help in figuring this out would be appreciated. 
Thanks, Bryan From biopython at maubp.freeserve.co.uk Tue May 18 06:07:15 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 May 2010 11:07:15 +0100 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BF23EE4.6020704@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> Message-ID: On Tue, May 18, 2010 at 8:16 AM, Robert Buels wrote: > We may want to do the same for our tags as well. Our github download page > is fairly disastrous. See: > > http://github.com/bioperl/bioperl-live/downloads > > It's not clear that a similar date-cutoff policy would work for tags. Pretty > much all of these things were before my time, I don't know what most of them > are. > > Does someone with more history than me have some thoughts as to what should > stay on that download page? The rest of the tags could be archived. > > Rob Or just turn off the download feature in github. When you prepare a BioPerl release does it contain anything else not found in the repository (e.g. compiled documentation)? We have this for Biopython (compiled PDF and HTML docs) so we prefer to direct casual release downloads via the website not via the tag on github to ensure they get these extra files in the archive. Peter From adsj at novozymes.com Tue May 18 06:21:25 2010 From: adsj at novozymes.com (Adam Sjøgren) Date: Tue, 18 May 2010 12:21:25 +0200 Subject: [Bioperl-l] distance References: Message-ID: <87k4r11pei.fsf@topper.koldfront.dk> On Tue, 18 May 2010 02:49:29 -0700 (PDT), Bryan wrote: > my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); I think you may have misunderstood the documentation of find_node().
You are supposed to give the fieldname after the dash, so what you want is: my @nodes = $tree->find_node(-id => 'Homo_sapiens','Murinae'); - if the field you want to match on is 'id'. Also, I don't think you can get find_node() to do 'OR'-searches, so you'll need to do something like this: = = = #!/usr/bin/perl use strict; use warnings; use Bio::TreeIO; my $input=Bio::TreeIO->new('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt'); my $tree=$input->next_tree; my ($node1)=$tree->find_node(-id=>'Homo_sapiens'); # this (arbitrarily) picks the first match my ($node2)=$tree->find_node(-id=>'Murinae'); # -"- my $distance=$tree->distance(-nodes=>[$node1, $node2]); print "$distance\n"; = = = It is much easier to help if you give an example of the input as well as the script. I constructed this stand-in for your newick file to test on: (Homo_sapiens:1.1,B:2.2,(C:3.3,Murinae:4.4):5.5); Best regards, Adam -- Adam Sjøgren adsj at novozymes.com From David.Messina at sbc.su.se Tue May 18 06:50:52 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 12:50:52 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> Message-ID: <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> On May 18, 2010, at 12:07, Peter wrote: > Or just turn off the download feature in github. That might be the best solution, at least for now. The download page is somewhat unfriendly anyway: the tag names are truncated, there's no way to sort, and the descriptions are, well, not so descriptive (they appear to be just the last commit message).
Probably better to keep http://www.bioperl.org/wiki/Getting_BioPerl as our main distribution point for downloads. Dave From jun.yin at ucd.ie Tue May 18 07:15:14 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Tue, 18 May 2010 12:15:14 +0100 Subject: [Bioperl-l] distance In-Reply-To: <87k4r11pei.fsf@topper.koldfront.dk> References: <87k4r11pei.fsf@topper.koldfront.dk> Message-ID: <002d01caf67b$637c20d0$2a746270$%yin@ucd.ie> Hi, Bryan, Use Adam's code. The last statement of my code was wrong. I made a wrong reference... Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Adam Sjøgren Sent: Tuesday, May 18, 2010 11:21 AM To: bioperl-l at bioperl.org Subject: Re: [Bioperl-l] distance On Tue, 18 May 2010 02:49:29 -0700 (PDT), Bryan wrote: > my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); I think you may have misunderstood the documentation of find_node(). You are supposed to give the fieldname after the dash, so what you want is: my @nodes = $tree->find_node(-id => 'Homo_sapiens','Murinae'); - if the field you want to match on is 'id'. Also, I don't think you can get find_node() to do 'OR'-searches, so you'll need to do something like this: = = = #!/usr/bin/perl use strict; use warnings; use Bio::TreeIO; my $input=Bio::TreeIO->new('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt'); my $tree=$input->next_tree; my ($node1)=$tree->find_node(-id=>'Homo_sapiens'); # this (arbitrarily) picks the first match my ($node2)=$tree->find_node(-id=>'Murinae'); # -"- my $distance=$tree->distance(-nodes=>[$node1, $node2]); print "$distance\n"; = = = It is much easier to help if you give an example of the input as well as the script.
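[Editorial sketch] A variant of the script above that checks each lookup before calling distance(), since the "Must provide 2 nodes" exception in the original question came from a node list that did not hold two nodes. This is only a sketch under assumptions: it reads Adam's stand-in Newick string from an in-memory filehandle rather than from tree_mammalia_newick.txt, and the %node_for hash is a name made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;

# Stand-in Newick tree, read from an in-memory filehandle instead of a file.
my $newick = "(Homo_sapiens:1.1,B:2.2,(C:3.3,Murinae:4.4):5.5);\n";
open my $fh, '<', \$newick or die "Cannot open in-memory handle: $!";

my $treeio = Bio::TreeIO->new(-format => 'newick', -fh => $fh);
my $tree   = $treeio->next_tree or die "No tree could be parsed\n";

# Look up each taxon separately and stop early if one is missing,
# instead of handing an incomplete node list to distance().
my %node_for;
for my $id (qw(Homo_sapiens Murinae)) {
    my ($node) = $tree->find_node(-id => $id);   # first match only
    die "No node with id '$id' in tree\n" unless defined $node;
    $node_for{$id} = $node;
}

# distance() expects a reference to an array of exactly two nodes.
my $distance = $tree->distance(
    -nodes => [ @node_for{qw(Homo_sapiens Murinae)} ],
);
print "$distance\n";
```

With this stand-in tree the path between the two leaves covers the branch lengths 1.1, 5.5 and 4.4, so the printed value should be about 11. Node IDs are matched exactly as written in the Newick file, so 'homo_sapiens' would not match 'Homo_sapiens'.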
I constructed this stand-in for your newick file to test on: (Homo_sapiens:1.1,B:2.2,(C:3.3,Murinae:4.4):5.5); Best regards, Adam -- Adam Sjøgren adsj at novozymes.com _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5099 (20100509) __________ The message was checked by ESET Smart Security. http://www.eset.com From amackey at virginia.edu Tue May 18 07:26:17 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Tue, 18 May 2010 07:26:17 -0400 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. In-Reply-To: <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu> References: <20100518001029.CD8644229D@smtp1.rs.github.com> <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu> Message-ID: Thanks for the info, and the thoroughness of your explanation! -Aaron On Mon, May 17, 2010 at 9:35 PM, Chris Fields wrote: > Aaron, > > We can do either, though setting up diffs will take a bit more work (will > have to set up a post-receive URL to a CGI script to process this). > > RSS is quite a bit easier: > > http://github.com/bioperl/bioperl-live/commits/master.atom > > Replace 'bioperl-live' with any of the other repos for repo-specific RSS > commits. The links go to the commits where you can also make in-line > notes/comments by clicking in the diff code, or simple comments at the > bottom. Those comments are then passed on to bioperl-guts-l for everyone to > see.
Example here: > > > http://github.com/bioperl/bioperl-live/commit/c86c048c96786f8517ae1ad1fc5e5823eecf52c3 > > and the relevant bioperl-guts-l posts: > > http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031259.html > http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031260.html > > chris > > On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > > > I probably missed some prior discussion of this, but any chance that the > new > > commit messages can actually include the (unified, possibly > > truncated-for-length) diff of the changes? > > > > My own 2 cents is that community-wide visual skims of the diffs provide a > > valuable spot-check for typo's and other think-o's. Plus it gives me an > > indication of how major the change was. > > > > A corollary -- might there be an RSS feed by which I could subscribe to > such > > diffs, rather than get emails about them? Since the emails are sent from > > "noreply", I already have to step out of the normal email flow to respond > to > > a diff, might as well go whole hog and remove them from my email > > consciousness entirely, and place them with the other various information > > streams in my RSS reader. > > > > Thanks, > > > > -Aaron > > > > On Mon, May 17, 2010 at 8:10 PM, wrote: > > > >> Branch: refs/archives/heads/branch-1-0-0 > >> Home: http://github.com/bioperl/bioperl-live > >> > >> Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 > >> > >> > http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 > >> Author: sac > >> Date: 2002-10-22 (Tue, 22 Oct 2002) > >> > >> Changed paths: > >> M Bio/SearchIO/Writer/HitTableWriter.pm > >> > >> Log Message: > >> ----------- > >> Added frame to the column map. 
> >> > >> svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 > >> > >> > >> _______________________________________________ > >> Bioperl-guts-l mailing list > >> Bioperl-guts-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jun.yin at ucd.ie Tue May 18 07:07:43 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Tue, 18 May 2010 12:07:43 +0100 Subject: [Bioperl-l] distance In-Reply-To: References: Message-ID: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie> Hi, Bryan, In your code: my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); First, you should specify the fieldname. The "fieldname" itself does not seem like a valid key. The default field name is "id". Second, the find_node method can only search for one specific term at one time. Third, the distance method can only work on two nodes. So try this: my @nodes_human = $tree->find_node(-id => 'Homo_sapiens'); my @nodes_murinae=$tree->find_node(-id=>'Murinae'); my $distance = $tree->distance(-nodes => \($nodes_human[0],$nodes_murinae[0])); #Providing you only have one match for "Homo_sapiens" and "Murinae". Cheers, Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bryan White Sent: Tuesday, May 18, 2010 10:49 AM To: bioperl-l at bioperl.org Subject: [Bioperl-l] distance Hello, I am trying to create a simple program to show me the distance between taxa on a given tree. However, I am having trouble getting the bioperl code to work. Here is the code that I am using: -------- #!
/usr/bin/perl use strict; use warnings; use Bio::Tree::Draw::Cladogram; use Bio::TreeIO; #use Bio::TreeFunctionsI; my $node1 = 'homo_sapiens'; my $node2 = 'murinae'; my $input = new Bio::TreeIO('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt'); my $tree = $input->next_tree; my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); my $distance = $tree->distance(-nodes => \@nodes); #print $distance; -------- And here is the error message I receive: ------------- EXCEPTION ------------- MSG: Must provide 2 nodes STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/ Bio/Tree/TreeFunctionsI.pm:811 STACK toplevel ./phylo.pl:19 ------------------------------------- It seems that the nodes are not being read into the @nodes variable. Any help in figuring this out would be appreciated. Thanks, Bryan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l
From cjfields at illinois.edu Tue May 18 08:47:10 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 07:47:10 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> Message-ID: <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> On May 18, 2010, at 5:50 AM, Dave Messina wrote: > > On May 18, 2010, at 12:07, Peter wrote: > >> Or just turn off the download feature in github. > > That might be the best solution, at least for now. > > The download page is somewhat unfriendly anyway ? the tag names are truncated, there's no way to sort, and the descriptions are, well, not so descriptive (they appear to be just the last commit message). > > Probably better to keep > > http://www.bioperl.org/wiki/Getting_BioPerl > > as our main distribution point for downloads. > > > Dave We can turn that off for now, though it is a nice feature.
If we need a replacement link for downloads we can use the repo.or.cz mirror link, for example: http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip chris From David.Messina at sbc.su.se Tue May 18 08:53:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 14:53:29 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: On May 18, 2010, at 14:47, Chris Fields wrote: > http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz > http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. I'll go ahead and update the nightly build links on http://www.bioperl.org/wiki/Getting_BioPerl to point to those, then, unless there are objections. 
Dave From cjfields at illinois.edu Tue May 18 09:56:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 08:56:45 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> On May 18, 2010, at 7:53 AM, Dave Messina wrote: > > On May 18, 2010, at 14:47, Chris Fields wrote: > >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip > > > Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. > > > I'll go ahead and update the nightly build links on > > http://www.bioperl.org/wiki/Getting_BioPerl > > to point to those, then, unless there are objections. > > > Dave This link also still works, even with the 'Downloads' tab off: http://github.com/bioperl/bioperl-live/archives/master Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. 'build' really never applied either, but oh well... 
chris From biopython at maubp.freeserve.co.uk Tue May 18 09:57:50 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 May 2010 14:57:50 +0100 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: On Tue, May 18, 2010 at 1:53 PM, Dave Messina wrote: > > > On May 18, 2010, at 14:47, Chris Fields wrote: > >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip > > > Oh right, I forgot about the mirror. Silly me. :) So probably > unnecessary to make our own nightly snapshots then. > Just like what you'd get from the big "Download Source" button on github? Equivalent to visiting this page: http://github.com/bioperl/bioperl-live/archives/master Peter From cjfields at illinois.edu Tue May 18 10:03:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 09:03:46 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> Message-ID: On May 18, 2010, at 8:56 AM, Chris Fields wrote: > On May 18, 2010, at 7:53 AM, Dave Messina wrote: > >> >> On May 18, 2010, at 14:47, Chris 
Fields wrote: >> >>> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >>> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip >> >> >> Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. >> >> >> I'll go ahead and update the nightly build links on >> >> http://www.bioperl.org/wiki/Getting_BioPerl >> >> to point to those, then, unless there are objections. >> >> >> Dave > > This link also still works, even with the 'Downloads' tab off: > > http://github.com/bioperl/bioperl-live/archives/master > > Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. > > 'build' really never applied either, but oh well... > > chris Oh, and on the topic of annotated tags for downloads: http://github.com/blog/651-annotated-downloads chris From David.Messina at sbc.su.se Tue May 18 10:23:34 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 16:23:34 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> Message-ID: <075CC735-0573-4E79-975F-23AD61C41C72@sbc.su.se> On May 18, 2010, at 16:03, Chris Fields wrote: > > This link also still works, even with the 'Downloads' tab off: > > http://github.com/bioperl/bioperl-live/archives/master Ah, great, thanks Chris and Peter. > Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. 
> > 'build' really never applied either, but oh well... Righto, done. 'Snapshots' it is. > Oh, and on the topic of annotated tags for downloads: > > http://github.com/blog/651-annotated-downloads Heh, how timely. :) Good, that will solve the description part of it nicely. Dave From jay at jays.net Tue May 18 10:32:47 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 18 May 2010 09:32:47 -0500 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: <20100518030511.59C314202D@smtp1.rs.github.com> References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: Hi Florent, Can you add a line to the /Changes please? New features are especially great to add to that file. :) If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful and more frequent. You also might want to set your git config so your email is valid in your commits. e.g.:

    $ git config user.name "Jay Hannah"
    $ git config user.email jay at jays.net

(these end up in ~/.gitconfig) Thanks!
Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah On May 17, 2010, at 10:05 PM, noreply at github.com wrote: > Branch: refs/heads/master > Home: http://github.com/bioperl/bioperl-live > > Commit: 87c530525da35a981e9f7b06134184f0adfae156 > http://github.com/bioperl/bioperl-live/commit/87c530525da35a981e9f7b06134184f0adfae156 > Author: Florent Angly > Date: 2010-05-17 (Mon, 17 May 2010) > > Changed paths: > M Bio/Assembly/IO.pm > M Bio/Assembly/IO/ace.pm > M t/Assembly/Assembly.t > > Log Message: > ----------- > Implemented the 454 Newbler ACE assembly variant > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l From florent.angly at gmail.com Tue May 18 11:11:40 2010 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 18 May 2010 08:11:40 -0700 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: <4BF2AE2C.209@gmail.com> Good idea Jay! I did as you suggested. Florent On 18/05/10 07:32, Jay Hannah wrote: > Can you add a line to the /Changes please? New features are especially great to add to that file.:) > > If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent. > > You also might want to set your git config so your email is valid in your commits. e.g.: > From bimber at wisc.edu Tue May 18 11:28:06 2010 From: bimber at wisc.edu (Ben Bimber) Date: Tue, 18 May 2010 10:28:06 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? Message-ID: this question is more of a general perl one than bioperl specific, so I hope it is appropriate for this list: I am writing code that has two steps. the first generates a large, complex hash describing mutations. 
it takes a fair amount of time to run this step. the second step uses this data to perform downstream calculations. for the purposes of writing/debugging this downstream code, it would save me a lot of time if i could run the first step once, then store this hash in something like the file system. this way I could quickly load it, when debugging the downstream code without waiting for the hash to be recreated. is there a 'best practice' way to do something like this? I could save a tab-delimited file, which is human readable, but does not represent the structure of the hash, so I would need code to re-parse it. I assume I could probably do something along the lines of dumping a JSON string, then read/decode it. this is easy, but not so human-readable. is there another option i'm not thinking of? what do others do in this sort of situation? thanks in advance. -Ben From cjfields at illinois.edu Tue May 18 11:31:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 10:31:14 -0500 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: On May 18, 2010, at 9:32 AM, Jay Hannah wrote: > Hi Florent, > > Can you add a line to the /Changes please? New features are especially great to add to that file. :) > > If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent. Agreed (or, +1, depending on your taste). Also, I would really like to break the habit of committing everything straight to trunk and promote using branches more. Branches are cheap. 
Something like:

    # on master
    git checkout -b 'topic/feature_foo'
    # switches over to branch 'topic/feature_foo'
    # hack hack hack
    # make commits
    # add tests
    # add to Changes
    # make more commits
    # push to remote branch
    # merge to master
    git checkout master
    git merge 'topic/feature_foo'
    # test test test, etc, push to origin

or similar. Of course, there would be more to it (handling merge conflicts, etc), just need to get a decent workflow document started up. Ah tuits, where are you? > You also might want to set your git config so your email is valid in your commits. e.g.: > > $ git config user.name "Jay Hannah" > $ git config user.email jay at jays.net > (these end up in ~/.gitconfig) > > Thanks! > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah I think these are only set there if you use --global, correct? Otherwise it's repo-specific and would be in .git/config. chris From s.denaxas at gmail.com Tue May 18 11:41:01 2010 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Tue, 18 May 2010 16:41:01 +0100 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: Hello, it all really depends on your definition of readable. YAML is readable but requires a parser; XML is readable but is bloated and requires code and a parser. You can directly dump the output from Data::Dumper and then eval() it back into a hash. I would think this is the cleanest way if you specifically want to dump a hash and re-generate it with no additional code. You can set the $Data::Dumper::Indent flag to control how readable the hash is. hope this helps, Spiros On Tue, May 18, 2010 at 4:28 PM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step.
the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation? > > thanks in advance. > > -Ben > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From adsj at novozymes.com Tue May 18 11:57:12 2010 From: adsj at novozymes.com (Adam Sjøgren) Date: Tue, 18 May 2010 17:57:12 +0200 Subject: [Bioperl-l] storing/retrieving a large hash on file system? References: Message-ID: <87zkzxmcdj.fsf@topper.koldfront.dk> On Tue, 18 May 2010 10:28:06 -0500, Ben wrote: > is there a 'best practice' way to do something like this? The only one I can think of is "Don't make up your own format unless you really, really have to". > I could save a tab-delimited file, which is human readable, but does > not represent the structure of the hash, so I would need code to > re-parse it. I assume I could probably do something along the lines of > dumping a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation?
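A minimal sketch of the round trip described in the quoted question, assuming the data is a nested hash reference. It writes the structure once as pretty-printed JSON (readable) and once with the core Storable module (binary), then reads both back. The build_mutation_hash() helper and the gene and sample names are hypothetical stand-ins for the slow first step, and JSON::PP is assumed to be installed (it ships with Perl from 5.14 on).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use JSON::PP;
use Storable qw(nstore retrieve);

# Hypothetical stand-in for the slow first step; the keys and values
# are invented for illustration only.
sub build_mutation_hash {
    return {
        geneA => { position => 101,  change => 'A>G', samples => [ 's1', 's2' ] },
        geneB => { position => 2057, change => 'C>T', samples => ['s3'] },
    };
}

my $dir       = tempdir( CLEANUP => 1 );
my $mutations = build_mutation_hash();

# Readable cache: pretty-printed JSON with sorted keys.
my $json      = JSON::PP->new->pretty->canonical;
my $json_file = "$dir/mutations.json";
open my $out, '>', $json_file or die "Cannot write $json_file: $!";
print {$out} $json->encode($mutations);
close $out or die "Cannot close $json_file: $!";

# Slurp the file back and decode it into a fresh structure.
open my $in, '<', $json_file or die "Cannot read $json_file: $!";
my $from_json = $json->decode( do { local $/; <$in> } );
close $in;

# Binary cache: nstore() writes in network byte order, so the file
# can be read back on a machine with a different architecture.
my $cache = "$dir/mutations.storable";
nstore( $mutations, $cache );
my $from_storable = retrieve($cache);

print "JSON:     geneA at $from_json->{geneA}{position}\n";
print "Storable: geneB samples @{ $from_storable->{geneB}{samples} }\n";
```

In a real script the cache file would live at a fixed path rather than in a temporary directory, and the slow step would run only when that file does not yet exist.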
I would use YAML or JSON if I had to look at it "by hand" or if it had to be somehow portable. I would prefer those over CSV, which hasn't necessarily got well-defined handling of special chars, whitespace etc. If speed is more important, I think the Storable module is quite a bit quicker, but the format is "binary". Best regards, Adam -- Adam Sjøgren adsj at novozymes.com From sdavis2 at mail.nih.gov Tue May 18 12:09:38 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 18 May 2010 12:09:38 -0400 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step. the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation? > > thanks in advance. > > There are a number of solutions on CPAN, probably.
This is one maybe off the beaten path, but it is getting a lot of press in the NoSQL database realm: http://1978th.net/tokyocabinet/ Sean From David.Messina at sbc.su.se Tue May 18 12:19:18 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 18:19:18 +0200 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: Hi Ben, Storable should do the trick. http://search.cpan.org/~ams/Storable-2.21/ It allows you to save arbitrary perl data structures to disk and load them back in without needing to dump into another format and then parse it later. Dave From cjfields at illinois.edu Tue May 18 12:22:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 11:22:09 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: On May 18, 2010, at 10:28 AM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step. the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? 
what do > others do in this sort of situation? > > thanks in advance. > > -Ben Would a simple DB_File tied hash work? chris From cjfields at illinois.edu Tue May 18 12:25:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 11:25:11 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: <87zkzxmcdj.fsf@topper.koldfront.dk> References: <87zkzxmcdj.fsf@topper.koldfront.dk> Message-ID: On May 18, 2010, at 10:57 AM, Adam Sjøgren wrote: > On Tue, 18 May 2010 10:28:06 -0500, Ben wrote: > >> is there a 'best practice' way to do something like this? > > The only one I can think of is "Don't make up your own format unless you > really, really have to". > >> I could save a tab-delimited file, which is human readable, but does >> not represent the structure of the hash, so I would need code to >> re-parse it. I assume I could probably do something along the lines of >> dumping a JSON string, then read/decode it. this is easy, but not so >> human-readable. is there another option i'm not thinking of? what do >> others do in this sort of situation? > > I would use YAML or JSON if I had to look at it "by hand" or if it had > to be somehow portable. I would prefer those over CSV, which hasn't > necessarily got well-defined handling of special chars, whitespace etc. > > If speed is more important, I think the Storable module is quite a bit > quicker, but the format is "binary". > > > Best regards, > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Yes, that in combination with an AnyDBM_File tied hash would work (essentially what Bio::SeqFeature::Collection is under the hood). chris From sdavis2 at mail.nih.gov Tue May 18 12:39:44 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 18 May 2010 12:39:44 -0400 Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To: References: Message-ID: On Tue, May 18, 2010 at 12:09 PM, Sean Davis wrote: > > > On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote: > >> this question is more of a general perl one than bioperl specific, so >> I hope it is appropriate for this list: >> >> I am writing code that has two steps. the first generates a large, >> complex hash describing mutations. it takes a fair amount of time to >> run this step. the second step uses this data to perform downstream >> calculations. for the purposes of writing/debugging this downstream >> code, it would save me a lot of time if i could run the first step >> once, then store this hash in something like the file system. this >> way I could quickly load it, when debugging the downstream code >> without waiting for the hash to be recreated. >> >> is there a 'best practice' way to do something like this? I could >> save a tab-delimited file, which is human readable, but does not >> represent the structure of the hash, so I would need code to re-parse >> it. I assume I could probably do something along the lines of dumping >> a JSON string, then read/decode it. this is easy, but not so >> human-readable. is there another option i'm not thinking of? what do >> others do in this sort of situation? >> >> thanks in advance. >> >> > There are a number of solutions on CPAN, probably. This is one maybe off > the beaten path, but it is getting a lot of press in the NoSQL database > realm: > > http://1978th.net/tokyocabinet/ > > Just to be clear, I am assuming that the problem at hand is storing a key/value pair and then retrieving it later. If what you are talking about is a multi-level hash data structure, then Data::Dumper might be the easiest way to go. Sorry for the confusion.... Sean From bimber at wisc.edu Tue May 18 12:47:33 2010 From: bimber at wisc.edu (Ben Bimber) Date: Tue, 18 May 2010 11:47:33 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? 
In-Reply-To: References: Message-ID: Thanks for all the suggestions. Storable seems like the simplest route. This will save me hours of staring at my computer. -Ben On Tue, May 18, 2010 at 11:39 AM, Sean Davis wrote: > > > On Tue, May 18, 2010 at 12:09 PM, Sean Davis wrote: >> >> >> On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote: >>> >>> this question is more of a general perl one than bioperl specific, so >>> I hope it is appropriate for this list: >>> >>> I am writing code that has two steps. the first generates a large, >>> complex hash describing mutations. it takes a fair amount of time to >>> run this step. the second step uses this data to perform downstream >>> calculations. for the purposes of writing/debugging this downstream >>> code, it would save me a lot of time if i could run the first step >>> once, then store this hash in something like the file system. this >>> way I could quickly load it, when debugging the downstream code >>> without waiting for the hash to be recreated. >>> >>> is there a 'best practice' way to do something like this? I could >>> save a tab-delimited file, which is human readable, but does not >>> represent the structure of the hash, so I would need code to re-parse >>> it. I assume I could probably do something along the lines of dumping >>> a JSON string, then read/decode it. this is easy, but not so >>> human-readable. is there another option i'm not thinking of? what do >>> others do in this sort of situation? >>> >>> thanks in advance. >>> >> >> There are a number of solutions on CPAN, probably. This is one maybe off >> the beaten path, but it is getting a lot of press in the NoSQL database >> realm: >> >> http://1978th.net/tokyocabinet/ >> > > Just to be clear, I am assuming that the problem at hand is storing a > key/value pair and then retrieving it later. If what you are talking about > is a multi-level hash data structure, then Data::Dumper might be the easiest > way to go.
> > Sorry for the confusion.... > > Sean > > > From bosborne11 at verizon.net Tue May 18 12:00:06 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 18 May 2010 12:00:06 -0400 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net> Ben, I've used Storable to do things like this, for example:

    use Getopt::Long;
    use Storable;
    use Bio::DB::Fasta;

    # usage() and make_my_id() are called below but not shown in this excerpt

    my %species = (
        "Sc" => 4932,     # Saccharomyces cerevisiae
        "Ec" => 83333,    # Escherichia coli K12
        "Hs" => 9606      # H. sapiens
    );

    my ($help,$id,$name);

    GetOptions( "s=s" => \$name,
                "i=i" => \$id,
                "h"   => \$help );

    usage() if ($help || !$id || !$name);

    my $storedHash = $name . ".dump";

    # create index for a directory of fasta files
    my $db = Bio::DB::Fasta->new($name, -makeid => \&make_my_id);

    # extract species-specific data from gene2accession
    unless (-e $storedHash) {
        my $ref;
        # extract species-specific information from gene2accession
        open MYIN,"gene2accession" or die "No gene2accession file\n";
        while (<MYIN>) {
            my @arr = split "\t",$_;
            if ($arr[0] == $species{$name} && $arr[9] =~ /\d+/ && $arr[10] =~ /\d+/) {
                ($ref->{$arr[1]}->{"start"},
                 $ref->{$arr[1]}->{"end"},
                 $ref->{$arr[1]}->{"strand"},
                 $ref->{$arr[1]}->{"id"}) = ($arr[9], $arr[10], $arr[11], $arr[7]);
            }
        }
        close MYIN;
        # save species-specific information using Storable
        store $ref, $storedHash;
    }

    # retrieve the species-specific data from a stored hash
    my $ref = retrieve($storedHash);

Take away all the parsing details and you can see that it's simple, and that Storable exports store() and retrieve(). Make up a file name, "store" the hash reference. Brian O. On May 18, 2010, at 11:28 AM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step.
the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation? > > thanks in advance. > > -Ben > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Tue May 18 12:06:54 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 18 May 2010 12:06:54 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? Message-ID: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> bioperl-l, Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. We want these to point to github, yes? I'll fix it if the answer is 'yes'. Brian O. From cjfields at illinois.edu Tue May 18 14:04:55 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 13:04:55 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> Message-ID: <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> Yes. 
chris On May 18, 2010, at 11:06 AM, Brian Osborne wrote: > bioperl-l, > > Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. > > We want these to point to github, yes? I'll fix it if the answer is 'yes'. > > Brian O. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Tue May 18 15:39:48 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 12:39:48 -0700 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: <4BF2ED04.2050106@cornell.edu> Chris Fields wrote: > Agreed (or, +1, depending on your taste). Also, I would really like to break the habit of committing everything straight to trunk and promote using branches more. Branches are cheap. I did some work on our git workflow at http://www.bioperl.org/wiki/Using_Git#Developing_BioPerl, but it still needs some more work. So, there's the start of the workflow document I think. Rob From rmb32 at cornell.edu Tue May 18 15:42:44 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 12:42:44 -0700 Subject: [Bioperl-l] storing/retrieving a large hash on file system? 
In-Reply-To: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net> References: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net> Message-ID: <4BF2EDB4.4060907@cornell.edu> Based on your description, you want to use either: Storable - if you want to load the whole hash into memory or AnyDBM_File - if you want to be able to look things up from the hash without loading the whole thing in memory Rob From David.Messina at sbc.su.se Tue May 18 16:16:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 22:16:14 +0200 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: <4BF2ED04.2050106@cornell.edu> References: <20100518030511.59C314202D@smtp1.rs.github.com> <4BF2ED04.2050106@cornell.edu> Message-ID: <2D6396F7-E478-4544-B26A-F8A5799F2039@sbc.su.se> Nice, Rob! > I did some work on our git workflow at http://www.bioperl.org/wiki/Using_Git#Developing_BioPerl, but it still needs some more work. > > So, there's the start of the workflow document I think. From bpcwhite at gmail.com Tue May 18 17:34:06 2010 From: bpcwhite at gmail.com (Bryan White) Date: Tue, 18 May 2010 14:34:06 -0700 (PDT) Subject: [Bioperl-l] distance In-Reply-To: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie> References: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie> Message-ID: <1a2c786f-07e6-4499-8dc9-19a8d4169653@u3g2000prl.googlegroups.com> Thanks guys, I got it working! Bryan On May 18, 4:07 am, Jun Yin wrote: > Hi, Bryan, > > In your code: > my @nodes = $tree->find_node(-fieldname => > 'Homo_sapiens','Murinae'); > > First, you should specify the field name. "fieldname" itself does not seem > to be a valid key. The default field name is "id". > Second, the find_node method can only search for one specific term at a > time. > Third, the distance method can only work on two nodes.
>
> So try this:
>
> my @nodes_human = $tree->find_node(-id => 'Homo_sapiens');
> my @nodes_murinae = $tree->find_node(-id => 'Murinae');
>
> # Providing you only have one match for "Homo_sapiens" and "Murinae".
> my $distance = $tree->distance(-nodes => [ $nodes_human[0], $nodes_murinae[0] ]);
>
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
>
> -----Original Message-----
> From: bioperl-l-boun... at lists.open-bio.org
>
> [mailto:bioperl-l-boun... at lists.open-bio.org] On Behalf Of Bryan White
> Sent: Tuesday, May 18, 2010 10:49 AM
> To: bioper... at bioperl.org
> Subject: [Bioperl-l] distance
>
> Hello,
>
> I am trying to create a simple program to show me the distance between
> taxa on a given tree. However, I am having trouble getting the bioperl
> code to work. Here is the code that I am using:
> --------
> #! /usr/bin/perl
> use strict;
> use warnings;
> use Bio::Tree::Draw::Cladogram;
> use Bio::TreeIO;
> #use Bio::TreeFunctionsI;
>
> my $node1 = 'homo_sapiens';
> my $node2 = 'murinae';
> my $input = new Bio::TreeIO('-format' => 'newick',
>                             '-file' => 'tree_mammalia_newick.txt');
>
> my $tree = $input->next_tree;
>
> my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');
>
> my $distance = $tree->distance(-nodes => \@nodes);
>
> #print $distance;
>
> --------
>
> And here is the error message I receive:
>
> ------------- EXCEPTION -------------
> MSG: Must provide 2 nodes
> STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/
> Bio/Tree/TreeFunctionsI.pm:811
> STACK toplevel ./phylo.pl:19
> -------------------------------------
>
> It seems that the nodes are not being read into the @nodes variable.
> Any help in figuring this out would be appreciated.
>
> Thanks,
> Bryan
> _______________________________________________
> Bioperl-l mailing list
> Bioper...
at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l > > __________ Information from ESET Smart Security, version of virus signature > database 5099 (20100509) __________ > > The message was checked by ESET Smart Security. > > http://www.eset.com > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Wed May 19 00:17:24 2010 From: hartzell at alerce.com (George Hartzell) Date: Tue, 18 May 2010 21:17:24 -0700 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: <19443.26196.893455.52821@gargle.gargle.HOWL> Ben Bimber writes: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step. the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of?
what do > others do in this sort of situation? Someone early on in the thread said not to invent another format, and I concur with that wholeheartedly. Your choice of words, "large complex hash", makes me worry that you have something more than a large single level hash with sensible keys. Hashes of references to hashes to references to lists to etc... give me hives. If you'd like to add a nice general-purpose tool to your kit, think about putting the data into a simple SQLite database. Put it into an SQLite db and talk to it via DBI and you get some really cool tricks: - you can store complex stuff, - get back just the part you need, a column, several columns, or the result of a join among multiple tables, - add indexes to make it Go Fast. and in the cool tricks category - you can use SQLite's backup interface to build the database in memory (nice and fast) then quickly stream it out to a disk based file for persistence. - same trick in reverse, if you know you're going to do a reasonably large number of complex queries you can stream a database into memory and then run your queries quickly. - rtree indexes are cool. Going forward you can scale things up to big databases (Pg, Oracle), you can provide safe multiuser access, transactions, etc.... (NFS notwithstanding), etc.... g. From avilella at gmail.com Wed May 19 04:36:25 2010 From: avilella at gmail.com (Albert Vilella) Date: Wed, 19 May 2010 09:36:25 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Message-ID: Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies.
Should I be using some other tool existing not in bioperl? Cheers, Albert. From jun.yin at ucd.ie Wed May 19 06:40:51 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Wed, 19 May 2010 11:40:51 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: <008101caf73f$c04973c0$40dc5b40$%yin@ucd.ie> Hi, Albert, Check this page for the BioPerl wrapper on next-gen sequencing results http://bioperl.org/wiki/HOWTO:Short-read_assemblies_with_BWA And, I don't think Bio::SimpleAlign works on assembly files. It is targeted at global alignment, e.g. clustalw output file. Cheers, Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Albert Vilella Sent: Wednesday, May 19, 2010 9:36 AM To: bioperl-l at bioperl.org Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. Should I be using some other tool existing not in bioperl? Cheers, Albert. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5099 (20100509) __________ The message was checked by ESET Smart Security. http://www.eset.com
From maj at fortinbras.us Wed May 19 09:34:01 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 09:34:01 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > Hi, > > I would like to know what would be the best way to generate a SAM/BAM file > with cDNA alignments against the human reference from a bunch of > Bio::SimpleAlign > cDNA multiple sequence alignment objects. > > Considering I've got a way to map the cDNAs to chromosome coordinates, > how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 > human > coordinates? > > As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads > assemblies. > Should I be using some other tool existing not in bioperl? > > Cheers, > > Albert. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed May 19 09:59:03 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 09:59:03 -0400 Subject: [Bioperl-l] out of memory issue In-Reply-To: References: Message-ID: Hi Shalabh and all, Sorry to comment on an old thread, but Dan Kortschak just pointed me to Tie::File. This may be the right solution to this issue. It turns out that DB_File will read in the entire file to memory anyway, while Tie::File (by MJD of course) works on pieces as it should.
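A minimal sketch of the Tie::File approach mentioned above, assuming the ids sit one per line in a plain text file. The temporary file and the variable names are illustrative only and are not taken from Shalabh's script. Note that Tie::File exposes the file as an array indexed by line number, so on its own it does not give hash-style lookup by id.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Stand-in for a large id file: one id per line (illustrative data only).
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "id_$_\n" for 1 .. 5;
close $fh or die "Cannot close $filename: $!";

# Tie the file to an array. Records are fetched from disk on demand
# rather than being loaded into memory all at once.
tie my @ids, 'Tie::File', $filename
    or die "Cannot tie $filename: $!";

my $count = scalar @ids;   # number of records (lines) in the file
my $third = $ids[2];       # third record; the trailing newline is removed

print "records: $count, third: $third\n";

untie @ids;
```

Whether this helps depends on the access pattern: it suits walking through or indexing into a big list of ids, while keyed lookup would still need some on-disk index.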
See Tie::File in CPAN and also this informative post: http://perl.plover.com/TieFile/why-not-DB_File cheers all- (someday, maybe next month, I'll return in force) MAJ ----- Original Message ----- From: "shalabh sharma" To: "bioperl-l" Sent: Wednesday, April 28, 2010 10:13 AM Subject: [Bioperl-l] out of memory issue > Hi All, > I am trying to make a hash of 38 Million ids but every time i get the > following message : > > perl(191) malloc: *** mmap(size=16777216) failed (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > Out of memory! > > I am working on MacOX 10.5.8 with 4GB of memory. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From avilella at gmail.com Wed May 19 11:00:27 2010 From: avilella at gmail.com (Albert Vilella) Date: Wed, 19 May 2010 16:00:27 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Awesome, thanks. I'll give it a try :-) On Wed, May 19, 2010 at 2:34 PM, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use > of Bio::Assembly::IO::sam (I think). I know there is only read capability > for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing > writes (some assembly (so to speak) required...)-- cheers MAJ > ----- Original Message ----- From: "Albert Vilella" > To: > Sent: Wednesday, May 19, 2010 4:36 AM > Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > > >> Hi, >> >> I would like to know what would be the best way to generate a SAM/BAM file >> with cDNA alignments against the human reference from a bunch of >> Bio::SimpleAlign >> cDNA multiple sequence alignment objects. 
>> >> Considering I've got a way to map the cDNAs to chromosome coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >> human >> coordinates? >> >> As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads >> assemblies. >> Should I be using some other tool existing not in bioperl? >> >> Cheers, >> >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > From lincoln.stein at gmail.com Wed May 19 12:40:31 2010 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Wed, 19 May 2010 12:40:31 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the > use of Bio::Assembly::IO::sam (I think). I know there is only read > capability for B:A:I:sam, but Samtools may give you the appropriate wrapper > for doing writes (some assembly (so to speak) required...)-- cheers MAJ > ----- Original Message ----- From: "Albert Vilella" > > To: > Sent: Wednesday, May 19, 2010 4:36 AM > > Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > > > Hi, >> >> I would like to know what would be the best way to generate a SAM/BAM file >> with cDNA alignments against the human reference from a bunch of >> Bio::SimpleAlign >> cDNA multiple sequence alignment objects. >> >> Considering I've got a way to map the cDNAs to chromosome coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >> human >> coordinates? >> >> As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads >> assemblies. >> Should I be using some other tool existing not in bioperl? 
>> >> Cheers, >> >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From john.marshall at sanger.ac.uk Wed May 19 12:22:19 2010 From: john.marshall at sanger.ac.uk (John Marshall) Date: Wed, 19 May 2010 17:22:19 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: On 19 May 2010, at 14:34, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates > the use of Bio::Assembly::IO::sam (I think). I've only briefly skimmed the B:T:R:Samtools documentation, but it would appear that this mostly encapsulates running the various samtools subcommands. These provide various manipulations on SAM and BAM files, but don't give you anything in terms of converting from not- SAM/BAM to SAM/BAM. > ----- Original Message ----- From: "Albert Vilella" > >> Considering I've got a way to map the cDNAs to chromosome >> coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against >> ~23.000 human >> coordinates? Perhaps I misunderstand, but if you already have a bunch of snippets of sequence and their mapped coordinates, then the easy way to generate a SAM file containing them is just to print it out by hand. A SAM file is just a tab-separated text file. For each sequence in your Bio::SimpleAlign objects, print out a line containing appropriate values for each of the 11 main SAM fields. 
(If the snippets are effectively unpaired, then MRNM,MPOS,ISIZE can just be *,0,0, and the only FLAG values you'll be choosing between are 0, 4, 16, and 20.) You should also start the file with an @SQ header for each of the chromosomes you've mapped against. (I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf -- it's a little vague, but should be more than enough to explain how to e.g. print out a basic SAM file with only the main fields.) Once you've printed out a simple SAM file, you can use B:T:R:Samtools or samtools directly or other tools to convert it to the binary BAM format and/or otherwise work with it. Cheers, John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From maj at fortinbras.us Wed May 19 13:26:16 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:26:16 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: <42F365BE46A545CE9DF897BA0B18B8EF@NewLife> CORRECTION: B:T:R:Samtools wraps samtools directly, as John said. Sorry, it's been a while... MAJ ----- Original Message ----- From: Lincoln Stein To: Mark A. Jensen Cc: Albert Vilella ; bioperl-l at bioperl.org Sent: Wednesday, May 19, 2010 12:40 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). 
I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. Should I be using some other tool existing not in bioperl? Cheers, Albert. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From maj at fortinbras.us Wed May 19 13:30:25 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:30:25 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Yes that's right John; B:T:R:Samtools is used within B:A:I:sam to do the write out with the samtools command line programs. Interested parties might look at Bio::Assembly::IO::sam to see how Lincoln's Bio::DB::Sam (which uses the libbam library directly via XS, also not BioPerl proper but we love it anyway) might be employed.
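John's suggestion in this thread of printing the SAM file by hand might look roughly like the following. It is only a sketch: sam_line and sq_header are hypothetical helpers, the record values are made up, and it assumes unpaired, mapped cDNAs with no base qualities (QUAL written as '*', MAPQ as 255 for "not available"). The field details should be checked against the SAM specification before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: build one @SQ header line for a reference sequence.
sub sq_header {
    my ($name, $length) = @_;
    return "\@SQ\tSN:$name\tLN:$length";
}

# Hypothetical helper: build one unpaired SAM record with the 11 main fields.
sub sam_line {
    my (%r) = @_;
    my $flag = $r{reverse} ? 16 : 0;   # 16 = mapped to the reverse strand
    return join("\t",
        $r{qname},    # QNAME: query (cDNA) name
        $flag,        # FLAG
        $r{rname},    # RNAME: reference (chromosome) name
        $r{pos},      # POS: 1-based leftmost mapped position
        255,          # MAPQ: 255 = mapping quality not available
        $r{cigar},    # CIGAR string
        '*', 0, 0,    # MRNM, MPOS, ISIZE for unpaired records
        $r{seq},      # SEQ
        '*',          # QUAL: not stored
    );
}

# Made-up example data: one reference and one mapped cDNA snippet.
print sq_header('chr1', 1000), "\n";
print sam_line(
    qname => 'cdna_1',
    rname => 'chr1',
    pos   => 101,
    cigar => '8M',
    seq   => 'ACGTACGT',
), "\n";
```

The resulting text file could then be handed to samtools, or to B:T:R:Samtools, for conversion to BAM as described in the thread.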
----- Original Message ----- From: "John Marshall" To: Cc: "Albert Vilella" Sent: Wednesday, May 19, 2010 12:22 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM > On 19 May 2010, at 14:34, Mark A. Jensen wrote: >> Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use >> of Bio::Assembly::IO::sam (I think). > > I've only briefly skimmed the B:T:R:Samtools documentation, but it would > appear that this mostly encapsulates running the various samtools > subcommands. These provide various manipulations on SAM and BAM files, but > don't give you anything in terms of converting from not- SAM/BAM to SAM/BAM. > >> ----- Original Message ----- From: "Albert Vilella" > > >>> Considering I've got a way to map the cDNAs to chromosome coordinates, >>> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >>> human >>> coordinates? > > Perhaps I misunderstand, but if you already have a bunch of snippets of > sequence and their mapped coordinates, then the easy way to generate a SAM > file containing them is just to print it out by hand. > > A SAM file is just a tab-separated text file. For each sequence in your > Bio::SimpleAlign objects, print out a line containing appropriate values for > each of the 11 main SAM fields. (If the snippets are effectively unpaired, > then MRNM,MPOS,ISIZE can just be *,0,0, and the only FLAG values you'll be > choosing between are 0, 4, 16, and 20.) > > You should also start the file with an @SQ header for each of the chromosomes > you've mapped against. > > (I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf -- it's a > little vague, but should be more than enough to explain how to e.g. print out > a basic SAM file with only the main fields.) > > Once you've printed out a simple SAM file, you can use B:T:R:Samtools or > samtools directly or other tools to convert it to the binary BAM format > and/or otherwise work with it. 
> > Cheers, > > John > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a > charity registered in England with number 1021457 and a company registered in > England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE. _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed May 19 13:21:56 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:21:56 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: B:T:R:Samtools wraps Bio::Samtools ----- Original Message ----- From: Lincoln Stein To: Mark A. Jensen Cc: Albert Vilella ; bioperl-l at bioperl.org Sent: Wednesday, May 19, 2010 12:40 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. 
Should I be using some other tool existing not in bioperl? Cheers, Albert. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Thu May 20 11:37:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 10:37:16 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> Message-ID: <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Yes, if you have time. I have started along that path already, but I'm sure there are lingering spots where links point to the wrong place, or subversion/svn is mentioned. chris On May 20, 2010, at 10:34 AM, Brian Osborne wrote: > Chris, > > Done, easy. Should I remove all references to SVN from the Wiki? > > Brian O. > > On May 18, 2010, at 2:04 PM, Chris Fields wrote: > >> Yes. >> >> chris >> >> On May 18, 2010, at 11:06 AM, Brian Osborne wrote: >> >>> bioperl-l, >>> >>> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >>> >>> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >>> >>> Brian O. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu May 20 12:05:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 11:05:56 -0500 Subject: [Bioperl-l] Regarding git commits... Message-ID: All, Please make sure to update your local git repos prior to doing commits and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. chris From florent.angly at gmail.com Thu May 20 12:22:50 2010 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 20 May 2010 09:22:50 -0700 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: References: Message-ID: <4BF561DA.1070700@gmail.com> On 20/05/10 09:05, Chris Fields wrote: > All, > > Please make sure to update your local git repos prior to doing commits That's done with "git pull", as mentioned on the wiki (http://www.bioperl.org/wiki/Using_Git), right? > and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bosborne11 at verizon.net Thu May 20 11:34:39 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 20 May 2010 11:34:39 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> Message-ID: <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> Chris, Done, easy. Should I remove all references to SVN from the Wiki? Brian O. On May 18, 2010, at 2:04 PM, Chris Fields wrote: > Yes. > > chris > > On May 18, 2010, at 11:06 AM, Brian Osborne wrote: > >> bioperl-l, >> >> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >> >> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >> >> Brian O. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu May 20 12:58:22 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 20 May 2010 09:58:22 -0700 Subject: [Bioperl-l] Regarding git commits... 
In-Reply-To: <4BF561DA.1070700@gmail.com> References: <4BF561DA.1070700@gmail.com> Message-ID: <4BF56A2E.8060309@bioperl.org> I think you want $ git pull upstream master http://help.github.com/forking/ Florent Angly wrote, On 5/20/10 9:22 AM: > On 20/05/10 09:05, Chris Fields wrote: >> All, >> >> Please make sure to update your local git repos prior to doing commits > That's done with "git pull", as mentioned on the wiki > (http://www.bioperl.org/wiki/Using_Git), right? > >> and pushing to master, and merge commits in properly if they don't >> match. Please please please don't save over files if they don't >> merge correctly. I just found out I had a prior commit that fixed >> the test number and removed old files that was completely clobbered, >> so I'm having to hand-merge those changes back in now. If it were >> anything more involved I would revert that prior commit completely. >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu May 20 13:35:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 12:35:09 -0500 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: <4BF56A2E.8060309@bioperl.org> References: <4BF561DA.1070700@gmail.com> <4BF56A2E.8060309@bioperl.org> Message-ID: <86401472-ECAB-4C21-8BD1-61AB37003F64@illinois.edu> Yes. The general syntax is: git pull <remote> <branch> If you have a read-write checkout directly from bioperl/bioperl-live.git, 'origin' should be set to that, and if you are on a specific branch a simple 'git pull' will work (it implies 'git pull origin <branch>'). All collaborators can do this.
In the case of a forked repo (which anyone can do), it's a little trickier as it's essentially a branch from the repository at a specific point; it isn't automatically synced. You can see that here: http://github.com/bioperl/bioperl-live/network In order to sync with the original repo, you need to specify exactly which remote to pull from, likely not 'origin' (which is your forked repo), but 'upstream' or whatever you set the original bioperl read-only repo to via: git remote add upstream git://github.com/bioperl/bioperl-live.git Then, to sync, do: git pull upstream master git push # goes to your forked repo chris PS - Note on the graph linked to I just synced my branch using the above. On May 20, 2010, at 11:58 AM, Jason Stajich wrote: > I think you want > $ git pull upstream master > > http://help.github.com/forking/ > > Florent Angly wrote, On 5/20/10 9:22 AM: >> On 20/05/10 09:05, Chris Fields wrote: >>> All, >>> >>> Please make sure to update your local git repos prior to doing commits >> That's done with "git pull", as mentioned on the wiki (http://www.bioperl.org/wiki/Using_Git), right? >> >>> and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. 
>>>
>>> chris

From jay at jays.net Thu May 20 14:06:13 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 20 May 2010 13:06:13 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To:
References:
Message-ID:

On May 20, 2010, at 11:05 AM, Chris Fields wrote:
> Please make sure to update your local git repos prior to doing commits and pushing to master

I thought git refused to push if your local was out of date? (I thought this was one of the general selling points of git?) It seems to be doing that to me, below.

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah


jhannah at jaysnet-MacBook:~/src/sandbox$ git push
To git at github.com:jhannah/sandbox.git
 ! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to 'git at github.com:jhannah/sandbox.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again. See the 'Note about
fast-forwards' section of 'git push --help' for details.

From cjfields at illinois.edu Thu May 20 14:43:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 13:43:12 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To:
References:
Message-ID: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu>

It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. That's something that can happen in any VCS.

chris

From jay at jays.net Thu May 20 15:09:00 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 20 May 2010 14:09:00 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu>
References: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu>
Message-ID: <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>

On May 20, 2010, at 1:43 PM, Chris Fields wrote:
> It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. That's something that can happen in any VCS.

So... you're saying don't commit if you don't have any idea what you're committing? :)

   git pull
   git diff
   git status
   if local is clean then
      -edit-
      git diff      if it looks good then git commit
      git status    if it looks good then git push

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah
enjoys preaching to the choir ;)

From cjfields at illinois.edu Thu May 20 15:24:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 14:24:17 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To: <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>
References: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>
Message-ID: <95305268-0D84-478C-A380-68E81742F18F@illinois.edu>

On May 20, 2010, at 2:09 PM, Jay Hannah wrote:
> On May 20, 2010, at 1:43 PM, Chris Fields wrote:
>> It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. That's something that can happen in any VCS.
>
> So... you're saying don't commit if you don't have any idea what you're committing? :)
>
>    git pull
>    git diff
>    git status
>    if local is clean then
>       -edit-
>       git diff      if it looks good then git commit
>       git status    if it looks good then git push
>
> Jay Hannah
> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah
> enjoys preaching to the choir ;)

Maybe the point is, if someone is having a problem with git either pulling from or pushing to the remote repo, it's very likely b/c of a merge conflict (git is trying to tell you something). There are lots of ways to resolve those (most easily by hand if the change is small). But saving over the top of someone else's commit in a re-cloned repo is definitely not one of them.

Possibly a section of 'Using git' that needs some work?

chris

From charles.tilford at bms.com Thu May 20 16:27:27 2010
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 20 May 2010 16:27:27 -0400
Subject: [Bioperl-l] Bio::Species irritated with "unclassified sequences"
Message-ID: <4BF59B2F.9000300@bms.com>

Bio::Species::classification() is irritated with me when I provide it with a @class_array that is composed of one node, particularly:

    $obj->classification("unclassified sequences")

AFAICT this is a valid, single node taxa "tree":

http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=12908

Subroutine classification is expecting at least two class members, the problem with the above call crops up as:

Use of uninitialized value $vals[1] in quotemeta at /stf/biocgi/tilfordc/patch_lib/Bio/Species.pm line 179
( $Id: Species.pm 16700 2010-01-15 19:50:11Z dave_messina $)

... and the relevant code is:

    sub classification {
        my ($self, @vals) = @_;

        if (@vals) {
            if (ref($vals[0]) eq 'ARRAY') {
                @vals = @{$vals[0]};
            }

            # make sure the lineage contains us as first or second element
            # (lineage may have subspecies, species, genus ...)
            my $name = $self->node_name;
            my ($genus, $species) = (quotemeta($vals[1]), quotemeta($vals[0]));

That is, it's expecting at least (species, genus) in the array.

Am I misusing classification(), or Bio::Species in general? I know it's named "Species", but I've been using it as a generic tree object for arbitrary taxonomy nodes, not just species and subspecies. This block a little lower down:

            unless ($self->rank) {
                # and that we are rank species
                $self->rank('species');
            }

... implies that the module can be used for taxa ranks other than species. However, doing so would not prevent the module being aggravated over a null $vals[1].

The use case here is building Bio::Seq::RichSeq objects pulled from a (very large) sequence database, and then dumped / displayed with SeqIO. Most are well behaved, but there's a non-trivial number of 'artificial' constructs that don't root to an organism.
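The warning described above can be reproduced and avoided in plain Perl, without BioPerl installed. The following is only a minimal sketch under stated assumptions: `lineage_patterns` is a hypothetical helper written for illustration, it is not part of Bio::Species, and it is not a proposed patch. It shows that the "uninitialized value" warning comes from calling quotemeta() on a missing second element, and that a `defined` check is enough to tolerate a one-node lineage.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::Species): build the genus and
# species patterns from a lineage list without touching missing elements.
sub lineage_patterns {
    my @vals = @_;

    # Accept either a flat list or a single array reference,
    # mirroring the argument handling in the excerpt above.
    @vals = @{ $vals[0] } if @vals && ref($vals[0]) eq 'ARRAY';

    my $species = defined $vals[0] ? quotemeta($vals[0]) : undef;

    # A one-node lineage has no second element. Leaving genus undefined
    # avoids passing undef to quotemeta(), which is what warns.
    my $genus = defined $vals[1] ? quotemeta($vals[1]) : undef;

    return ($genus, $species);
}

my ($genus, $species) = lineage_patterns('unclassified sequences');
print defined $genus ? "genus pattern: $genus\n" : "no genus element\n";
print "species pattern: $species\n";
```

Whether an undefined genus is acceptable to the rest of classification() is a separate question that this sketch does not answer.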
-CAT

From dimitark at bii.a-star.edu.sg Thu May 20 22:18:21 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Fri, 21 May 2010 10:18:21 +0800
Subject: [Bioperl-l] a problem with HspI module?
Message-ID: <4BF5ED6D.6030506@bii.a-star.edu.sg>

Hello guys,
I think I found a problem with 'Bio::Search::HSP::HSPI'. Consider the following HSP:
-------------
 Score = 48.9 bits (115), Expect = 8e-04, Method: Compositional matrix adjust.
 Identities = 27/77 (35%), Positives = 40/77 (51%), Gaps = 14/77 (18%)
 Frame = +1

Query  371      PSGMLLA-----SCSDDMTLKIWSMKQEVCIHDLQAHNKEIYTIKWSPTGPATSNPNSNI  425
                P   LLA     S S D T+++W ++Q VC H L  H + +Y++ +SP G
Sbjct  6955270  PGLQLLAFSHPPSASFDSTVRLWDVEQGVCTHTLMKHQEPVYSVAFSPDGK---------  6955422

Query  426      MLASASFDSTVRLWDIE  442
                 LAS SFD  V +W+ +
Sbjct  6955423  YLASGSFDKYVHIWNTQ  6955473
---------------

The method 'frac_identical' is not functioning right.
-------------
 Title    : frac_identical
 Usage    : my $frac_id = $hsp->frac_identical( ['query'|'hit'|'total'] );
 Function : Returns the fraction of identical positions for this HSP
 Returns  : Float in range 0.0 -> 1.0
 Args     : 'query' = num identical / length of query seq (without gaps)
            'hit'   = num identical / length of hit seq (without gaps)
            'total' = num identical / length of alignment (with gaps)
            default = 'total'
---------------
According to the method description, for the HSP above, 'frac_identical' should return '0.42' with 'hit'. But it doesn't. With 'hit' it currently gives '0.13'. With 'total' it gives the expected result, '0.35'.

That's all.
Cheers

Dimitar

--
Dimitar Kenanov
Postdoctoral research fellow
Protein Sequence Analysis Group
Bioinformatics Institute
A*STAR, Singapore
tel: +65 6478 8514

From cjfields at illinois.edu Thu May 20 22:24:46 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 21:24:46 -0500
Subject: [Bioperl-l] a problem with HspI module?
In-Reply-To: <4BF5ED6D.6030506@bii.a-star.edu.sg>
References: <4BF5ED6D.6030506@bii.a-star.edu.sg>
Message-ID:

It would be best to file this in a bug report, along with example data.

chris

From rmb32 at cornell.edu Fri May 21 13:44:26 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 21 May 2010 10:44:26 -0700
Subject: [Bioperl-l] codon tables, finding ORFs
Message-ID: <4BF6C67A.4040202@cornell.edu>

Hi all,

Right now, Bio::Tools::CodonTable uses as its 'standard' table the NCBI one, described at http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG1.

This table recognizes three different start codons: the usual ATG, plus TTG and CTG (which I'd never heard of before looking there; it seems they are rare).

The issue is, if you use this codon scheme to find open reading frames in nucleotide sequences, you get some ORFs that I think a lot of biologists would be surprised at, from these two (rare?) start codons.

Seems to me, this might be a problem. I mean, a naive user (which just about everyone is!) would expect the default codon table to only recognize the canonical ATG as a start, right? And would be rather displeased if BioPerl said (by default) that something starting with one of these rare codons was an open reading frame?

So I guess my question is, do we think BioPerl (Bio::Tools::CodonTable) should really recognize these rare start codons by default?

Rob

From scott at scottcain.net Fri May 21 14:15:20 2010
From: scott at scottcain.net (Scott Cain)
Date: Fri, 21 May 2010 14:15:20 -0400
Subject: [Bioperl-l] [Gmod-schema] Trying to load my first database
In-Reply-To:
References:
Message-ID:

Hi Daniel,

I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists.

Of course, the file you sent me would be the same file you sent me yesterday; sorry for my poor memory :-)

This file uncovered a bug in BioPerl in the FeatureIO module. While fixing the bug may be difficult, working around it might not be too bad. Additionally, I'm not sure we should fix it right now, as there is an effort underway to rework this section of BioPerl anyway. The good news is that the workaround is fairly simple.

In the GFF that MAKER created, when parsing prodigal output, it generates GFF lines like this:

Contig125 pred_gff:prodigal_v2.00 match 104 1723 157.5 + . ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59;

The tricky part is this tag/value in the ninth column: type=ATG. The tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is in the third column, so when it is parsing this line of GFF, it tries to reassign the feature type to something that isn't valid. The workaround is pretty easy: since "type" is a problematic tag, and it appears that the type tag here is defining the start type, I would suggest doing a global search and replace on the file to replace "type=" with "start_type=". I did that and the file loaded fine. I don't know if it is MAKER that creates this tag or the BioPerl parser for prodigal, but changing this at the source might be nice (of course, it might also break somebody else's code :-/). I'll enter a bug for this in the BioPerl bug tracker.

Scott


On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote:
> Hi Scott,
>
> I used Maker to generate the attached file.
>
> -Daniel
>
> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote:
>> Hi Daniel,
>>
>> Please keep the schema mailing list cc'ed in so the responses can be
>> archived and more eyes than just mine can try to solve the problem.
>>
>> Can you send a sample of the GFF that is causing the problem? Any
>> ontology term that is in Chado should be "legal." If there's
>> something causing a problem, we need to figure out what it is.
>>
>> Scott
>>
>>
>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote:
>>> Hi Scott,
>>>
>>> I am using the same image as we used in class. I was able to load
>>> each of the examples in the GMOD course (Pythium) and on the Chado
>>> website (yeast).
>>>
>>> On another note, is there an easy way to navigate the ontology terms
>>> that are legal and standard in both GFF3 and in Chado. I am having
>>> trouble understanding how to convert from an arbitrary analysis (e.g.
>>> Blasting KEGG) into a format that works.
>>>
>>> Thanks so much!
>>> -Daniel
>>>
>>> On Fri, May 21, 2010 at 9:41 AM, Scott Cain wrote:
>>>> Hi Daniel,
>>>>
>>>> That error message looks like one that would come from an older
>>>> version of BioPerl. What version do you have?
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, May 21, 2010 at 11:51 AM, Daniel Quest wrote:
>>>>> Hi Scott,
>>>>>
>>>>> Thanks for the reply. Sorry, I should have been able to track down
>>>>> that error. Could you tell me what the following error means?
>>>>>
>>>>> gmod at ubuntu:~/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125$
>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g
>>>>> /home/gmod/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125/Contig125.gff
>>>>> --noexon --recreate_cache
>>>>> (Re)creating the uniquename cache in the database...
>>>>> Creating table...
>>>>> Populating table...
>>>>> Creating indexes...
>>>>> Adjusting the primary key sequences (if necessary)...Done.
>>>>> Preparing data for inserting into the chado database
>>>>> (This may take a while ...)
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: Object Bio::Annotation::SimpleValue=HASH(0xa858ac8) was not valid
>>>>> with key type. If you were adding new keys in, perhaps you want to
>>>>> make use
>>>>> of the archetype method to allow registration to a more basic type
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368
>>>>> STACK: Bio::Annotation::Collection::add_Annotation
>>>>> /usr/local/share/perl/5.10.0/Bio/Annotation/Collection.pm:361
>>>>> STACK: Bio::SeqFeature::Annotated::add_Annotation
>>>>> /usr/local/share/perl/5.10.0/Bio/SeqFeature/Annotated.pm:609
>>>>> STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag
>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:797
>>>>> STACK: Bio::FeatureIO::gff::_handle_feature
>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:752
>>>>> STACK: Bio::FeatureIO::gff::next_feature
>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:172
>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:775
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Abnormal termination, trying to clean up...
>>>>>
>>>>> Attempting to clean up the loader temp table (so that --recreate_cache
>>>>> won't be needed)...
>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)...
>>>>> Exiting...
>>>>>
>>>>>
>>>>> Thanks so much!
>>>>> -Daniel
>>>>>
>>>>> On Thu, May 20, 2010 at 6:20 PM, Scott Cain wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> The error message you got said that the GFF file that you are trying
>>>>>> to load couldn't be found; are you sure the path was correct? The
>>>>>> file itself looks OK.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Thu, May 20, 2010 at 5:06 PM, Daniel Quest wrote:
>>>>>>> Hello All,
>>>>>>>
>>>>>>> I am trying to load my first genome from maker. Not sure what the
>>>>>>> problem is... any help is awesome! I am attaching at least part of
>>>>>>> the dataset.
>>>>>>>
>>>>>>> -Daniel
>>>>>>>
>>>>>>>
>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$
>>>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g
>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3
>>>>>>> --noexon
>>>>>>>
>>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>>> MSG: Could not open
>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3: No
>>>>>>> such file or directory
>>>>>>> STACK: Error::throw
>>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368
>>>>>>> STACK: Bio::Root::IO::_initialize_io
>>>>>>> /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:341
>>>>>>> STACK: Bio::FeatureIO::_initialize
>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:353
>>>>>>> STACK: Bio::FeatureIO::gff::_initialize
>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:102
>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:276
>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:296
>>>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:720
>>>>>>> -----------------------------------------------------------
>>>>>>>
>>>>>>> Abnormal termination, trying to clean up...
>>>>>>>
>>>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)...
>>>>>>> Exiting...
>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gmod-schema mailing list
>>>>>>> Gmod-schema at lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From bosborne11 at verizon.net Fri May 21 14:45:01 2010
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 21 May 2010 14:45:01 -0400
Subject: [Bioperl-l] codon tables, finding ORFs
In-Reply-To: <4BF6C67A.4040202@cornell.edu>
References: <4BF6C67A.4040202@cornell.edu>
Message-ID: <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net>

Rob,

The user will use translate(), which can do something like this:

    $prot_obj = $my_seq_object->translate(-orf => 1,
                                          -start => "atg" );

CodonTable does little more than hold the codon/aa data. All the useful work is done by translate(), and there are lots of options.
Here is part of the documentation:

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0
 Notes   : The -start argument only applies when -orf is set to 1. By
           default all initiation codons found in the given codon table
           are used but when "start" is set to some codon this codon will
           be used exclusively as the initiation codon.

           Note that the default codon table (NCBI "Standard") has 3
           initiation codons!

Brian O.

From briano at bioteam.net Fri May 21 14:52:19 2010
From: briano at bioteam.net (Brian Osborne)
Date: Fri, 21 May 2010 14:52:19 -0400
Subject: [Bioperl-l] What is CPAN doing?
Message-ID:

bioperl-l,

Here's the POD for the translate() method:

=head2 translate

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate

           Or if you expect a complete coding sequence (CDS) translation,
           with initiator at the beginning and terminator at the end:

           $protein_seq_obj = $cds_seq_obj->translate(-complete => 1);

           Or if you want translate() to find the first initiation
           codon and return the corresponding protein:

           $protein_seq_obj = $cds_seq_obj->translate(-orf => 1);

 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The complete CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translated protein
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0

 Notes   : The -start argument only applies when -orf is set to 1. By
           default all initiation codons found in the given codon table
           are used but when "start" is set to some codon this codon will
           be used exclusively as the initiation codon.

           Note that the default codon table (NCBI "Standard") has 3
           initiation codons!

           By default translate() translates termination codons to
           some character (default is *), both internal and trailing
           codons. Setting "-complete" to 1 tells translate() to remove
           the trailing character.

           -offset is used for seqfeatures which contain the
           \codon_start tag and can be set to 1, 2, or 3. This is the
           offset by which the sequence translation starts relative to
           the first base of the feature.

For details on codon tables used by translate() see L.

Deprecated argument set (v. 1.5.1 and prior versions) where each argument is an element in an array:

  1: character for terminator (optional), defaults to '*'.
  2: character for unknown amino acid (optional), defaults to 'X'.
  3: frame (optional), valid values are 0, 1, 2, defaults to 0.
  4: codon table id (optional), defaults to 1.
  5: complete coding sequence expected, defaults to 0 (false).
  6: boolean, throw exception if not complete coding sequence
     (true), defaults to warning (false)
  7: codontable, a custom Bio::Tools::CodonTable object (optional).

=cut

And here's what appears on CPAN:

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate
 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The full CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translation
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : character for terminator (optional) defaults to '*'
           character for unknown amino acid (optional) defaults to 'X'
           frame (optional) valid values 0, 1, 2, defaults to 0
           codon table id (optional) defaults to 1
           complete coding sequence expected, defaults to 0 (false)
           boolean, throw exception if not complete CDS (true) or
           defaults to warning (false)

Most of the POD is missing - does anyone know why?

Brian O.

From barani at avesthagen.com Thu May 20 07:27:04 2010
From: barani at avesthagen.com (barani at avesthagen.com)
Date: Thu, 20 May 2010 16:57:04 +0530 (IST)
Subject: [Bioperl-l] Bio::Biblio find method proxy problem
Message-ID: <49660.192.168.1.5.1274354824.squirrel@mail.avesthagen.com>

Hi,

Our lab is behind a firewall. I am using FC10 Linux. I have set the httpproxy in /etc/bash_profile. I am searching for research articles using the Bio::Biblio "find" method, as shown in the following Perl code. This program executes well when I run it on the command line. But when I use the same code in a Perl CGI, it does not work. (It says "couldn't retrieve results from http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi".) Is there any way that I can set the proxy within the code as an argument and make it work? It will be very useful if you guys can help me.
##################################################### #!/usr/bin/perl use Bio::Biblio; use Bio::Biblio::IO; my $search="ABySS[title] AND (Simpson[Author]) AND 2009[dp]"; my $biblio = Bio::Biblio->new(-access=> 'eutils'); $biblio->find($search)->has_next; while(my $xml = $biblio->get_next){ my $io = Bio::Biblio::IO->new( -data => $xml, -format => 'medlinexml' ); my $article = $io->next_bibref(); >>>>>>>>>>>>>>> XML Parser >>>>>>>>>>>> <<<<<<<<<<<<<<< XML Parser <<<<<<<<<<<< } ############################################################### Best Regards barani ----------------------------------- Baranidharan P Project Head Bioinformatics - Genomics Group Avesthagen Ltd Ground floor, Innovator Building International Tech Park Bangalore Whitefield Bangalore - 560066 Ph. 09900727597 Mail Off .barani at avesthagen.com Per. baranidharanp at gmail.com ------------------------------------- From bbimber at gmail.com Fri May 21 09:58:03 2010 From: bbimber at gmail.com (Ben Bimber) Date: Fri, 21 May 2010 08:58:03 -0500 Subject: [Bioperl-l] CommandExts and arrays Message-ID: I am getting an error when trying to pass an array as a param with command exts. I hope there is something obvious i'm missing, but I cant seem to figure this out. I am trying to run the merge two BAM files using Bio::Tools::Run::Samtools using something like this: my $new_bam = Bio::Tools::Run::Samtools->new( -command => 'merge', -program_dir => '/usr/bin/samtools/', )->run( -obm => output_file.bam', -ibm => ['file1.bam', 'file2.bam'], ); When i use an array for the -ibm param, I get an error saying 'cannot use string 'file1' as an arrayref while strict refs in place'. The error comes from this code in CommandExts.pm, around line 989. 
adding 'no strict' right before the final line stops the error: # expand arrayrefs my $l = $#files; for (0..$l) { if (ref($files[$_]) eq 'ARRAY') { splice(@files, $_, 1, @{$files[$_]}); #error thrown from this line splice(@switches, $_, 1, ($switches[$_]) x @{$files[$_]}); } Thanks for the help. From daniel.quest at gmail.com Fri May 21 15:34:35 2010 From: daniel.quest at gmail.com (Daniel Quest) Date: Fri, 21 May 2010 12:34:35 -0700 Subject: [Bioperl-l] [Gmod-schema] Trying to load my first database In-Reply-To: References: Message-ID: Hey Scott, Thanks so much for the work on this. I have CC'ed Doug Hyatt, the developer of Prodigal so that he is aware of this problem. I am thinking that Maker just passed the Prodigal tags through and then the conflict happened on the Chado load. From my POV it is probably easiest to make small changes to the Prodigal GFF3 output to sync up with the Chado schema. Thanks so much -Daniel On Fri, May 21, 2010 at 11:15 AM, Scott Cain wrote: > Hi Daniel, > > I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists. > > Of course, the file you sent me would be the same file you sent me > yesterday; sorry for my poor memory :-) > > This file uncovered a bug in BioPerl in the FeatureIO module. ?While > fixing the bug may be difficult, working around it might not be too > bad. ?Additionally, I'm not sure we should fix it right now, as this > is an effort underway to rework this section of BioPerl anyway. ?The > good news is that the work around is fairly simple. > > In the GFF that MAKER created, when parsing prodigal output, it > generates GFF lines like this: > > Contig125 ? ? ? pred_gff:prodigal_v2.00 match ? 104 ? ? 1723 ? ?157.5 > ?+ ? ? ? . 
> ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59; > > The tricky part is this tag/value in the ninth column: type=ATG. ?The > tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is > in the third column, so when it is parsing this line of GFF, it tries > to reassign the feature type to something that isn't valid. ?The work > around is pretty easy: since "type" is a problematic tag, and it > appears that the type tag here is defining the start type, I would > suggest doing a global search and replace on the file to replace > "type=" with "start_type=". ?I did that and the file loaded fine. ?I > don't know if it is MAKER that creates this tag or the BioPerl parser > for prodigal, but changing this at the source might be nice (of > course, it might also break somebody else's code :-/ ?I'll enter a bug > for this in the BioPerl bug tracker. > > Scott > > > On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote: >> Hi Scott, >> >> I used Maker to generate the attached file. >> >> -Daniel >> >> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote: >>> Hi Daniel, >>> >>> Please keep the schema mailing list cc'ed in so the responses can be >>> archived and more eyes than just mine can try to solve the problem. >>> >>> Can you send a sample of the GFF that is causing the problem? ?Any >>> ontology term that is in Chado should be "legal." ?If there's >>> something causing a problem, we need to figure out what it is. >>> >>> Scott >>> >>> >>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote: >>>> Hi Scott, >>>> >>>> I am using the same image as we used in class. ?I was able to load >>>> each of the examples in the GMOD course (Pythium) and on the Chado >>>> website (yeast). 
scott at scottcain dot net >>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>>> Ontario Institute for Cancer Research >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ------------------------------------------------------------------------ >>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>> Ontario Institute for Cancer Research >>>>> >>>> >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From rmb32 at cornell.edu Fri May 21 16:11:24 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 13:11:24 -0700 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> Message-ID: <4BF6E8EC.6050001@cornell.edu> Brian Osborne wrote: > The user will use translate(), which can do something like this: > > $prot_obj = $my_seq_object->translate(-orf => 1, > -start => "atg" ); Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default. 
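For readers following the thread, the difference between the default ORF search and one restricted to ATG can be sketched roughly as below. This is only an illustration and assumes BioPerl is installed; the -orf and -start arguments are the ones described in the translate() POD discussed in this thread, while the toy sequence and the variable names are made up for the example. The exact protein strings may differ between BioPerl versions, so none are shown.

```perl
use strict;
use warnings;
use Bio::Seq;

# Hypothetical toy sequence: a TTG (an initiator in the NCBI Standard table)
# comes first, followed in the same frame by an ATG and then a stop codon.
my $dna = Bio::Seq->new(
    -seq      => 'TTGAAAATGGCCTAA',
    -alphabet => 'dna',
);

# Default ORF search: any initiation codon of the codon table may open the ORF.
my $any_start = $dna->translate( -orf => 1 );

# Restricted search: only ATG is accepted as the initiation codon.
my $atg_only = $dna->translate( -orf => 1, -start => 'atg' );

printf "default start codons: %s\n", $any_start->seq;
printf "ATG only            : %s\n", $atg_only->seq;
```

With the default table the ORF should open at the leading TTG, so the first translation is expected to be longer than the ATG-only one; that difference is the "surprise" being debated here.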
Rob From carson.holt at genetics.utah.edu Fri May 21 15:53:35 2010 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 21 May 2010 13:53:35 -0600 Subject: [Bioperl-l] [maker-devel] [Gmod-schema] Trying to load my first database In-Reply-To: Message-ID: That is correct. MAKER will just pass user defined GFF3 tags through rather than trying to make sense of them or trimming them off. Carson On 5/21/10 1:34 PM, "Daniel Quest" wrote: Hey Scott, Thanks so much for the work on this. I have CC'ed Doug Hyatt, the developer of Prodigal so that he is aware of this problem. I am thinking that Maker just passed the Prodigal tags through and then the conflict happened on the Chado load. From my POV it is probably easiest to make small changes to the Prodigal GFF3 output to sync up with the Chado schema. Thanks so much -Daniel On Fri, May 21, 2010 at 11:15 AM, Scott Cain wrote: > Hi Daniel, > > I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists. > > Of course, the file you sent me would be the same file you sent me > yesterday; sorry for my poor memory :-) > > This file uncovered a bug in BioPerl in the FeatureIO module. While > fixing the bug may be difficult, working around it might not be too > bad. Additionally, I'm not sure we should fix it right now, as this > is an effort underway to rework this section of BioPerl anyway. The > good news is that the work around is fairly simple. > > In the GFF that MAKER created, when parsing prodigal output, it > generates GFF lines like this: > > Contig125 pred_gff:prodigal_v2.00 match 104 1723 157.5 > + . > ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59; > > The tricky part is this tag/value in the ninth column: type=ATG. 
The > tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is > in the third column, so when it is parsing this line of GFF, it tries > to reassign the feature type to something that isn't valid. The work > around is pretty easy: since "type" is a problematic tag, and it > appears that the type tag here is defining the start type, I would > suggest doing a global search and replace on the file to replace > "type=" with "start_type=". I did that and the file loaded fine. I > don't know if it is MAKER that creates this tag or the BioPerl parser > for prodigal, but changing this at the source might be nice (of > course, it might also break somebody else's code :-/ I'll enter a bug > for this in the BioPerl bug tracker. > > Scott > > > On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote: >> Hi Scott, >> >> I used Maker to generate the attached file. >> >> -Daniel >> >> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote: >>> Hi Daniel, >>> >>> Please keep the schema mailing list cc'ed in so the responses can be >>> archived and more eyes than just mine can try to solve the problem. >>> >>> Can you send a sample of the GFF that is causing the problem? Any >>> ontology term that is in Chado should be "legal." If there's >>> something causing a problem, we need to figure out what it is. >>> >>> Scott >>> >>> >>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote: >>>> Hi Scott, >>>> >>>> I am using the same image as we used in class. I was able to load >>>> each of the examples in the GMOD course (Pythium) and on the Chado >>>> website (yeast). >>>> >>>> On another note, is there an easy way to navigate the ontology terms >>>> that are legal and standard in both GFF3 and in Chado. I am having >>>> trouble understanding how to convert from an arbitrary analysis (e.g. >>>> Blasting KEGG) into a format that works. >>>> >>>> Thanks so much! 
>>>> -Daniel >>>> >>>> On Fri, May 21, 2010 at 9:41 AM, Scott Cain wrote: >>>>> Hi Daniel, >>>>> >>>>> That error message looks like one that would come from an older >>>>> version of BioPerl. What version do you have? >>>>> >>>>> Scott >>>>> >>>>> >>>>> On Fri, May 21, 2010 at 11:51 AM, Daniel Quest wrote: >>>>>> Hi Scott, >>>>>> >>>>>> Thanks for the reply. Sorry, I should have been able to track down >>>>>> that error. Could you tell me what the following error means? >>>>>> >>>>>> gmod at ubuntu:~/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125$ >>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>> /home/gmod/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125/Contig125.gff >>>>>> --noexon --recreate_cache >>>>>> (Re)creating the uniquename cache in the database... >>>>>> Creating table... >>>>>> Populating table... >>>>>> Creating indexes... >>>>>> Adjusting the primary key sequences (if necessary)...Done. >>>>>> Preparing data for inserting into the chado database >>>>>> (This may take a while ...) >>>>>> >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: Object Bio::Annotation::SimpleValue=HASH(0xa858ac8) was not valid >>>>>> with key type. 
If you were adding new keys in, perhaps you want to >>>>>> make use >>>>>> of the archetype method to allow registration to a more basic type >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>> STACK: Bio::Annotation::Collection::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/Annotation/Collection.pm:361 >>>>>> STACK: Bio::SeqFeature::Annotated::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/SeqFeature/Annotated.pm:609 >>>>>> STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:797 >>>>>> STACK: Bio::FeatureIO::gff::_handle_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:752 >>>>>> STACK: Bio::FeatureIO::gff::next_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:172 >>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:775 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> Abnormal termination, trying to clean up... >>>>>> >>>>>> Attempting to clean up the loader temp table (so that --recreate_cache >>>>>> won't be needed)... >>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>> Exiting... >>>>>> >>>>>> >>>>>> Thanks so much! >>>>>> -Daniel >>>>>> >>>>>> On Thu, May 20, 2010 at 6:20 PM, Scott Cain wrote: >>>>>>> Hi Daniel, >>>>>>> >>>>>>> The error message you got said that the GFF file that you are trying >>>>>>> to load couldn't be found; are you sure the path was correct? The >>>>>>> file itself looks OK. >>>>>>> >>>>>>> Scott >>>>>>> >>>>>>> >>>>>>> On Thu, May 20, 2010 at 5:06 PM, Daniel Quest wrote: >>>>>>>> Hello All, >>>>>>>> >>>>>>>> I am trying to load my first genome from maker. Not sure what the >>>>>>>> problem is... any help is awesome! I am attaching at least part of >>>>>>>> the dataset. 
>>>>>>>> >>>>>>>> -Daniel >>>>>>>> >>>>>>>> >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3 >>>>>>>> --noexon >>>>>>>> >>>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>>> MSG: Could not open >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3: No >>>>>>>> such file or directory >>>>>>>> STACK: Error::throw >>>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>>>> STACK: Bio::Root::IO::_initialize_io >>>>>>>> /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:341 >>>>>>>> STACK: Bio::FeatureIO::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:353 >>>>>>>> STACK: Bio::FeatureIO::gff::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:102 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:276 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:296 >>>>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:720 >>>>>>>> ----------------------------------------------------------- >>>>>>>> >>>>>>>> Abnormal termination, trying to clean up... >>>>>>>> >>>>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>>>> Exiting... >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gmod-schema mailing list >>>>>>>> Gmod-schema at lists.sourceforge.net >>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ------------------------------------------------------------------------ >>>>>>> Scott Cain, Ph. D. 
scott at scottcain dot net >>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>>> Ontario Institute for Cancer Research >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ------------------------------------------------------------------------ >>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>> Ontario Institute for Cancer Research >>>>> >>>> >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From cjfields at illinois.edu Fri May 21 16:44:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 15:44:18 -0500 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <4BF6E8EC.6050001@cornell.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> Message-ID: <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> On May 21, 2010, at 3:11 PM, Robert Buels wrote: > Brian Osborne wrote: >> The user will use translate(), which can do something like this: >> $prot_obj = $my_seq_object->translate(-orf => 1, >> -start => "atg" ); > > Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default. 
> > Rob Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. chris From rmb32 at cornell.edu Fri May 21 16:48:20 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 13:48:20 -0700 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> Message-ID: <4BF6F194.3080209@cornell.edu> Chris Fields wrote: > Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. > > chris Oh, they're available: CodonTable has a number of tables in it that you can optionally make translate() use, and there are bacterial tables in there (but they are not well documented). The default behavior is the 'NCBI standard' (eukaryotic) table that I linked to in the original post on this thread. What I am looking for is a discussion of what the best default behavior of $seq->translate( -orf => 1 ) with no other arguments should be. But also, there should be better documentation about the codon tables that are available; I can add that in my topic/longest_orf branch. 
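Until such documentation exists, one way to see which initiation codons a given table accepts is to ask Bio::Tools::CodonTable directly. The following is a minimal sketch, assuming BioPerl is installed; table ids follow the NCBI genetic-code numbering (1 = Standard, 11 = bacterial, archaeal and plant plastid), and the helper start_codons() and the toy sequence are hypothetical names introduced only for this example.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::CodonTable;

# Build the list of all 64 codons once.
my @bases = qw(T C A G);
my @codons;
for my $x (@bases) {
    for my $y (@bases) {
        for my $z (@bases) {
            push @codons, "$x$y$z";
        }
    }
}

# Hypothetical helper: the codons a given table id accepts as initiators.
sub start_codons {
    my ($id) = @_;
    my $table = Bio::Tools::CodonTable->new( -id => $id );
    return grep { $table->is_start_codon($_) } @codons;
}

for my $id ( 1, 11 ) {
    my $name = Bio::Tools::CodonTable->new( -id => $id )->name;
    printf "table %2d (%s): %s\n", $id, $name, join( ' ', start_codons($id) );
}

# Ask translate() to search for the ORF with the bacterial table instead of
# the default one, so that a GTG start such as this toy one is recognised.
my $gene    = Bio::Seq->new( -seq => 'GTGAAACCCTAA', -alphabet => 'dna' );
my $protein = $gene->translate( -orf => 1, -codontable_id => 11 );
print "table 11 ORF: ", $protein->seq, "\n";
```

Selecting the table per call with -codontable_id keeps the default untouched, which sidesteps the question of what the default should be for everyone else.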
Rob From cjfields at illinois.edu Fri May 21 16:52:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 15:52:15 -0500 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <4BF6F194.3080209@cornell.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> <4BF6F194.3080209@cornell.edu> Message-ID: <06B1B1F1-979F-461C-BC9B-57A79C26CCE7@illinois.edu> On May 21, 2010, at 3:48 PM, Robert Buels wrote: > Chris Fields wrote: > > Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. > > > > chris > > Oh they're available, CodonTable has a number of tables in it that you make translate() use optionally, and there are bacterial tables in there (but they are not well documented). The default behavior is the 'NCBI standard' (eukaryotic) table that I linked to in the original post on this thread. > > What I am looking for is a discussion of what the best default behavior of $seq->translate( -orf => 1 ) with no arguments should be. Probably the simplest, with documentation on how to change it when needed. > But also, there should be better documentation about the codon tables that are available, I can add that in my topic/longest_orf branch. > > Rob Agreed. More docs never hurt. chris From bosborne11 at verizon.net Fri May 21 16:32:30 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 21 May 2010 16:32:30 -0400 Subject: [Bioperl-l] codon tables, finding ORFs Message-ID: Rob, translate() is one of these methods where reading the documentation is required. 
Or to put it another way, if you tried to use it without reading the docs most of the time you'd get a result that differs from what you wanted, given the variety of ways to use it, quite apart from the issue of the 3 initiation codons. So really, you have to read the docs, and they say: By default all initiation codons found in the given codon table are used but when "start" is set to some codon this codon will be used exclusively as the initiation codon. Note that the default codon table (NCBI "Standard") has 3 initiation codons! My concern right now is that CPAN has removed this text and more! If you wanted to add an additional codon table and make it a default I have no problem with that. But, the "naive user" who doesn't read the documentation is probably still going to get "surprising" results. I don't think there's any way around RTFM for this method, changing the default table does not change this. Brian O. On May 21, 2010, at 4:11 PM, Robert Buels wrote: > Brian Osborne wrote: >> The user will use translate(), which can do something like this: >> $prot_obj = $my_seq_object->translate(-orf => 1, >> -start => "atg" ); > > Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default. > > Rob From rmb32 at cornell.edu Fri May 21 17:53:34 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 14:53:34 -0700 Subject: [Bioperl-l] POD rendering question/problem (was [Fwd: What is CPAN doing?]) Message-ID: <4BF700DE.8040804@cornell.edu> Hi search.cpan.org maintainers, For one of the methods in BioPerl, a good portion of the POD that's in the source [1] isn't being rendered into HTML on its search.cpan.org page [2]. We'd like to get this POD displaying properly, either by us (BioPerl) tweaking the POD on our end, or by you guys tweaking whatever process is making the HTML. So: do we need to tweak our POD to get it displaying properly? 
If so, what needs to change in that POD? Rob [1] The source and POD in question: http://search.cpan.org/src/CJFIELDS/BioPerl-1.6.1/Bio/PrimarySeqI.pm [2] The HTML in question: http://search.cpan.org/~cjfields/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm#translate -------- Original Message -------- Subject: [Bioperl-l] What is CPAN doing? Date: Fri, 21 May 2010 14:52:19 -0400 From: Brian Osborne To: BioPerl List bioperl-l, Here's the POD for the translate() method:

=head2 translate

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate

           Or if you expect a complete coding sequence (CDS) translation,
           with initiator at the beginning and terminator at the end:

           $protein_seq_obj = $cds_seq_obj->translate(-complete => 1);

           Or if you want translate() to find the first initiation
           codon and return the corresponding protein:

           $protein_seq_obj = $cds_seq_obj->translate(-orf => 1);

 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The complete CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translated protein
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0

 Notes   : The -start argument only applies when -orf is set to 1. By
           default all initiation codons found in the given codon table
           are used but when "start" is set to some codon this codon will
           be used exclusively as the initiation codon. Note that the
           default codon table (NCBI "Standard") has 3 initiation codons!

           By default translate() translates termination codons to
           the same character (default is *), both internal and trailing
           codons. Setting "-complete" to 1 tells translate() to remove
           the trailing character.

           -offset is used for seqfeatures which contain the /codon_start
           tag and can be set to 1, 2, or 3. This is the offset by which
           the sequence translation starts relative to the first base of
           the feature

           For details on codon tables used by translate() see
           L<Bio::Tools::CodonTable>.

           Deprecated argument set (v. 1.5.1 and prior versions)
           where each argument is an element in an array:

           1: character for terminator (optional), defaults to '*'.
           2: character for unknown amino acid (optional), defaults to 'X'.
           3: frame (optional), valid values are 0, 1, 2, defaults to 0.
           4: codon table id (optional), defaults to 1.
           5: complete coding sequence expected, defaults to 0 (false).
           6: boolean, throw exception if not complete coding sequence
              (true), defaults to warning (false)
           7: codontable, a custom Bio::Tools::CodonTable object (optional).

=cut

And here's what appears on CPAN:

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate
 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The full CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translation
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : character for terminator (optional) defaults to '*'
           character for unknown amino acid (optional) defaults to 'X'
           frame (optional) valid values 0, 1, 2, defaults to 0
           codon table id (optional) defaults to 1
           complete coding sequence expected, defaults to 0 (false)
           boolean, throw exception if not complete CDS (true) or
           defaults to warning (false)

Most of the POD is missing - does anyone know why? Brian O. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From kai.blin at biotech.uni-tuebingen.de Fri May 21 17:56:37 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 21 May 2010 23:56:37 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser Message-ID: <1274478997.1997.4.camel@gonzo.home.kblin.org> Hi list, hi Thomas, I've just bumped into the fact that bioperl-live still doesn't seem to support the hmmer3 hmmscan output format (thanks for the help at #bioperl). The nice folks on IRC pointed me at an email from Thomas Sharpton, noting that he was already working on a parser for this. So I thought I'd ask about the status of that before I run off writing my own. Is there anything I can help with? Cheers, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From rmb32 at cornell.edu Fri May 21 18:32:20 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 15:32:20 -0700 Subject: [Bioperl-l] [perl #75252] AutoReply: POD rendering question/problem (was [Fwd: What is CPAN doing?]) In-Reply-To: References: <4BF700DE.8040804@cornell.edu> Message-ID: <4BF709F4.4030705@cornell.edu> Doing a little more investigation, the culprit actually seems to be a stray old (non-installed) version of the module in our uploaded dist. No action required on your part, unless there is a tweak to the indexing that would have kept this module from being the top hit. Status: resolved Rob From cjfields at illinois.edu Fri May 21 19:22:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 18:22:41 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274478997.1997.4.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> Message-ID: <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over. Thomas, let me know if you need to be added to the github repo; it would be easy enough to add this in when ready. Relevant commit msg here: http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl =========================================== dev.open-bio.org - Authorized Access Only =========================================== ... 
bioperl-hmmer3/ ... perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3 =========================================== dev.open-bio.org - Authorized Access Only =========================================== perllib cjfields$ chris On May 21, 2010, at 4:56 PM, Kai Blin wrote: > Hi list, hi Thomas, > > I've just bumped into the fact that bioperl-live still doesn't seem to > support the hmmer3 hmmscan output format (thanks for the help at > #bioperl). The nice folks on IRC pointed me at an email from Thomas > Sharpton, noting that he was already working on a parser for this. So I > thought I'd ask about the status of that before I run off writing my > own. Is there anything I can help with? > > Cheers, > Kai > > -- > Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de > Interfakultäres Institut für Mikrobiologie und Infektionsmedizin > Abteilung Mikrobiologie/Biotechnologie > Eberhard-Karls-Universität Tübingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 Tübingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Mon May 24 06:19:55 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 24 May 2010 12:19:55 +0200 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: Message-ID: Hi Ben, This looks like it might be a bug. When I ask for the filespec for the 'merge' command: my @filespec = $new_bam->filespec; print join "\n", @filespec, "\n"; I get: obm *ibm (note the leading '*'). Could you please submit this as a bug? 
http://www.bioperl.org/wiki/Bugs Thanks, Dave From David.Messina at sbc.su.se Mon May 24 09:00:56 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 24 May 2010 15:00:56 +0200 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: <8565_1274696770_ZZg0Z3D5iEeCi.00_C34B77C6-2A3E-4B97-83C2-9BE8679CA331@sbc.su.se> Message-ID: > ok, i put in that bug. Thanks. > why exactly does having the asterisk indicate > this is a bug? i thought the asterisk indicated that multiple values > were allowed for that argument? Ah okay, my ignorance of this module is showing. :) > on a related note, are we supposed to be able to pass file names that > have spaces to command exts? on the few cases where this came up, i > have never seemed to get this to work right, so i just got rid of the > spaces. Sorry, I don't know. Paging Mark Jensen - have you got a moment to look into this? Dave From diment at gmail.com Sat May 22 04:25:55 2010 From: diment at gmail.com (Kieren Diment) Date: Sat, 22 May 2010 18:25:55 +1000 Subject: [Bioperl-l] OT: The Perl Survey Message-ID: <63B7289C-E218-4BBB-A5A4-33AFECA4C867@gmail.com> Hi, Sorry about the off-topic posting, but I'm trying to get as large a sample of programmers that use Perl as possible. The Perl Foundation have funded The Perl Survey, 2010 which is ready for people to complete at http://survey.perlfoundation.org. If you could spend a little time to complete the survey, we would be most grateful. It should take around 10-15 minutes to complete. 
The official announcement is at: http://news.perlfoundation.org/2010/05/grant-update-the-perl-survey-1.html Thanks in advance Kieren Diment From parametres-personnels at hotmail.fr Sun May 23 11:57:14 2010 From: parametres-personnels at hotmail.fr (NamNAme) Date: Sun, 23 May 2010 08:57:14 -0700 (PDT) Subject: [Bioperl-l] Pfam database Message-ID: <28650160.post@talk.nabble.com> Dear all, A few weeks ago I wrote a program that needs the Pfam database, and I tested it on the first version of Pfam, where the sequences of each protein family are in one file. Now I would like to test it on the latest version of Pfam, but the organization has changed. I've found a file called Pfam-A.fasta which contains sequences and the family they belong to, but the sequences inside are not complete. So I have two questions: Why are these sequences not complete? And how can I find a file with complete sequences and the family they belong to? Thank you for your help. Bye. P.S.: There is the file pfamseq. I tried to write a script to read it and then retrieve the database structure I want, but this file is enormous and the script used too much memory, so it crashed. -- View this message in context: http://old.nabble.com/Pfam-database-tp28650160p28650160.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From staffa at niehs.nih.gov Mon May 24 10:32:26 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Mon, 24 May 2010 10:32:26 -0400 Subject: [Bioperl-l] Restriction Enzymes Message-ID: So, back in 2007 I wrote a script using use Bio::Tools::RestrictionEnzyme; and generated some useful restriction maps for a client. This year he comes back to me with some very new enzymes that RestrictionEnzyme did not recognize. I erroneously thought that I needed an update of BioPerl, which I requested of SysAdmin. They did this across the board, there is no going back. 
(I did learn about the NEB file that needed to be installed) Now it appears that I must re-write my scripts because RestrictionEnzyme is not known to the latest version of bioperl. Is this true? How hard would it be to keep things backward compatible. Have I missed something here? Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Enterprise-Wide Information Technology Support Contract National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From David.Messina at sbc.su.se Mon May 24 11:55:45 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 24 May 2010 17:55:45 +0200 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <4046E576-2109-45BB-969C-F0B6F5749957@sbc.su.se> Hi Nick, Now it's Bio::Restriction::Enzyme (and friends). Besides the perldoc for that module, see also: http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > How hard would it be to keep things backward compatible. > Have I missed something here? I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones are intended to be at least partially backwards compatible. Dave From cjfields at illinois.edu Mon May 24 11:58:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 10:58:11 -0500 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote: > So, back in 2007 I wrote a script using > > use Bio::Tools::RestrictionEnzyme; > > and generated some useful restriction maps for a client. > > This year he comes back to me with some very new enzymes > that RestrictionEnzyme did not recognize. I erroneously thought that I > needed an update of BioPerl, which I requested of SysAdmin. 
> They did this across the board, there is no going back. > (I did learn about the NEB file that needed to be installed) > > Now it appears that I must re-write my scripts because RestrictionEnzyme is > not known to the latest version of bioperl. Is this true? > How hard would it be to keep things backward compatible. > Have I missed something here? Bio::Tools::RestrictionEnzyme was deprecated quite a while ago, around v. 1.5, with removal at 1.6 (an announcement about this was made to the list prior to the 1.6.0 release, with no respondents). The live version of the DEPRECATED file is here: http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED If I understand correctly, the main reason was that most development went into the Bio::Restriction modules, with very little change occurring in Bio::Tools::RestrictionEnzyme. We made similar changes with some of the older BLAST parsers (BPLite). You could just download Bio::Tools::RestrictionEnzyme and load it via a 'use lib' directive (or local::lib), or package it with your script; it should still work. However, from my perspective, if the older module wasn't recognizing specific enzyme cut sites and the supported one does, wouldn't it be easier to modify your script to use the newer supported one instead? If the supported Bio::Restriction modules don't recognize the new sites, I would consider that a bug. > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Enterprise-Wide Information Technology Support Contract > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina chris From maj at fortinbras.us Mon May 24 12:21:03 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Mon, 24 May 2010 12:21:03 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <13392E899AB04A0E8F66336CDBE417BE@NewLife> The rewrite this summer of Bio::Restriction made several funky enzyme (non-pal, non-symmetric) types workable. I would think it wouldn't be too onerous to convert code to the new system and have it work rather quickly- MAJ ----- Original Message ----- From: "Chris Fields" To: "Staffa, Nick (NIH/NIEHS) [C]" Cc: "Bioperl-l" Sent: Monday, May 24, 2010 11:58 AM Subject: Re: [Bioperl-l] Restriction Enzymes > On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote: > >> So, back in 2007 I wrote a script using >> >> use Bio::Tools::RestrictionEnzyme; >> >> and generated some useful restriction maps for a client. >> >> This year he comes back to me with some very new enzymes >> that RestrictionEnzyme did not recognize. I erroneously thought that I >> needed an update of BioPerl, which I requested of SysAdmin. >> They did this across the board, there is no going back. >> (I did learn about the NEB file that needed to be installed) >> >> Now it appears that I must re-write my scripts because RestrictionEnzyme is >> not known to the latest version of bioperl. Is this true? >> How hard would it be to keep things backward compatible. >> Have I missed something here? > > Bio::Tools::RestrictionEnyzme was deprecated quite a while ago,around v. 1.5, > with removal at 1.6 (an announcement was made to the list regarding this, with > no respondents, prior to the 1.6.0 release). The live version of the > DEPRECATED docs are here: > > http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED > > If I understand correctly, the main reason was most development was put into > Bio::Restriction modules, with very little change occurring in > Bio::Tools::RestrictionEnzyme. We did similar changes with some of the older > BLAST parsers (BPLite). 
You could just download Bio::Tools::RestrictionEnyzme > and call it via a 'use lib' directive (or local::lib) or package it with your > script, it should still work. > > However, from my perspective, if the older module wasn't recognizing specific > enzyme cut sites, and the supported one did, wouldn't it be easier to modify > your script to use the newer supported one instead? If the supported > Bio::Restriction modules don't recognize the new sites I would consider that a > bug. > >> Nick Staffa >> Telephone: 919-316-4569 (NIEHS: 6-4569) >> Scientific Computing Support Group >> NIEHS Enterprise-Wide Information Technology Support Contract >> National Institute of Environmental Health Sciences >> National Institutes of Health >> Research Triangle Park, North Carolina > > > > chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Mon May 24 12:54:29 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 24 May 2010 09:54:29 -0700 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] Message-ID: <4BFAAF45.4090400@cornell.edu> -------- Original Message -------- Subject: Re: [perl #75252] POD rendering question/problem (was [Fwd: [Bioperl-l] What is CPAN doing?]) Date: Mon, 24 May 2010 08:33:35 -0700 From: Graham Barr via RT Reply-To: search-rt at cpan.org To: rmb32 at cornell.edu References: <4BF700DE.8040804 at cornell.edu> <3F316B7B-DBCC-4668-94E4-45471ED5ACBB at pobox.com> On May 21, 2010, at 4:54 PM, Robert Buels via RT wrote: > > [1] The source and POD in question: > http://search.cpan.org/src/CJFIELDS/BioPerl-1.6.1/Bio/PrimarySeqI.pm > > [2] The HTML in question: > http://search.cpan.org/~cjfields/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm#translate that HTML is not for the above POD, it is located at 
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/PrimarySeqI.pm the issue seems to be that when displaying the POD from the examples directory the source link is linking to the real module the html shown in [2] is representative of http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm IMO it is confusing to include 2 different copies of the same module. I would suggest adding to META.yml no_index: dir: - examples/root/lib Graham. From staffa at niehs.nih.gov Mon May 24 14:32:54 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Mon, 24 May 2010 14:32:54 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: <13392E899AB04A0E8F66336CDBE417BE@NewLife> Message-ID: Thanks, all. On 5/24/10 12:21 PM, "Mark A. Jensen" wrote: The rewrite this summer of Bio::Restriction made several funky enzyme (non-pal, non-symmetric) types workable. I would think it wouldn't be too onerous to convert code to the new system and have it work rather quickly- MAJ ----- Original Message ----- From: "Chris Fields" To: "Staffa, Nick (NIH/NIEHS) [C]" Cc: "Bioperl-l" Sent: Monday, May 24, 2010 11:58 AM Subject: Re: [Bioperl-l] Restriction Enzymes > On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote: > >> So, back in 2007 I wrote a script using >> >> use Bio::Tools::RestrictionEnzyme; >> >> and generated some useful restriction maps for a client. >> >> This year he comes back to me with some very new enzymes >> that RestrictionEnzyme did not recognize. I erroneously thought that I >> needed an update of BioPerl, which I requested of SysAdmin. >> They did this across the board, there is no going back. >> (I did learn about the NEB file that needed to be installed) >> >> Now it appears that I must re-write my scripts because RestrictionEnzyme is >> not known to the latest version of bioperl. Is this true? >> How hard would it be to keep things backward compatible. >> Have I missed something here? 
> > Bio::Tools::RestrictionEnyzme was deprecated quite a while ago,around v. 1.5, > with removal at 1.6 (an announcement was made to the list regarding this, with > no respondents, prior to the 1.6.0 release). The live version of the > DEPRECATED docs are here: > > http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED > > If I understand correctly, the main reason was most development was put into > Bio::Restriction modules, with very little change occurring in > Bio::Tools::RestrictionEnzyme. We did similar changes with some of the older > BLAST parsers (BPLite). You could just download Bio::Tools::RestrictionEnyzme > and call it via a 'use lib' directive (or local::lib) or package it with your > script, it should still work. > > However, from my perspective, if the older module wasn't recognizing specific > enzyme cut sites, and the supported one did, wouldn't it be easier to modify > your script to use the newer supported one instead? If the supported > Bio::Restriction modules don't recognize the new sites I would consider that a > bug. > >> Nick Staffa >> Telephone: 919-316-4569 (NIEHS: 6-4569) >> Scientific Computing Support Group >> NIEHS Enterprise-Wide Information Technology Support Contract >> National Institute of Environmental Health Sciences >> National Institutes of Health >> Research Triangle Park, North Carolina > > > > chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bbimber at gmail.com Mon May 24 15:43:07 2010 From: bbimber at gmail.com (Ben Bimber) Date: Mon, 24 May 2010 14:43:07 -0500 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: <1274729912.4373.19.camel@epistle> References: <1274729912.4373.19.camel@epistle> Message-ID: as long as the limitation is known, i dont see it as a big problem. 
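A minimal sketch of one way to live with that limitation: check file names for whitespace before they ever reach the wrapper, so nothing gets altered silently. This is plain Perl and not part of CommandExts; the helper name partition_by_whitespace and the sample file names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): split a list of file
# names into those without whitespace and those that contain it.
sub partition_by_whitespace {
    my @names = @_;
    my (@ok, @spaced);
    for my $name (@names) {
        if ($name =~ /\s/) {
            push @spaced, $name;    # would be mangled if spaces are stripped
        }
        else {
            push @ok, $name;        # safe to hand on unchanged
        }
    }
    return (\@ok, \@spaced);
}

# Illustrative inputs only.
my @inputs = ('reads_1.fastq', 'my reads.fastq', 'ref genome.fa');
my ($ok, $spaced) = partition_by_whitespace(@inputs);

print "safe: $_\n" for @$ok;
warn "contains whitespace, rename or symlink before use: $_\n" for @$spaced;
```

Failing early like this keeps the renaming decision with the caller, which seems safer than discovering later that a different file name was actually used.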
On Mon, May 24, 2010 at 2:38 PM, Dan Kortschak wrote: > Hi Dave, > > You are right, spaces are not allowed - they are actively stripped from > filenames (the other option would be to escape or otherwise quote them - > this is certainly doable, is there enough of a call to do this?). > > You can use last_execution() to see what was attempted to be run, this > should show the filenames (and everything else) that were used in the > IPC call. > > cheers > Dan > > On Mon, 2010-05-24 at 12:00 -0400, Dave Messina wrote: >> Message: 2 >> Date: Mon, 24 May 2010 15:00:56 +0200 >> From: Dave Messina >> Subject: Re: [Bioperl-l] CommandExts and arrays >> To: Ben Bimber >> Message-ID: >> Content-Type: text/plain; charset=windows-1252 >> >> > ok, i put in that bug. >> >> Thanks. >> >> >> > why exactly does having the asterisk indicate >> > this is a bug? i thought the asterisk indicated that multiple >> values >> > were allowed for that argument? >> >> Ah okay, my ignorance of this module is showing. :) >> >> >> > on a related note, are we supposed to be able to pass file names >> that >> > have spaces to command exts? on the few cases where this came up, i >> > have never seemed to get this to work right, so i just got rid of >> the >> > spaces. >> >> Sorry, I don't know. >> >> >> Paging Mark Jensen - have you got a moment to look into this? >> >> >> Dave > > From David.Messina at sbc.su.se Mon May 24 18:03:19 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 00:03:19 +0200 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: <4BFAAF45.4090400@cornell.edu> References: <4BFAAF45.4090400@cornell.edu> Message-ID: From: Graham Barr via RT > IMO it is confusing to include 2 different copies of the same module. I agree. It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict).
In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. So I propose we test whether the example scripts perform as expected without those private copies and, if so, remove them. Dave From dan.kortschak at adelaide.edu.au Mon May 24 15:38:32 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 25 May 2010 05:08:32 +0930 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: Message-ID: <1274729912.4373.19.camel@epistle> Hi Dave, You are right, spaces are not allowed - they are actively stripped from filenames (the other option would be to escape or otherwise quote them - this is certainly doable; is there enough of a call to do this?). You can use last_execution() to see what was attempted; it should show the filenames (and everything else) that were used in the IPC call. cheers Dan On Mon, 2010-05-24 at 12:00 -0400, Dave Messina wrote: > Message: 2 > Date: Mon, 24 May 2010 15:00:56 +0200 > From: Dave Messina > Subject: Re: [Bioperl-l] CommandExts and arrays > To: Ben Bimber > Message-ID: > Content-Type: text/plain; charset=windows-1252 > > > ok, i put in that bug. > > Thanks. > > > > why exactly does having the asterisk indicate > > this is a bug? i thought the asterisk indicated that multiple > values > > were allowed for that argument? > > Ah okay, my ignorance of this module is showing. :) > > > > on a related note, are we supposed to be able to pass file names > that > > have spaces to command exts? on the few cases where this came up, i > > have never seemed to get this to work right, so i just got rid of > the > > spaces. > > Sorry, I don't know. > > > Paging Mark Jensen -
have you got a moment to look into this? > > > Dave From Russell.Smithies at agresearch.co.nz Mon May 24 18:01:25 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 25 May 2010 10:01:25 +1200 Subject: [Bioperl-l] taxonomy nightmare Message-ID: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> We've upgraded BioPerl recently and now lots of stuff appears broken though I'm sure it's not as bad as it looks. Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm deluged with errors. AFAIK, there were no changes to Perl 5.8.8 Any help greatly appreciated!!! Thanx, Russell Smithies ----------------------------------- #! /usr/local/bin/perl use strict; use warnings; use Bio::DB::Taxonomy; use Data::Dumper; my $idx_dir = '/data/home/smithiesr/taxonomy'; my $TAXDIR = "/data/home/smithiesr/taxdump"; my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp"); my $db = new Bio::DB::Taxonomy(-source => 'flatfile', -nodesfile => $nodefile, -namesfile => $namesfile, -directory => $idx_dir, -force => 1) or die $!; my $human = $db->get_taxon(-name => 'Homo sapiens'); print Dumper $human; ----------------------------------- ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Failed to load module Bio::DB::Taxonomy::flatfile. Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89 BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89. Compilation failed in require at (eval 21) line 3. ...propagated at /usr/lib/perl5/5.8.8/base.pm line 85. BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155. Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. 
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439. STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441 STACK: Bio::DB::Taxonomy::_load_tax_module /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264 STACK: Bio::DB::Taxonomy::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115 STACK: taxonomyTest.pl:15 ----------------------------------------------------------- ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Mon May 24 22:17:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:17:57 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: On May 24, 2010, at 7:46 PM, Thomas Sharpton wrote: > Hi all, > > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and hmmsearch output. It appears to be fully functional and I have had a handful of users test and integrate this module. > > We decided to push this module into a standalone svn repo (bioperl-hmmer3). 
I am a bit confused about why the repo is empty, as I committed the code back in March and have made a few updates since then to correct bugs identified by test users. Perhaps I screwed something up during the last commit. The commit doesn't show any added files. The original code apparently is on a branch of bioperl-dev, though (think this was pointed out on IRC): http://github.com/bioperl/bioperl-dev/tree/bioperl-hmmer3 Maybe that was the mixup? > Chris, should I just add the code to the github repo? I might need a pointer on how to do this without screwing it up. I started up a new github repo for it. You would just need to let me know your github ID so I can add you to it. Then (after you are added) the instructions are here: http://github.com/bioperl/bioperl-hmmer3 > Kai, I can mail an archive of the parser your way if you're in a hurry. With some assistance from Chris et. al., I expect the code to be in the github repo by the day's end. > > Apologies for any confusion and the delayed reply - I've been on the road. > > Best, > Tom No problem. Thanks for letting us know. chris > >> On May 21, 2010 4:24 PM, "Chris Fields" wrote: >> >> To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over. Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready. >> >> Relevant commit msg here: >> >> http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html >> >> perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl >> =========================================== >> dev.open-bio.org - Authorized Access Only >> =========================================== >> ... >> bioperl-hmmer3/ >> ... 
>> perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3 >> =========================================== >> dev.open-bio.org - Authorized Access Only >> =========================================== >> perllib cjfields$ >> >> chris >> >> On May 21, 2010, at 4:56 PM, Kai Blin wrote: >> >> > Hi list, hi Thomas, >> > >> > I've just bumped into the ... >> > From cjfields at illinois.edu Mon May 24 22:20:38 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:20:38 -0500 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: On May 24, 2010, at 5:03 PM, Dave Messina wrote: > From: Graham Barr via RT >> IMO it is confusing to include 2 different copies of the same module. > > I agree. > > It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). > > In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). > > I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. > > So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. > > Dave I agree. We should either prevent indexing of the private copies or remove them, unless someone can point to their utility.
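As a possible first step toward the check Dave proposes, here is a rough sketch that lists which module files exist both in a main tree and in a private one, so it is clear what is actually duplicated before anything is removed. The helper names and the default paths ('.' and 'examples/root/lib', i.e. run from the top of a checkout) are assumptions for illustration, not anything shipped with BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Hypothetical helper: collect the relative paths of all .pm files
# below $dir into a hash reference (empty if $dir does not exist).
sub pm_files_under {
    my ($dir) = @_;
    my %seen;
    return \%seen unless -d $dir;
    find(
        {
            no_chdir => 1,    # $_ holds the full path of each entry
            wanted   => sub {
                return unless -f $_ && /\.pm\z/;
                $seen{ File::Spec->abs2rel($_, $dir) } = 1;
            },
        },
        $dir
    );
    return \%seen;
}

# Relative paths present in both trees, sorted by name.
sub duplicated_modules {
    my ($main_dir, $private_dir) = @_;
    my $private = pm_files_under($private_dir);
    return () unless %$private;    # nothing private, nothing to compare
    my $main = pm_files_under($main_dir);
    my @dups = grep { $main->{$_} } keys %$private;
    return sort @dups;
}

# Assumed defaults; pass two directories on the command line to override.
my ($main_dir, $private_dir) = @ARGV == 2 ? @ARGV : ('.', 'examples/root/lib');
print "$_\n" for duplicated_modules($main_dir, $private_dir);
```

Files that appear only in the private tree are not reported, so anything the example scripts really depend on there stays visible as "not a duplicate".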
chris From thomas.sharpton at gmail.com Mon May 24 20:46:04 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Mon, 24 May 2010 17:46:04 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: Hi all, To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and hmmsearch output. It appears to be fully functional and I have had a handful of users test and integrate this module. We decided to push this module into a standalone svn repo (bioperl-hmmer3). I am a bit confused about why the repo is empty, as I committed the code back in March and have made a few updates since then to correct bugs identified by test users. Perhaps I screwed something up during the last commit. Chris, should I just add the code to the github repo? I might need a pointer on how to do this without screwing it up. Kai, I can mail an archive of the parser your way if you're in a hurry. With some assistance from Chris et. al., I expect the code to be in the github repo by the day's end. Apologies for any confusion and the delayed reply - I've been on the road. Best, Tom On May 21, 2010 4:24 PM, "Chris Fields" wrote: To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over. Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready. Relevant commit msg here: http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html perllib cjfields$ svn ls svn+ssh:// dev.open-bio.org/home/svn-repositories/bioperl =========================================== dev.open-bio.org - Authorized Access Only =========================================== ... bioperl-hmmer3/ ... 
perllib cjfields$ svn ls svn+ssh:// dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3 =========================================== dev.open-bio.org - Authorized Access Only =========================================== perllib cjfields$ chris On May 21, 2010, at 4:56 PM, Kai Blin wrote: > Hi list, hi Thomas, > > I've just bumped into the ... From Russell.Smithies at agresearch.co.nz Mon May 24 22:25:41 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 25 May 2010 14:25:41 +1200 Subject: [Bioperl-l] taxonomy nightmare In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32D88D065AA@exchsth.agresearch.co.nz> Fixed I think, some file permissions got screwed somewhere ;-( --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Smithies, Russell > Sent: Tuesday, 25 May 2010 10:01 a.m. > To: 'bioperl-l' > Subject: [Bioperl-l] taxonomy nightmare > > We've upgraded BioPerl recently and now lots of stuff appears broken > though I'm sure it's not as bad as it looks. > Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm > deluged with errors. > AFAIK, there were no changes to Perl 5.8.8 > > Any help greatly appreciated!!! > > Thanx, > > Russell Smithies > > ----------------------------------- > #! 
/usr/local/bin/perl > > use strict; > use warnings; > use Bio::DB::Taxonomy; > use Data::Dumper; > > my $idx_dir = '/data/home/smithiesr/taxonomy'; > my $TAXDIR = "/data/home/smithiesr/taxdump"; > > my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp"); > > my $db = new Bio::DB::Taxonomy(-source => 'flatfile', > -nodesfile => $nodefile, > -namesfile => $namesfile, > -directory => $idx_dir, > -force => 1) or die $!; > > my $human = $db->get_taxon(-name => 'Homo sapiens'); > print Dumper $human; > > ----------------------------------- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Failed to load module Bio::DB::Taxonomy::flatfile. Weak references > are not implemented in the version of perl at > /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89 > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89. > Compilation failed in require at (eval 21) line 3. > ...propagated at /usr/lib/perl5/5.8.8/base.pm line 85. > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439. 
> > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::Root::Root::_load_module > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441 > STACK: Bio::DB::Taxonomy::_load_tax_module > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264 > STACK: Bio::DB::Taxonomy::new > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115 > STACK: taxonomyTest.pl:15 > ----------------------------------------------------------- > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dimitark at bii.a-star.edu.sg Mon May 24 22:28:19 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Tue, 25 May 2010 10:28:19 +0800 Subject: [Bioperl-l] about gene names Message-ID: <4BFB35C3.4010808@bii.a-star.edu.sg> Hi guys, i have a question How can I get only the gene names from NCBI Gene when i have the sequence id? For example with this id - NP_005264.2 i can search NCBI Gene online but i want to get only the gene name automatically. I was checking the Bio::DB::EntrezGene module but it didnt became clear to me if i can use it for my purposes. Thank you in advance. 
Greetings Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From David.Messina at sbc.su.se Mon May 24 18:23:32 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 00:23:32 +0200 Subject: [Bioperl-l] Pfam database In-Reply-To: <28650160.post@talk.nabble.com> References: <28650160.post@talk.nabble.com> Message-ID: Hi, The release notes for the latest Pfam (24.0) do mention file format changes, but I could not find documentation describing those changes. Your questions relating to that would best be answered by the people at Pfam. You can contact them here: pfam-help at sanger.ac.uk However, please do report back to us what you learn. It's quite likely our code is not compatible with Pfam 24.0, and we would need that information to fix it. Thanks, Dave On May 23, 2010, at 5:57 PM, NamNAme wrote: > > Dear all, > A few weeks ago I wrote a program that need the pfam database, and I tested > it on the first version of pfam where each protein family sequences are in > one file. > But now I would like to test it on the last version of pfam but the > organization changed. > I've found a file called Pfam-A.fasta which contains sequences and the > family they belong to. But the sequences inside are not complete. > So, I've two questions : Why these sequences are not complete ? > And, How can I find a file with complete sequences and the family they > belong to ? > Thank you for your help. > Bye. > P-S : There is the file pfamseq, I tried to make a script to read it and > then retreive the database structure i want but, this file is enourmous and > use too much memory so it crashed. > -- > View this message in context: http://old.nabble.com/Pfam-database-tp28650160p28650160.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 24 22:54:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:54:03 -0500 Subject: [Bioperl-l] taxonomy nightmare In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> Message-ID: You may have a version of perl that either doesn't include Scalar::Util or includes a broken version. Try installing Scalar::Util from CPAN to see if it fixes the problem. Here's a link on the problem: http://www.perlmonks.org/?node_id=424737 chris On May 24, 2010, at 5:01 PM, Smithies, Russell wrote: > We've upgraded BioPerl recently and now lots of stuff appears broken though I'm sure it's not as bad as it looks. > Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm deluged with errors. > AFAIK, there were no changes to Perl 5.8.8 > > Any help greatly appreciated!!! > > Thanx, > > Russell Smithies > > ----------------------------------- > #! /usr/local/bin/perl > > use strict; > use warnings; > use Bio::DB::Taxonomy; > use Data::Dumper; > > my $idx_dir = '/data/home/smithiesr/taxonomy'; > my $TAXDIR = "/data/home/smithiesr/taxdump"; > > my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp"); > > my $db = new Bio::DB::Taxonomy(-source => 'flatfile', > -nodesfile => $nodefile, > -namesfile => $namesfile, > -directory => $idx_dir, > -force => 1) or die $!; > > my $human = $db->get_taxon(-name => 'Homo sapiens'); > print Dumper $human; > > ----------------------------------- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Failed to load module Bio::DB::Taxonomy::flatfile. 
Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89 > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89. > Compilation failed in require at (eval 21) line 3. > ...propagated at /usr/lib/perl5/5.8.8/base.pm line 85. > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155. > Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89. > Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439. > > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441 > STACK: Bio::DB::Taxonomy::_load_tax_module /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264 > STACK: Bio::DB::Taxonomy::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115 > STACK: taxonomyTest.pl:15 > ----------------------------------------------------------- > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From kai.blin at biotech.uni-tuebingen.de Tue May 25 01:58:27 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 07:58:27 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: <1274767107.2271.11.camel@gonzo.home.kblin.org> On Mon, 2010-05-24 at 17:46 -0700, Thomas Sharpton wrote: > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and > hmmsearch output. It appears to be fully functional and I have had a handful > of users test and integrate this module. That's pretty much what I need. Thanks to the folks on IRC, I got pointed at the correct repository yesterday evening. > Kai, I can mail an archive of the parser your way if you're in a hurry. With > some assistance from Chris et. al., I expect the code to be in the github > repo by the day's end. No worries, that's fine. I've got a checkout of the standalone repository that I can play with now. Is there any particular reason you decided to create a new parser instead of integrating the code into the existing hmmer.pm module? I haven't looked at how the hmmer2 hmmsearch output looks compared to the hmmer3 version and if there's any conflicts. Cheers, Kai PS: Tom, sorry for the repost, forgot to CC the list. Pre-coffee email sending, it never works. -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From dan.kortschak at adelaide.edu.au Tue May 25 02:12:27 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 25 May 2010 15:42:27 +0930 Subject: [Bioperl-l] Bioperl-l Digest, Vol 85, Issue 34 In-Reply-To: References: Message-ID: <1274767947.32025.49.camel@zoidberg.mbs.adelaide.edu.au> Dimitar, Try having a look through the EUtilities cookbook: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook cheers Dan On Tue, 2010-05-25 at 01:58 -0400, Dimitar Kenanov wrote: > Date: Tue, 25 May 2010 10:28:19 +0800 > From: Dimitar Kenanov > Subject: [Bioperl-l] about gene names > To: "'bioperl-l at bioperl.org'" > Message-ID: <4BFB35C3.4010808 at bii.a-star.edu.sg> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Hi guys, > I have a question: how can I get only the gene names from NCBI Gene > when I have the sequence ID? For example, with the ID NP_005264.2 I can > search NCBI Gene online, but I want to get only the gene name > automatically. I was checking the Bio::DB::EntrezGene module, but it > didn't become clear to me whether I can use it for my purposes. > > Thank you in advance.
> > Greetings > Dimitar > From kai.blin at biotech.uni-tuebingen.de Tue May 25 07:41:59 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 13:41:59 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> On Mon, 2010-05-24 at 17:46 -0700, Thomas Sharpton wrote: Hi Tom, > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and > hmmsearch output. It appears to be fully functional and I have had a handful > of users test and integrate this module. I've tried using the hmmer3 parser for my script, but it seems like the hmm_name member of the result object isn't set, and I'm using that. I saw this before when trying to write a test case that integrates into the Bioperl test framework. (Error output is Can't locate object method "hmm_name" via package "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, line 152.) I'm happy to work on this a bit myself if you're not working on this anyway, so we don't duplicate efforts. I just don't get why the hmm_name isn't picked up correctly, and I haven't been able to figure out how to get at the output that $self->debug() when running the tests. Oh well, it's a learning experience in any case. Cheers, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From kai.blin at biotech.uni-tuebingen.de Tue May 25 08:37:47 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 14:37:47 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> Message-ID: <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de> On Tue, 2010-05-25 at 13:41 +0200, Kai Blin wrote: Whined a little too early. > I've tried using the hmmer3 parser for my script, but it seems like the > hmm_name member of the result object isn't set, and I'm using that. > > I saw this before when trying to write a test case that integrates into > the Bioperl test framework. > (Error output is Can't locate object method "hmm_name" via package > "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, > line 152.) I just found the stuff I needed to add to the hmmer3Result.pm file. I'm currently busy adding a comprehensive test case for this module that integrates into the bioperl test harness. What's the best way to publish my additions? Do I create a fork of bioperl-live on Github or how is this handled? Cheers, Kai -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From cjfields at illinois.edu Tue May 25 08:46:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 07:46:48 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de> Message-ID: On May 25, 2010, at 7:37 AM, Kai Blin wrote: > On Tue, 2010-05-25 at 13:41 +0200, Kai Blin wrote: > > Whined a little too early. > >> I've tried using the hmmer3 parser for my script, but it seems like the >> hmm_name member of the result object isn't set, and I'm using that. >> >> I saw this before when trying to write a test case that integrates into >> the Bioperl test framework. >> (Error output is Can't locate object method "hmm_name" via package >> "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, >> line 152.) > > I just found the stuff I needed to add to the hmmer3Result.pm file. I'm > currently busy adding a comprehensive test case for this module that > integrates into the bioperl test harness. > > What's the best way to publish my additions? Do I create a fork of > bioperl-live on Github or how is this handled? Create a fork of the proper repository, which will eventually be bioperl-hmmer3. However, Thomas hasn't added that code in yet; not sure how much has changed since the original deposition into bioperl-dev in March, but it's possible more has been done.
chris > Cheers, > Kai > > -- > Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de > Interfakultäres Institut für Mikrobiologie und Infektionsmedizin > Abteilung Mikrobiologie/Biotechnologie > Eberhard-Karls-Universität Tübingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 Tübingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben > From dueldor at yahoo.com Tue May 25 08:30:59 2010 From: dueldor at yahoo.com (Dubi Eldor) Date: Tue, 25 May 2010 05:30:59 -0700 (PDT) Subject: [Bioperl-l] How to find secondary structures Message-ID: <766825.32163.qm@web37308.mail.mud.yahoo.com> Hi, I am a new user of BioPerl. I would like to find secondary structures in sequences of ~10K nt long. Are there any functions that can help me? Thanks, Dubi From David.Messina at sbc.su.se Tue May 25 09:58:38 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 15:58:38 +0200 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <3065CE83-3E61-4080-B475-F609E74A9FD4@sbc.su.se> On May 25, 2010, at 15:54, Staffa, Nick (NIH/NIEHS) [C] wrote: > The tutorial, I discovered, has an error, > a very bad experience for a trusting newbie. > The tutorial has these bold examples in the first box under > Identifying restriction enzyme sites (Bio::Restriction) > > use Bio::Restriction::EnzymeCollection; > my $all_collection = Bio::Restriction::EnzymeCollection; > > This is the form of the statement that seems to work: > my $all_collection = Bio::Restriction::EnzymeCollection->new(); Thanks, fixed.
From bosborne11 at verizon.net Tue May 25 09:04:01 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 25 May 2010 09:04:01 -0400 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: Dave, I looked at the scripts, and like you I concluded they didn't use that local Bio/ directory. Then I ran then with and without that Bio/ directory, same results. So I removed that local Bio/ directory. Rob, does some additional action need to be taken by Chris, or some other Bioperl maintainer, at CPAN/PAUSE? Brian O. On May 24, 2010, at 6:03 PM, Dave Messina wrote: > From: Graham Barr via RT >> IMO it is confusing to include 2 different copies of the same module. > > I agree. > > It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). > > In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). > > I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. > > So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Tue May 25 09:54:17 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Tue, 25 May 2010 09:54:17 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: <4046E576-2109-45BB-969C-F0B6F5749957@sbc.su.se> Message-ID: The tutorial, I discovered, has an error. 
a very bad experience for a trusting newby. whereas the tutorial has these bold examples in the first box under Identifying restriction enzyme sites (Bio::Restriction) use Bio::Restriction::EnzymeCollection; my $all_collection = Bio::Restriction::EnzymeCollection; This is the form of the statement that seems to work: my $all_collection = Bio::Restriction::EnzymeCollection->new(); All the other stuff necessary for my purpose of getting fragment lengths is there and seems to work if the $enzyme database has the enzyme under the name you enter. Updating the database with the file from NEB seems to be up to the user or his sysadmin. On 5/24/10 11:55 AM, "Dave Messina" wrote: Hi Nick, Now it's Bio::Restriction::Enzyme (and friends). Besides the perldoc for that module, see also: http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > How hard would it be to keep things backward compatible. > Have I missed something here? I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones are intended to be at least partially backwards compatible. Dave From cjfields at illinois.edu Tue May 25 10:30:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 09:30:09 -0500 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> I have added a 'no_index' to that specific directory in Build.PL, suppose we can change that back if there is no purpose to it (though it might come in handy with spots we don't need to be indexed). chris On May 25, 2010, at 8:04 AM, Brian Osborne wrote: > Dave, > > I looked at the scripts, and like you I concluded they didn't use that local Bio/ directory. Then I ran then with and without that Bio/ directory, same results. 
So I removed that local Bio/ directory. > > Rob, does some additional action need to be taken by Chris, or some other Bioperl maintainer, at CPAN/PAUSE? > > Brian O. > > On May 24, 2010, at 6:03 PM, Dave Messina wrote: > >> From: Graham Barr via RT >>> IMO it is confusing to include 2 different copies of the same module. >> >> I agree. >> >> It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). >> >> In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). >> >> I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. >> >> So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Tue May 25 10:51:02 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Tue, 25 May 2010 10:51:02 -0400 Subject: [Bioperl-l] New Restriction Analysis Message-ID: I have tried both these methods for getting new enzyme info into the system: use Bio::Restriction::IO; my $re_io = Bio::Restriction::IO->new(-file => $file, -format=>'withrefm'); my $rebase_collection = $re_io->read; A REBASE file in the correct format can be found at ftp://ftp.neb.com/pub/rebase - it will have a name like "withrefm.308". 
If need be you can also create new enzymes, like this: my $re = new Bio::Restriction::Enzyme(-enzyme => 'BioRI', -seq => 'GG^AATTCC'); But the BioPerl sends an error without informing me which of my statements caused it: Using first the withreftm.005 file from rebase and then these statements (not both at the same time): my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'SgrDI', -seq => 'CG^TCGACG'); Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.8/Bio/Restriction/Analysis.pm line 529. This works: my $pattern = $enzyme->site; print "pattern = $pattern\n"; which would lead me to believe there is nothing wrong with my enzyme. Could there be a problem if there were no cuts? That must be it, because putting info for EcoRI in instead of SgrDI, the program works: [Not the whole program, but only the bioPerl stuff. my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'EcoRI', -seq => 'G^AATTC'); use Bio::Restriction::Analysis; my $pattern = $enzyme->site; print "pattern = $pattern\n"; my $db = Bio::DB::Fasta->new("/uoldhome/estaffa/westmoreland/$filename", -makeid => \&make_my_id); my $obj = $db->get_Seq_by_id("$sequenceID"); #Sequence Object my $analysis = Bio::Restriction::Analysis->new(-seq => $obj); my @strings = $analysis->fragments($enzyme); What to do? Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Enterprise-Wide Information Technology Support Contract National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From maj at fortinbras.us Tue May 25 12:20:41 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 25 May 2010 12:20:41 -0400 Subject: [Bioperl-l] How to find secondary structures In-Reply-To: <766825.32163.qm@web37308.mail.mud.yahoo.com> References: <766825.32163.qm@web37308.mail.mud.yahoo.com> Message-ID: <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> Sounds like a job for infernal and its Bioperl wrapper (in Bio::Tools::Run); right Chris? MAJ ----- Original Message ----- From: "Dubi Eldor" To: Sent: Tuesday, May 25, 2010 8:30 AM Subject: [Bioperl-l] How to find secondary structures > Hi, > > I am a new user of BioPerl. > I would like to find secondary structures in sequences of ~10K nt long. > Are there any functions that can help me? > > Thanks, > Dubi > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue May 25 12:19:42 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 12:19:42 -0400 Subject: [Bioperl-l] New Restriction Analysis In-Reply-To: References: Message-ID: Hi Nick, You're right, as far as I can tell; the offending line is @cut_positions=@{$self->{'_cut_positions'}->{$enz}}; so $self->{_cut_positions}->{$enz} must be undefined when the enzyme has no cuts. I would say this is a bug; if you can put what you've reported below in a bug report at http://bugzilla.bioperl.org, that would be great. A workaround would be to check whether you have cuts first before calling the method; but that may be impossible, in which case a truly awful kludge would be to append a recognized site at the end of your sequences. Just till we can get to the fix.
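A minimal sketch of the first workaround (checking for a site before calling fragments()), under stated assumptions: site_occurs is a hypothetical helper, not part of BioPerl, and it only handles recognition sites made of unambiguous bases (A, C, G, T) plus an optional '^' cut marker. It works on plain strings, so it can be tried without BioPerl installed; in the script quoted below the arguments would presumably be $enzyme->site and $obj->seq.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): true if the recognition site
# occurs in the sequence on either strand. Assumes unambiguous bases only.
sub site_occurs {
    my ($site, $seq) = @_;
    (my $pattern = uc $site) =~ tr/^//d;                 # drop the cut marker
    (my $revcomp = reverse $pattern) =~ tr/ACGT/TGCA/;   # reverse complement
    my $target = uc $seq;
    return index($target, $pattern) >= 0 || index($target, $revcomp) >= 0;
}

# Plain-string demonstration with an EcoRI-like and an SgrDI-like site.
my $seq = 'AAGAATTCTT';
for my $site ('G^AATTC', 'CG^TCGACG') {
    my $status = site_occurs($site, $seq) ? 'cuts' : 'no site, skip fragments()';
    print "$site: $status\n";
}
```

The idea is simply to guard the fragments() call with this check, so an enzyme with no sites never reaches the failing line; enzymes with ambiguity codes in their site would need a regex-based comparison instead.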
cheers Mark ----- Original Message ----- From: "Staffa, Nick (NIH/NIEHS) [C]" To: "Bioperl-l" Sent: Tuesday, May 25, 2010 10:51 AM Subject: [Bioperl-l] New Restriction Analysis >I have tried both these methods for getting new enzyme info into the system: > > use Bio::Restriction::IO; > my $re_io = Bio::Restriction::IO->new(-file => $file, > -format=>'withrefm'); > my $rebase_collection = $re_io->read; > A REBASE file in the correct format can be found at > ftp://ftp.neb.com/pub/rebase - it will have a name like "withrefm.308". If > need be you can also create new enzymes, like this: > my $re = new Bio::Restriction::Enzyme(-enzyme => 'BioRI', > -seq => 'GG^AATTCC'); > But the BioPerl sends an error without informing me which of my statements > caused it: > > Using first the withreftm.005 file from rebase and then these statements (not > both at the same time): > my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'SgrDI', > -seq => 'CG^TCGACG'); > > > Can't use an undefined value as an ARRAY reference at > /usr/lib/perl5/site_perl/5.8.8/Bio/Restriction/Analysis.pm line 529. > > This works: > my $pattern = $enzyme->site; > print "pattern = $pattern\n"; > which would lead me to believe there is nothing wrong with my enzyme. > Could there be a problem if there were no cuts? > That must be it, because putting info for EcoRI in instead of SgrDI, the > program works: > > [Not the whole program, but only the bioPerl stuff. > my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'EcoRI', > -seq => 'G^AATTC'); > use Bio::Restriction::Analysis; > my $pattern = $enzyme->site; > print "pattern = $pattern\n"; > my $db = Bio::DB::Fasta->new("/uoldhome/estaffa/westmoreland/$filename", > -makeid => \&make_my_id); > my $obj = $db->get_Seq_by_id("$sequenceID"); #Sequence Object > my $analysis = Bio::Restriction::Analysis->new(-seq => $obj); > my @strings = $analysis->fragments($enzyme); > > What to do? 
> > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Enterprise-Wide Information Technology Support Contract > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue May 25 12:38:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 11:38:12 -0500 Subject: [Bioperl-l] How to find secondary structures In-Reply-To: <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> References: <766825.32163.qm@web37308.mail.mud.yahoo.com> <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> Message-ID: <2B6207D9-7221-4949-A7EE-EE6ED54EFF7B@illinois.edu> Yes, that would look for Rfam-based conserved structures. Should work for the latest infernal release, but let me know if you run into problems. Should also look at ERPIN and RNAMotif (both have similar BioPerl wrappers). chris On May 25, 2010, at 11:20 AM, Mark A. Jensen wrote: > Sounds like a job for infernal and it's Bioperl wrapper (in Bio::Tools::Run); right Chris? > MAJ > ----- Original Message ----- From: "Dubi Eldor" > To: > Sent: Tuesday, May 25, 2010 8:30 AM > Subject: [Bioperl-l] How to find secondary structures > > >> Hi, >> >> I am a new user of BioPerl. >> I would like to find secondary sturctures in sequences of ~10K nt long. >> Are there any functions that can help me? 
>> >> Thanks, >> Dubi >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue May 25 12:43:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 12:43:41 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <8EE661A4491C4A0FAD9875CF790F8164@NewLife> Thanks for the headsup on that-- we can fix. The refm file should be downloaded relatively transparently by the class directly... MAJ ----- Original Message ----- From: "Staffa, Nick (NIH/NIEHS) [C]" To: "Dave Messina" ; "Chris Fields" ; "Mark A. Jensen" Cc: "Bioperl-l" Sent: Tuesday, May 25, 2010 9:54 AM Subject: Re: [Bioperl-l] Restriction Enzymes > The tutorial, I discovered, has an error. > a very bad experience for a trusting newby. > whereas the tutorial has these bold examples in the first box under > Identifying restriction enzyme sites (Bio::Restriction) > > use Bio::Restriction::EnzymeCollection; > my $all_collection = Bio::Restriction::EnzymeCollection; > > This is the form of the statement that seems to work: > my $all_collection = Bio::Restriction::EnzymeCollection->new(); > > All the other stuff necessary for my purpose of getting fragment lengths is > there and seems to work > if the $enzyme database has the enzyme under the name you enter. > Updating the database with the file from NEB seems to be up to the user or his > sysadmin. > > > On 5/24/10 11:55 AM, "Dave Messina" wrote: > > Hi Nick, > > Now it's Bio::Restriction::Enzyme (and friends). 
Besides the perldoc for that > module, see also: > > http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > > >> How hard would it be to keep things backward compatible. >> Have I missed something here? > > I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme > was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones > are intended to be at least partially backwards compatible. > > > Dave > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue May 25 13:14:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 13:14:24 -0400 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: Message-ID: <409221E1D1E947108DEDBB5F34E1EBB7@NewLife> Don't think you want 'no strict'; the error's saying something about syntax to you. In the snippet, I see a missing opening single quote for output_file.bam. The asterisk means "expect an array ref", so that's ok. ----- Original Message ----- From: "Ben Bimber" To: "bioperl-l" Sent: Friday, May 21, 2010 9:58 AM Subject: [Bioperl-l] CommandExts and arrays >I am getting an error when trying to pass an array as a param with > command exts. I hope there is something obvious i'm missing, but I > cant seem to figure this out. > > I am trying to run the merge two BAM files using > Bio::Tools::Run::Samtools using something like this: > > my $new_bam = Bio::Tools::Run::Samtools->new( > -command => 'merge', > -program_dir => '/usr/bin/samtools/', > )->run( > -obm => output_file.bam', > -ibm => ['file1.bam', 'file2.bam'], > ); > > When i use an array for the -ibm param, I get an error saying 'cannot > use string 'file1' as an arrayref while strict refs in place'. The > error comes from this code in CommandExts.pm, around line 989. 
adding > 'no strict' right before the final line stops the error: > > # expand arrayrefs > my $l = $#files; > for (0..$l) { > if (ref($files[$_]) eq 'ARRAY') { > splice(@files, $_, 1, @{$files[$_]}); > #error thrown from this line > splice(@switches, $_, 1, ($switches[$_]) x @{$files[$_]}); > } > > > Thanks for the help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From thomas.sharpton at gmail.com Tue May 25 14:33:06 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 25 May 2010 11:33:06 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274767107.2271.11.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> Message-ID: <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> Hi Kai, I've just pushed the code to github, which you can find here: http://github.com/bioperl/bioperl-hmmer3 Please use this updated code before making any significant changes - I think I may have already fixed the bug you brought up earlier (but maybe not?). Do let me know if you have any problems getting ahold of this data or if you find any bugs in the code I'd deposited. Still getting my head wrapped around github. > No worries, that's fine. I've got a checkout of the standalone > repository that I can play with now. Is there any particular reason > you > decided to create a new parser instead of integrating the code into > the > existing hmmer.pm module? I haven't looked at how the hmmer2 hmmsearch > output looks compared to the hmmer3 version and if there's any > conflicts. Trying to integrate hmmer3 into the old hmmer searchIO module was the original idea. 
But after talking to some of the BioPerl gurus and considering the inherent differences between hmmer3 and hmmer2 (at least during beta, though there are still some major output report differences in the live release), we decided a separate module would be ideal. I don't want to speak out of turn, but it sounds like this might be one of the ways that the bioperl project is expanded in the future without overbloating bioperl-live. In theory, we can extend Bio::Run into this module as well in the future, such that bioperl-hmmer3 has a SearchIO path in addition to a Run path. I don't know what the more experienced developers currently think about this idea. This is an obvious statement, but I feel it's important to be clear on these matters - you should feel free to make any and all contributions to the development of this module as you see fit. BioPerl has been wonderful to me and I started this module to give a little back, but this remains community generated software. FYI - I have a fix that I'm working on to handle the secondary structure track in the alignment report, so if you're particularly interested in that data, give me a bit and I'll have it up and running. All the best, Tom From David.Messina at sbc.su.se Tue May 25 14:52:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 20:52:29 +0200 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> References: <4BFAAF45.4090400@cornell.edu> <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> Message-ID: <704A3AD7-BF8E-4C52-A3C5-D402B59BFD66@sbc.su.se> On May 25, 2010, at 4:30 PM, Chris Fields wrote: > I have added a 'no_index' to that specific directory in Build.PL, suppose we can change that back if there is no purpose to it (though it might come in handy with spots we don't need to be indexed). Good idea; it's bound to come up at some point.
On May 25, 2010, at 3:04 PM, Brian Osborne wrote: > So I removed that local Bio/ directory. Great, thanks Brian! Dave From hlapp at gmx.net Tue May 25 17:10:42 2010 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 May 2010 15:10:42 -0600 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> Message-ID: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> I'm a little concerned that this discussion is disconnected from the list and so misses a lot of possible input. Are we moving our development discussion to IRC or github commit comments? Regarding $feature->seq(), the API documentation expressly states that the return type is Bio::PrimarySeqI, as it does for $feature->entire_seq(). The original rationale for that was to avoid circular references. Bio::SeqI objects contain references to attached features, which in turn contain a reference to the seq object they are attached to. A Bio::SeqI object holds the basic sequence properties (everything except annotation and feature objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a reference to, not the containing Bio::SeqI object. It's possible that S::U::weaken() can solve the circular reference problem, but this fact should be tested. I.e., attach a feature with a SeqI-reference to a SeqI, dispose of the SeqI, and then test that the feature has lost the reference to the SeqI too. This still leaves the issue though that then you have a SeqFeatureI object with a dangling reference to a sequence object. If you have those SeqFeatureI objects stored in a feature store, this may wreak havoc. I'd like to see convincing arguments that it doesn't.
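The test described above can be sketched in a few lines, as an illustration only: plain hashes stand in for the Bio::SeqI object and the feature (they are hypothetical stand-ins, not BioPerl classes), and weaken() comes from the core Scalar::Util module. Whether real SeqI and SeqFeatureI objects behave the same way would still need to be checked against the actual classes.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Plain hashes stand in for a sequence object and an attached feature;
# these are illustrative stand-ins, not BioPerl classes.
my $feature = { name => 'gene1' };
{
    my $seq = { id => 'seq1', features => [] };
    push @{ $seq->{features} }, $feature;   # sequence holds the feature
    $feature->{seq} = $seq;                 # feature points back at the sequence
    weaken($feature->{seq});                # back-reference no longer keeps $seq alive
    print "inside scope: ", (defined $feature->{seq} ? "attached" : "detached"), "\n";
}
# $seq has gone out of scope here, so the weakened back-reference is undef.
print "after scope: ", (defined $feature->{seq} ? "attached" : "detached"), "\n";
```

This also shows the remaining concern in miniature: once the sequence is gone, the feature survives with an undefined sequence slot, so any code that later asks the feature for its sequence has to cope with that.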
Bottom line - just forking on git and committing a change isn't a substitute for bringing up an issue and possible solutions on the list, and the vetting of pull requests can fall upon only one or two core developers. Two eyeballs often spot a lot less than a hundred. -hilmar On May 25, 2010, at 2:02 PM, GitHub wrote: > Ah, but my link's old, forget it. This one is better: http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html > > From: cjfields > View this commit online: http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From kai.blin at biotech.uni-tuebingen.de Tue May 25 17:50:29 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 23:50:29 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> Message-ID: <1274824229.2271.60.camel@gonzo.home.kblin.org> On Tue, 2010-05-25 at 11:33 -0700, Thomas Sharpton wrote: Hi Thomas, > http://github.com/bioperl/bioperl-hmmer3 > > Please use this updated code before making any significant changes - I > think I may have already fixed the bug you brought up earlier (but > maybe not?). Do let me know if you have any problems getting ahold of > this data or if you find any bugs in the code I'd deposited. Still > getting my head wrapped around github. I've seen the repo, and forked from it already to push my changes. Some of the folks from IRC gave me write access and Chris Fields actually pushed my changes. 
Most notable about the changes is probably a bit hidden by the noise, but I've changed the Hit->raw_score to contain the overall score, not the "best domain" score. > Trying to integrate hmmer3 into the old hmmer searchIO module was the > original idea. But after talking to some of the BioPerl gurus and > considering the inherent differences between hmmer3 and hmmer2 (at > least during beta, though there are still some major output report > differences in the live release), we decided as separate module would > be ideal. Some of the folks on IRC suggested that we might want to integrate the hmmer.pm parser as well, modularizing this a bit and loading the correct parser depending on the requested format. > This is an obvious statement, but I feel it's important to be clear on > these matters - you should feel free to make any and all contributions > to the development of this module as you see fit. BioPerl has been > wonderful to me and I started this module to give a little back, but > this remains community generated software. I'm planning on adding even more tests, but the basic features for hmmscan parsing seem to be there. I'm currently running an extensive test run on real genome data, hopefully I can see the results of that in a couple of days. Cheers, and thanks for the help, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From cjfields at illinois.edu Tue May 25 17:55:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 16:55:53 -0500 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: I agree, but we spotted this from IRC, then added the comments on that merge. Dave also spotted my original code comments (which appeared in the fork queue, and which echo the very same concerns you have) after the commit as well, and managed to revert it. So, for forks where it appears further discussion is warranted (like this), we should bring it to the main list (and IRC, if anyone happens to be there) for discussion. Sounds good to me. For those on list, here are Adam's and my comments on this (linked here: http://github.com/adsj/bioperl-live/commit/24ec961b217084e248f4fdbd174aadace1a27ac4#comments): adsj: "Hi Chris, thanks for the comment. The reason is this: I have a class, MyApp::Seq, which ISA Bio::Seq::RichSeq and adds some extra methods I use in the application. When I call ->seq() on a feature from one of my MyApp::Seq objects, I want to get a MyApp::Seq object back (because of the extra methods). Am I making sense? I have been running with this patch since at least 1.5.2, so it has been a while since I dug into it. Maybe there is a cleaner solution.
I am not sure what your comment about changing the API means - I think it is quite reasonable/natural that MyApp::Seq->get_Features"->seq" returns MyApp::Seq objects?" My response: "Calling seq() on a feature should return a truncation of whatever your Bio::SeqFeatureI does (it normally calls trunc(start, end) on it's attached sequence). For Bio::Seq it's normally returning a simple Bio::PrimarySeq, not a Bio::Seq, b/c that is what is attached to the Feature. This is why we don't need GC. There are no circular refs: Bio::Seq has-a PrimarySeq and has-a Features (via FeatureHolderI), each Feature has the same PrimarySeq as the parent Bio::Seq. It's hard to know if there is a workaround w/o knowing what you are asking for (e.g. what MyApp::Seq does), but you can certainly override the default methods to DTRT for your specific case. For instance, redefine add_SeqFeature() for your class to attach self as you have above for Bio::Seq. In this case, we should patch SeqFeature::Generic to use weaken() as you show above just in case this is needed by others, but maybe in the context of (pseudocode) 'weaken if $seq to be attached is-a Bio::SeqI', and not hammered down to check the very specific 'Bio::PrimarySeq'. Anyway, this is what I mean by changing the default API, which is what the above Bio::Seq change does. This would change the context of what is currently being returned (self, instead of a simpler contained Bio::PrimarySeqI). Also, anything gained by abstracting the raw seq handling of Feature data by linking to PrimarySeq is lost when you link to the parent, thus always requiring GC and weaken() (which is notoriously flaky dep. on context)." chris On May 25, 2010, at 4:10 PM, Hilmar Lapp wrote: > I'm a little concerned that this discussion is disconnected from the list and so misses a lot of possible input. Are we moving our development discussion to IRC or github commit comments? 
> > Regarding $feature->seq(), the API documentation expressly states that the return type is Bio::PrimarySeqI, as it does for $feature->entire_seq(). > > The original rationale for that was to avoid circular references. Bio::SeqI objects contain references to attached features, which in turn contain a reference to the seq object they are attached to. A Bio::SeqI object holds the basic sequence properties (everything except annotation and feature objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a reference to, not the containing Bio::SeqI object. > > It's possible that S::U::weaken() can solve the circular reference problem, but this fact should be tested. I.e., attach a feature with a SeqI-reference to a SeqI, dispose the SeqI, and then test that the feature has lost the reference to the SeqI too. > > This still leaves the issue though that then you have a SeqFeatureI object with a dangling reference to a sequence object. If you have those SeqFeatureI objects stored in a feature store, this may wreak havoc. I'd like to see convincing arguments that it doesn't. > > Bottom line - just forking on git and committing a change isn't a substitute for bringing up an issue and possible solutions on the list, and the vetting of pull requests can fall upon only one or two core developers. Two eyeballs often spot a lot less than a hundred. > > -hilmar > > On May 25, 2010, at 2:02 PM, GitHub wrote: > >> Ah, but my link's old, forget it. 
This one is better: http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html >> >> From: cjfields >> View this commit online: http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From thomas.sharpton at gmail.com Tue May 25 18:29:38 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 25 May 2010 15:29:38 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274824229.2271.60.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: Thanks for the contributions, Kai. > I've seen the repo, and forked from it already to push my changes. > Some > of the folks from IRC gave me write access and Chris Fields actually > pushed my changes. Just saw this. Thanks for doing that, Chris. > Most notable about the changes is probably a bit hidden by the noise, > but I've changed the Hit->raw_score to contain the overall score, not > the "best domain" score. So this brings up an interesting point. At some point, we'll have to build out a few additional SearchIO methods to incorporate some of the additional information encoded in the HMMER v3 reports. Sean talks a bit in the user manual about the importance of looking at both the full sequence and the best domain (see page 18 in the manual linked to on this page http://hmmer.janelia.org/#documentation). 
For example, he mentions that one should consider the e-value of both the full sequence and best domain to ascertain if the query is homologous to a profile being considered via hmmsearch. He also mentions that looking at the full sequence report values without consideration of the best domain report values can be misleading. I'm not saying that your approach regarding Hit->raw_score is wrong - proper interpretation of the results is up to the end user and there are benefits to looking at the full sequence (again, communicated on page 18) - but we might consider how to best encode the SearchIO methods to mitigate end user confusion and mistakes. >> Trying to integrate hmmer3 into the old hmmer searchIO module was the >> original idea. But after talking to some of the BioPerl gurus and >> considering the inherent differences between hmmer3 and hmmer2 (at >> least during beta, though there are still some major output report >> differences in the live release), we decided as separate module would >> be ideal. > > Some of the folks on IRC suggested that we might want to integrate the > hmmer.pm parser as well, modularizing this a bit and loading the > correct > parser depending on the requested format. This might make sense, given that HMMER v3 is now live and seems to be adopted by researchers at an increasing rate. Since I used hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult to do, either. I think a thorough conversation on this point is warranted as others I've talked to have preferred the modules to be separate. I'd be interested to hear what other have to say on this point. >> This is an obvious statement, but I feel it's important to be clear >> on >> these matters - you should feel free to make any and all >> contributions >> to the development of this module as you see fit. BioPerl has been >> wonderful to me and I started this module to give a little back, but >> this remains community generated software. 
> > I'm planning on adding even more tests, but the basic features for > hmmscan parsing seem to be there. I'm currently running an extensive > test run on real genome data, hopefully I can see the results of > that in > a couple of days. Awesome! > Cheers, and thanks for the help, Likewise. T From kannabiran.nandakumar at gmail.com Tue May 25 18:30:18 2010 From: kannabiran.nandakumar at gmail.com (Kanna) Date: Tue, 25 May 2010 15:30:18 -0700 (PDT) Subject: [Bioperl-l] new to this group Message-ID: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Hi guys, I am new to this group. I work in bioinformatics and would like to contribute to the BioPerl project. I am interested in the OBO file parsing module to start with. I visited the project priority list and the page seems to have been modified around 6 months ago. If it is already completed could anyone suggest modules I can contribute to? Thanks, Kanna From David.Messina at sbc.su.se Tue May 25 18:41:27 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 00:41:27 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: On May 25, 2010, at 11:55 PM, Chris Fields wrote: > Sounds good to me. Me too, and just to clarify for everyone following along, I erroneously committed the code in question to bioperl-live master (head), reverted that commit, and moved it to a branch (http://github.com/bioperl/bioperl-live/commits/topic/adsj-seqobj-return). Dave From maj at fortinbras.us Tue May 25 21:37:38 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 25 May 2010 21:37:38 -0400 Subject: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: <525D25AC2CDF42E99C1F4072B02D0C1B@NewLife> I +1 Hilmar, but note that already git is doing what it is designed to do: devolve development. My $0.02 is: that is how BioPerl will keep from becoming a dinosaur. I believe that we as a community, judging from the track of the last year or so, are committed to this evolution by devolution, and the move to git is part of that overall plan. The increase in IRC chatter, led by deafferet and rbuels, prefigured this and it was generally considered a Good Thing. So, I would propose that people (devs and users) make their views known (on list and elsewhere) about how best to communicate and have dev-oriented conversations: it may be that a listserv alone is not nimble enough. MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "BioPerl List" Sent: Tuesday, May 25, 2010 5:10 PM Subject: Re: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) > I'm a little concerned that this discussion is disconnected from the list and > so misses a lot of possible input. Are we moving our development discussion > to IRC or github commit comments? > > Regarding $feature->seq(), the API documentation expressly states that the > return type is Bio::PrimarySeqI, as it does for $feature- > >entire_seq(). > > The original rationale for that was to avoid circular references. Bio::SeqI > objects contain references to attached features, which in turn contain a > reference to the seq object they are attached to. 
A Bio::SeqI object holds > the basic sequence properties (everything except annotation and feature > objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a > reference to, not the containing Bio::SeqI object. > > It's possible that S::U::weaken() can solve the circular reference problem, > but this fact should be tested. I.e., attach a feature with a SeqI-reference > to a SeqI, dispose the SeqI, and then test that the feature has lost the > reference to the SeqI too. > > This still leaves the issue though that then you have a SeqFeatureI object > with a dangling reference to a sequence object. If you have those SeqFeatureI > objects stored in a feature store, this may wreak havoc. I'd like to see > convincing arguments that it doesn't. > > Bottom line - just forking on git and committing a change isn't a substitute > for bringing up an issue and possible solutions on the list, and the vetting > of pull requests can fall upon only one or two core developers. Two eyeballs > often spot a lot less than a hundred. > > -hilmar > > On May 25, 2010, at 2:02 PM, GitHub wrote: > >> Ah, but my link's old, forget it. 
This one is better: >> http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html >> >> From: cjfields >> View this commit online: >> http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From asjo at koldfront.dk Wed May 26 01:41:52 2010 From: asjo at koldfront.dk (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Wed, 26 May 2010 07:41:52 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: <87zkznb4nz.fsf@topper.koldfront.dk> On Tue, 25 May 2010 15:10:42 -0600, Hilmar wrote: > Bottom line - just forking on git and committing a change isn't a > substitute for bringing up an issue and possible solutions on the > list, and the vetting of pull requests can fall upon only one or two > core developers. Two eyeballs often spot a lot less than a hundred. Just to clarify: I specifically _didn't_ make a Pull request yet. I simply created the fork store the patch in a visible way - my intention was then to clean the patch up and make it ready for comments/discussion (I just haven't had time to do so yet). I am new to github, but as I understood the interface there, anyone is free (encouraged?) to "fork" their own clone to work in, as a kind of "public" personal workspace, and when you feel that your clone is ready to be merged, then - only then - you do a "Pull request". 
If that isn't the way github is supposed to be used, or that isn't the way BioPerl wants to use it, let me know and I'll adjust. I appreciate the comments so far, and will get back to this as soon as I can. Thanks, Adam -- "Sunday morning when the rain begins to fall Adam Sjøgren I believe I have seen the end of it all" asjo at koldfront.dk From David.Messina at sbc.su.se Wed May 26 05:24:11 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 11:24:11 +0200 Subject: [Bioperl-l] Bio::Species irritated with "unclassified sequences" In-Reply-To: <4BF59B2F.9000300@bms.com> References: <4BF59B2F.9000300@bms.com> Message-ID: <50665C57-007D-49CC-86A7-4595D176EA73@sbc.su.se> Hi Charles, Thanks for your report. I believe your interpretation of Bio::Species::classification is correct. It looks like this is going to require a little more investigation. Could you please submit this as a bug report along with a little test case? http://www.bioperl.org/wiki/Bugs Dave On May 20, 2010, at 22:27, Charles Tilford wrote: > Bio::Species::classification() is irritated with me when I provide it with a @class_array that is composed of one node, particularly: > > $obj->classification("unclassified sequences") > > AFAICT this is a valid, single node taxa "tree": > > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=12908 > > Subroutine classification is expecting at least two class members, the problem with the above call crops up as: > > Use of uninitialized value $vals[1] in quotemeta at /stf/biocgi/tilfordc/patch_lib/Bio/Species.pm line 179 > ( $Id: Species.pm 16700 2010-01-15 19:50:11Z dave_messina $) > > > ... and the relevant code is: > > sub classification { > my ($self, @vals) = @_; > > if (@vals) { > if (ref($vals[0]) eq 'ARRAY') { > @vals = @{$vals[0]}; > } > > # make sure the lineage contains us as first or second element > # (lineage may have subspecies, species, genus ...)
> my $name = $self->node_name; > my ($genus, $species) = (quotemeta($vals[1]), quotemeta($vals[0])); > > > That is, it's expecting at least (species, genus) in the array. Am I misusing classification(), or Bio::Species in general? I know it's named "Species", but I've been using it as a generic tree object for arbitrary taxonomy nodes, not just species and subspecies. This block a little lower down: > > unless ($self->rank) { > # and that we are rank species > $self->rank('species'); > } > > > ... implies that the module can be used for taxa ranks other than species. However, doing so would not prevent the module being aggravated over a null $vals[1]. > > The use case here is building Bio::Seq::RichSeq objects pulled from a (very large) sequence database, and then dumped / displayed with SeqIO. Most are well behaved, but there's a non-trivial number of 'artificial' constructs that don't root to an organism. > > -CAT > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed May 26 07:53:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 06:53:50 -0500 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <87zkznb4nz.fsf@topper.koldfront.dk> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> Message-ID: On May 26, 2010, at 12:41 AM, Adam Sj?gren wrote: > On Tue, 25 May 2010 15:10:42 -0600, Hilmar wrote: > >> Bottom line - just forking on git and committing a change isn't a >> substitute for bringing up an issue and possible solutions on the >> list, and the vetting of pull requests can fall upon only one or two >> core developers. Two eyeballs often spot a lot less than a hundred. 
> > Just to clarify: I specifically _didn't_ make a Pull request yet. > > I simply created the fork store the patch in a visible way - my > intention was then to clean the patch up and make it ready for > comments/discussion (I just haven't had time to do so yet). > > I am new to github, but as I understood the interface there, anyone is > free (encouraged?) to "fork" their own clone to work in, as a kind of > "public" personal workspace, and when you feel that your clone is ready > to be merged, then - only then - you do a "Pull request". That's odd; I recall receiving a pull request from your fork at some point, but maybe I simply looked into the fork queue instead (which I thought was derived from pull requests, but maybe not). > If that isn't the way github is supposed to be used, or that isn't the > way BioPerl wants to use it, let me know and I'll adjust. > > I appreciate the comments so far, and will get back to this as soon as I > can. > > > Thanks, > > Adam No problem Adam, we're going through the learning curve on this end as well re: this specific github feature. I think how you are going about this is fine, we'll need to come up with some documentation as to how our collabs pull in forked code. chris From hlapp at drycafe.net Wed May 26 09:27:55 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 26 May 2010 07:27:55 -0600 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <87zkznb4nz.fsf@topper.koldfront.dk> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> Message-ID: <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> On May 25, 2010, at 11:41 PM, Adam Sjøgren wrote: > as I understood the interface there, anyone is free (encouraged?)
to > "fork" their own clone to work in, as a kind of "public" personal > workspace, and when you feel that your clone is ready to be merged, > then - only then - you do a "Pull request". That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) And yes, encouraged to fork indeed. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Wed May 26 10:03:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 16:03:14 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> Message-ID: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> On May 26, 2010, at 15:27, Hilmar Lapp wrote: > That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) That would be me. :) His commits were sitting in the fork queue, which I mistakenly understood to mean a pull request had been made. Turns out that's not the case (See http://github.com/blog/270-the-fork-queue). 
Dave From David.Messina at sbc.su.se Wed May 26 10:52:05 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 16:52:05 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: > So this brings up an interesting point. At some point, we'll have to build out a few additional SearchIO methods to incorporate some of the additional information encoded in the HMMER v3 reports. Would the new methods need to be added to SearchIO if they're specific to H3? (as opposed to just being in the H3 sub-class) > Sean talks a bit in the user manual about the importance of looking at both the full sequence and the best domain (see page 18 in the manual linked to on this page http://hmmer.janelia.org/#documentation). For example, he mentions that one should consider the e-value of both the full sequence and best domain to ascertain if the query is homologous to a profile being considered via hmmsearch. > > He also mentions that looking at the full sequence report values without consideration of the best domain report values can be misleading. I'm not saying that your approach regarding Hit->raw_score is wrong - proper interpretation of the results is up to the end user and there are benefits to looking at the full sequence (again, communicated on page 18) - but we might consider how to best encode the SearchIO methods to mitigate end user confusion and mistakes. I think this is a great idea. Of course it's always best for end-users to RTFM and understand the tools they're using, but it's clearly beneficial to make it easier to do the right thing. Having not considered it too much, I'm not sure how to accomplish this without breaking the SearchIO idiom. But presumably a way could be found. 
>> Some of the folks on IRC suggested that we might want to integrate the >> hmmer.pm parser as well, modularizing this a bit and loading the correct >> parser depending on the requested format. > This might make sense, given that HMMER v3 is now live and seems to be adopted by researchers at an increasing rate. Since I used hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult to do, either. I think a thorough conversation on this point is warranted as others I've talked to have preferred the modules to be separate. > > I'd be interested to hear what other have to say on this point. I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3. But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption. Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so) ? Dave From thomas.sharpton at gmail.com Wed May 26 11:25:24 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 26 May 2010 08:25:24 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: Thanks for the feedback, Dave. >> So this brings up an interesting point. At some point, we'll have >> to build out a few additional SearchIO methods to incorporate some >> of the additional information encoded in the HMMER v3 reports. > > Would the new methods need to be added to SearchIO if they're > specific to H3? 
(as opposed to just being in the H3 sub-class) Sorry for being unclear - the methods in question would be, at least in my mind, specific to the H3 sub-class. > >> Sean talks a bit in the user manual about the importance of looking >> at both the full sequence and the best domain (see page 18 in the >> manual linked to on this page http://hmmer.janelia.org/#documentation). >> For example, he mentions that one should consider the e-value of >> both the full sequence and best domain to ascertain if the query is >> homologous to a profile being considered via hmmsearch. >> >> He also mentions that looking at the full sequence report values >> without consideration of the best domain report values can be >> misleading. I'm not saying that your approach regarding >> Hit->raw_score is wrong - proper interpretation of the results is up to >> the end user and there are benefits to looking at the full sequence >> (again, communicated on page 18) - but we might consider how to >> best encode the SearchIO methods to mitigate end user confusion and >> mistakes. > > I think this is a great idea. > > Of course it's always best for end-users to RTFM and understand the > tools they're using, but it's clearly beneficial to make it easier > to do the right thing. > > Having not considered it too much, I'm not sure how to accomplish > this without breaking the SearchIO idiom. But presumably a way could > be found. > I'll see if I can't hit the drawing board and come up with a naming scheme for additional H3 methods that retrieve some of the extra data encoded in the new reports. It *probably* makes most sense, at least from the standpoint of the user's perspective, to adopt the full-length report values as the standard hit->significance and hit->raw_score while having something like hit->best_significance and hit->best_score as H3 methods that return the best-domain report values. Again, this could use some thought/discussion.
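[Editorial sketch] To make the naming proposal above concrete, here is a rough illustration of a hit object exposing full-sequence values under the standard names and best-domain values under the proposed extra names. The package My::H3Hit, its constructor keys, and the numbers are all hypothetical and made up for illustration; this is not the BioPerl hit class, only a stand-in using the method names floated in the message.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an HMMER3 hit object; not a BioPerl class.
package My::H3Hit;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Full-sequence report values under the usual accessor names.
sub raw_score    { return $_[0]{full_score} }
sub significance { return $_[0]{full_evalue} }

# Best-domain report values under the proposed additional names.
sub best_score        { return $_[0]{best_domain_score} }
sub best_significance { return $_[0]{best_domain_evalue} }

package main;

# Illustrative, invented numbers only.
my $hit = My::H3Hit->new(
    full_score         => 120.5,
    full_evalue        => 1e-30,
    best_domain_score  => 80.2,
    best_domain_evalue => 1e-20,
);

printf "full sequence: score %.1f, E-value %g\n",
    $hit->raw_score, $hit->significance;
printf "best domain:   score %.1f, E-value %g\n",
    $hit->best_score, $hit->best_significance;
```

Keeping both pairs of accessors on the same object would let a caller compare the full-sequence and best-domain values side by side, as the manual recommends.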
> >>> Some of the folks on IRC suggested that we might want to integrate >>> the >>> hmmer.pm parser as well, modularizing this a bit and loading the >>> correct >>> parser depending on the requested format. > >> This might make sense, given that HMMER v3 is now live and seems to >> be adopted by researchers at an increasing rate. Since I used >> hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult >> to do, either. I think a thorough conversation on this point is >> warranted as others I've talked to have preferred the modules to be >> separate. >> >> I'd be interested to hear what other have to say on this point. > > I did not follow the IRC discussion, so I confess I'm not totally > clear on what "integrate the hmmer.pm parser" means. I'm taking it > to mean combining the code that parses HMMER2 with the code that > parses HMMER3. > But then "modularizing this a bit and loading the correct parser > depending on the requested format" seems to contradict that > assumption. > > Perhaps you (or someone) could clarify a bit what the HMMER2 - > HMMER3 integration would look like (and the goal of doing so) ? > I was not a part of that conversation either and I'm also operating under a similar assumption about what "integrating the hmmer.pm parser" means. I too am confused about the statement regarding modularization; I assume Kai meant that next_result would leverage the HMMER version number (which it already grabs) to guide the appropriate parsing of the datafile. Not thinking about this too carefully, it might be as simple as:

next_result {
    version = get_hmmer_version
    if version == 2
        parse V2 report file
    if version == 3
        parse V3 report file
}

To make the code a bit more manageable, the various version parsers could be moved into independent subroutines. Kai, is this along the lines of what you were thinking?
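[Editorial sketch] The version-based dispatch outlined in the pseudocode above could be sketched in Perl along these lines. Everything here is hypothetical and not taken from hmmer.pm or hmmer3.pm: the subroutine names, the dispatch table, and in particular the assumption that the report header contains a line of the form "HMMER <major>.<minor>" from which the major version can be read.

```perl
use strict;
use warnings;

# Hypothetical per-version parsers. Real ones would build result objects;
# these only report which branch was taken.
my %parser_for = (
    2 => sub { return "parsed as HMMER2 report" },
    3 => sub { return "parsed as HMMER3 report" },
);

# Assumption: the report header carries a line such as "HMMER 3.0 (...)".
sub hmmer_major_version {
    my ($report) = @_;
    return $report =~ /\bHMMER\s+(\d+)\./ ? $1 : undef;
}

# Pick the parser that matches the detected major version.
sub parse_report {
    my ($report) = @_;
    my $version = hmmer_major_version($report);
    die "could not determine HMMER version\n" unless defined $version;
    my $parser = $parser_for{$version}
        or die "unsupported HMMER version $version\n";
    return $parser->($report);
}

# Illustrative header snippets, assumed for the sake of the example.
my $h3_report = "# hmmscan :: search sequence(s) against a profile database\n"
              . "# HMMER 3.0 (March 2010)\n";
my $h2_report = "hmmpfam - search one or more sequences against HMM database\n"
              . "HMMER 2.3.2 (Oct 2003)\n";

print parse_report($h3_report), "\n";
print parse_report($h2_report), "\n";
```

A dispatch table keeps each version's parsing logic in its own subroutine, so supporting a further report format would mean adding one entry rather than another branch inside next_result.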
If this is correct (that is, merging the H2 and H3 parsers into a single hmmer.pm module), I see one primary benefit - the end user need not specify which HMMER module they want to implement, just use Bio::SearchIO::hmmer - and one secondary benefit - there's enough similarity between H2 and H3 reports that some from the H2 parser redundantly appears in the H3 parser. There are certainly other benefits that I'm overlooking. The only real downside I see at the moment is that the hmmer.pm parser becomes a bit more complicated and bloated. But I suspect this can be remedied with careful partitioning of the code into appropriate subroutines and thorough documentation. I am a bit concerned about how the aforementioned H3 specific methods are incorporated, but that should be manageable. I wonder if anyone involved in the IRC discussion cares to weigh in? Regardless, I'd advocate getting the H3 version fully flushed out to deal with the issues brought up in the first half of this message prior to an attempt to merge the two modules, as the merging process may be affected by the structure of the H3 parser. Best, Tom From cjfields at illinois.edu Wed May 26 12:13:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 11:13:59 -0500 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> Message-ID: On May 26, 2010, at 9:03 AM, Dave Messina wrote: > > On May 26, 2010, at 15:27, Hilmar Lapp wrote: > >> That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) > > > That would be me. 
:) > > His commits were sitting in the fork queue, which I mistakenly understood to mean a pull request had been made. Turns out that's not the case (See http://github.com/blog/270-the-fork-queue). > > > Dave We can clarify that in the docs on the bioperl site, maybe in a github-specific section. chris From cjfields at illinois.edu Wed May 26 12:17:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 11:17:50 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: <3826604E-CD90-42A5-A0B2-004D9922B6AA@illinois.edu> On May 26, 2010, at 10:25 AM, Thomas Sharpton wrote: >> ... >> I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3.= > >> But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption. >> >> Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so) ? >> > > I was not a part of that conversation either and I'm also operating under a similar assumption about what "integrating the hmmer.pm parser" means. I too am confused about the statement regarding modularization; I assume Kai meant that next_result would leverage the HMMER version number (which it already grabs) to guide the appropriate parsing of the datafile. 
Not thinking about this too carefully, it might be a simple as: > > next_result{ > version = get_hmmer_version > if version == 2 > parse V2 report file > if version == 3 > parse V3 report file > } > > to make the code a bit more manageable, the various version parsers could be appropriated to independent subroutines. > > Kai, is this along the lines of what you were thinking? > > If this is correct (that is, merging the H2 and H3 parsers into a single hmmer.pm module), I see one primary benefit - the end user need not specify which HMMER module they want to implement, just use Bio::SearchIO::hmmer - and one secondary benefit - there's enough similarity between H2 and H3 reports that some from the H2 parser redundantly appears in the H3 parser. There are certainly other benefits that I'm overlooking. > > The only real downside I see at the moment is that the hmmer.pm parser becomes a bit more complicated and bloated. But I suspect this can be remedied with careful partitioning of the code into appropriate subroutines and thorough documentation. I am a bit concerned about how the aforementioned H3 specific methods are incorporated, but that should be manageable. > > I wonder if anyone involved in the IRC discussion cares to weigh in? > > Regardless, I'd advocate getting the H3 version fully flushed out to deal with the issues brought up in the first half of this message prior to an attempt to merge the two modules, as the merging process may be affected by the structure of the H3 parser. > > Best, > Tom That's essentially the idea, though it can be cleaner than that if we're expecting the entire stream of reports will be of the same version (set the proper next_result method at instantiation). SearchIO::infernal does something like this. Or it can call out to a handler, like SearchIO::blastxml. YMMV. chris From maj at fortinbras.us Wed May 26 13:43:37 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 26 May 2010 13:43:37 -0400 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail><9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net><87zkznb4nz.fsf@topper.koldfront.dk><1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> Message-ID: <85C731A2326D45FB903FB1B0D5C5DEBF@NewLife> No zeal is overweening that is on the side of the Right. ----- Original Message ----- From: "Dave Messina" To: "Hilmar Lapp" Cc: "Adam Sjögren" ; Sent: Wednesday, May 26, 2010 10:03 AM Subject: Re: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) > > On May 26, 2010, at 15:27, Hilmar Lapp wrote: > >> That would be my understanding too. Maybe some overzealous Bioperl gitizens >> at work who weren't going to wait for this? ;) > > > That would be me. :) > > His commits were sitting in the fork queue, which I mistakenly understood to > mean a pull request had been made. Turns out that's not the case (See > http://github.com/blog/270-the-fork-queue). > > > Dave > > From David.Messina at sbc.su.se Wed May 26 15:03:21 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 21:03:21 +0200 Subject: [Bioperl-l] new to this group In-Reply-To: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: Hi Kanna, Welcome! We're always happy to have more people jump in the deep end of the pool and help out.
From my reading of the project priority page, the OBO file parsing stuff has been done: > (This appears to be basically solved with the new OBOEngine, Sohel will need to comment if it is indeed finished). --jason stajich 20:10, 19 June 2006 (EDT) (see http://www.bioperl.org/wiki/Project_priority_list#Ontology_file_parsing) Can anyone (Hilmar?) who knows where we're at with this verify that our OBO parser is in good shape? I did notice this open bug, Kanna: bp_load_ontology ISBN title parsing error in OBO format http://bugzilla.open-bio.org/show_bug.cgi?id=2730 Is that something you might be interested in? > I visited the project priority list and the page seems to have been modified around 6 months ago. Agreed, it's probably time for someone to go through and update it. I'll post to the list separately about this. > If it is already completed could anyone suggest modules I can contribute to? But even though the project priority list is outdated, the open bugs list is not: http://bugzilla.open-bio.org/buglist.cgi?product=Bioperl&bug_status=NEW I would recommend you look for something relatively small to start with and submit a patch for that. And then as you go along we'll get a better idea of how to direct you as you get a better idea of what needs to be done. Dave From David.Messina at sbc.su.se Wed May 26 15:22:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 21:22:40 +0200 Subject: [Bioperl-l] project priority list Message-ID: <0DC6E827-8855-4463-8C58-79CC26BDF42D@sbc.su.se> So, as pointed out by Kanna in another thread, our Project Priority list is getting a little stale. http://www.bioperl.org/wiki/Project_priority_list There are a lot of things on there that have been crossed off for years now. I propose that we do some housecleaning, including deleting long-finished projects from the list. (They'll still live on in the wiki history of the page.)
Unless someone objects, I'll start poking at it a bit, but if other core devs with relevant knowledge of various projects could take a moment to peruse and edit too, that would be great. Dave From jay at jays.net Wed May 26 15:27:01 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 26 May 2010 14:27:01 -0500 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: <1D273263-F9B4-4612-961B-E2B0F480FBC3@jays.net> On May 26, 2010, at 2:03 PM, Dave Messina wrote: > I would recommend you look for something relatively small to start with and submit a patch for that. Ideally "submit a patch" means create a github.com account, click "fork" on the bioperl-live repo, commit your changes into your fork, then send us a "pull request". :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From scott at scottcain.net Wed May 26 15:36:16 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 26 May 2010 15:36:16 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git Message-ID: Hi all, For GBrowse on the 1.X branch there is a network install script that people can download and execute and it will install all of the prerequisites and then install GBrowse. For this script, we also support a -d(eveloper) option, to get GBrowse and BioPerl from their repositories. Now that BioPerl has moved to git, I have a question: does anybody know if there is a way (preferably via url) to get bioperl from git in a non-interactive way? The read-only url on the bioperl-live git page, http://github.com/bioperl/bioperl-live.git, leads to a 404 error, and even if it didn't, I have a feeling that it would take a click or two to get to downloading source. Does anybody with more git-fu than me (which isn't a hard thing to have, since I don't have much) have any suggestions? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From David.Messina at sbc.su.se Wed May 26 15:41:10 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 21:41:10 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> Message-ID: <1F539D4E-D352-4F93-AF1E-E9324B970D34@sbc.su.se> > We can clarify that in the docs on the bioperl site, maybe in a github-specific section. I've stubbed it in on Using Git http://www.bioperl.org/wiki/Using_Git Please modify or expand as you see fit. Dave From scott at scottcain.net Wed May 26 15:57:21 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 26 May 2010 15:57:21 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: Also on the bioperl git page is a "download master" link, which pops up a cute javascript window offering me a choice of zip or tar files. If I copy the url of the tar file, I get a page that says: You are being redirected. where presumably, the digits after "bioperl-release" will change on a regular basis (right?), so that doesn't help much either (yes, I know I could parse the redirect message and get that url, but really, is there such a thing as a HEAD url?) Thanks, Scott On Wed, May 26, 2010 at 3:36 PM, Scott Cain wrote: > Hi all, > > For GBrowse on the 1.X branch there is a network install script that > people can download and execute and it will install all of the > prerequisites and then install GBrowse. ?For this script, we also > support a -d(eveloper) option, to get GBrowse and BioPerl from their > repositories. 
Now that BioPerl has moved to git, I have a question: > does anybody know if there is a way (preferably via url) to get > bioperl from git in a non-interactive way? > > The read-only url on the bioperl-live git page, > http://github.com/bioperl/bioperl-live.git, leads to a 404 error, and > even if it didn't, I have a feeling that it would take a click or two > to get to downloading source. Does anybody with more git-fu than me > (which isn't a hard thing to have, since I don't have much) have any > suggestions? > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From kai.blin at biotech.uni-tuebingen.de Wed May 26 16:07:02 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Wed, 26 May 2010 22:07:02 +0200 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: <1274904422.3019.2.camel@gonzo.home.kblin.org> On Wed, 2010-05-26 at 15:36 -0400, Scott Cain wrote: Hi Scott, > For GBrowse on the 1.X branch there is a network install script that > people can download and execute and it will install all of the > prerequisites and then install GBrowse. For this script, we also > support a -d(eveloper) option, to get GBrowse and BioPerl from their > repositories. Now that BioPerl has moved to git, I have a question: > does anybody know if there is a way (preferably via url) to get > bioperl from git in a non-interactive way?
A quick look at the "BioPerl moved to git" announcement (http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/) turns up the following link: http://github.com/bioperl/bioperl-live/archives/master This page gives links to a zip and a tar version of BioPerl's master repository, which seems to be what you want. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From David.Messina at sbc.su.se Wed May 26 16:09:22 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 22:09:22 +0200 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Hi Scott, I think the URLs you want are the snapshots of the current repository listed here: http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots If you want instead to grab a static version of a repository, say a tagged revision, you can do it like this: http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 (where "for_gmod_0_003" is the tag). By the way, I am getting these URLs on GitHub by: 1. going to the GitHub page for the relevant repository, e.g. http://github.com/bioperl/bioperl-live 2. navigating to the tag or branch of interest using the "Switch Branches" or "Switch Tags" pulldowns 3. clicking on the Download Source button 4.
right-clicking on the big TAR icon to copy the link underlying it Dave From rmb32 at cornell.edu Wed May 26 16:48:13 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 26 May 2010 13:48:13 -0700 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: <4BFD890D.4080205@cornell.edu> Sigh .... once we get our house in order to the point where it's easy to and quick to make releases with bugfixes, you'll be able to just get the most recent copies of the parts you need from CPAN. That'll be the day. Rob From hlapp at drycafe.net Wed May 26 18:05:36 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 26 May 2010 16:05:36 -0600 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: On May 26, 2010, at 1:03 PM, Dave Messina wrote: > Can anyone (Hilmar?) who knows where we're at with this verify that > our OBO parser is in good shape? The obo parser should be working. It's not wrapping the go-perl parser though. I should revisit the code I've written for that, I know ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed May 26 19:27:27 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 18:27:27 -0500 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> On May 26, 2010, at 5:05 PM, Hilmar Lapp wrote: > > On May 26, 2010, at 1:03 PM, Dave Messina wrote: > >> Can anyone (Hilmar?) who knows where we're at with this verify that our OBO parser is in good shape? > > > The obo parser should be working. 
It's not wrapping the go-perl parser though. I should revisit the code I've written for that, I know ... > > -hilmar So, that might be an area for someone to work on? chris From hlapp at drycafe.net Thu May 27 09:30:05 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 27 May 2010 07:30:05 -0600 Subject: [Bioperl-l] new to this group In-Reply-To: <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> Message-ID: <292C7384-2EF0-45F7-85F9-BB173FE2B6E5@drycafe.net> On May 26, 2010, at 5:27 PM, Chris Fields wrote: >> The obo parser should be working. It's not wrapping the go-perl >> parser though. I should revisit the code I've written for that, I >> know ... >> > > So, that might be an area for someone to work on? Certainly if you want to start from scratch. The code I've written isn't committed (yes, shame on me). That said, I suppose I could now easily commit it to a branch and not cause any harm, right :-) It's not a very good target for a newcomer at all, though. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From kai.blin at biotech.uni-tuebingen.de Thu May 27 10:50:40 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 27 May 2010 16:50:40 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: <1274971840.9545.316.camel@mikropc7.biotech.uni-tuebingen.de> On Wed, 2010-05-26 at 08:25 -0700, Thomas Sharpton wrote: > > Having not considered it too much, I'm not sure how to accomplish > > this without breaking the SearchIO idiom. But presumably a way could > > be found. > > > > I'll see if I can't hit the drawing board and come up with a naming > scheme for additional H3 methods that retrieve some of the extra data > encoded in the new reports. It *probably* makes most sense, at least > from the standpoint of the user's perspective, to adopt the full- > length report values as the standard hit->significance and hit- > >raw_score while having something like hit->best_significance and hit- > >best_score as H3 methods that return the best-domain report values. > Again, this could use some thought/discussion. My reasoning for the change was that you can get at the best sequence score by (at worst) iterating over the top sequences. Without the change there was no way to get at the overall profile score, so that data was lost. Arguably this is just one way to try and make the data from the HMMer results accessible via the SearchIO interface. > I was not a part of that conversation either and I'm also operating > under a similar assumption about what "integrating the hmmer.pm > parser" means. 
I too am confused about the statement regarding > modularization; I assume Kai meant that next_result would leverage the > HMMER version number (which it already grabs) to guide the appropriate > parsing of the datafile. Not thinking about this too carefully, it > might be a simple as: > > next_result{ > version = get_hmmer_version > if version == 2 > parse V2 report file > if version == 3 > parse V3 report file > } > > to make the code a bit more manageable, the various version parsers > could be appropriated to independent subroutines. > > Kai, is this along the lines of what you were thinking? Yes, this is more or less what I meant. But I agree that we first want to get the hmmer3 parser sorted out and working nicely. More test cases for the parser would be nice, I just got sidetracked by another bug affecting my code. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakult?res Institut f?r Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universit?t T?bingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 T?bingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From scott at scottcain.net Thu May 27 11:29:42 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 27 May 2010 11:29:42 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: Hi All, Thanks for pointing out the links. It's weird: using curl on those urls retrieves a "redirect" page, whereas LWP::Simple::mirror gets the tarball. Anyway, the script works again :-) Scott On Wed, May 26, 2010 at 4:09 PM, Dave Messina wrote: > Hi Scott, > > I think the URLs you want are these > > ? ? ? ?http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots > > snapshots of the current repository. 
> > > If you want instead to grab a static version of a repository, say a tagged revision, you can do like this: > > http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 > > (where "for_gmod_0_003" is the tag). > > > By the way, I am getting these URLs on GitHub by: > > 1. ?going to the GitHub page for the relevant repository > > ? ? ? ?e.g. http://github.com/bioperl/bioperl-live > > 2. ?navigating to the tag or branch of interest using the "Switch Branches" or "Switch Tags" pulldowns > > 3. ?clicking on the Download Source button > > 4. ?right-clicking on the big TAR icon to copy the link underlying it > > > > Dave > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From bosborne11 at verizon.net Thu May 27 11:40:37 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 27 May 2010 11:40:37 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Message-ID: Chris, Removed all erroneous references to Subversion except for these pages, which require detailed editing and/or a familiarity with Git: http://www.bioperl.org/wiki/Emacs_bioperl-mode http://www.bioperl.org/wiki/HOWTO:Wrappers http://www.bioperl.org/wiki/Making_a_BioPerl_release http://www.bioperl.org/w/index.php/HOWTO:BlastPlus One issue now is the references to pedigree, microarray, GUI, pipeline, and ext, which only exist in SVN. Also GUI, pipeline, and microarray are unsupported, and have been unsupported for many years. 
Yet they are still listed in pages like: http://www.bioperl.org/wiki/Getting_BioPerl They shouldn't be listed alongside bioperl-live or -run, or they should not be listed at all. Should they be removed? or put into their own "unsupported" section? Brian O. On May 20, 2010, at 11:37 AM, Chris Fields wrote: > Yes, if you have time. I have started along that path already, but I'm sure there are lingering spots where links point to the wrong place, or subversion/svn is mentioned. > > chris > > On May 20, 2010, at 10:34 AM, Brian Osborne wrote: > >> Chris, >> >> Done, easy. Should I remove all references to SVN from the Wiki? >> >> Brian O. >> >> On May 18, 2010, at 2:04 PM, Chris Fields wrote: >> >>> Yes. >>> >>> chris >>> >>> On May 18, 2010, at 11:06 AM, Brian Osborne wrote: >>> >>>> bioperl-l, >>>> >>>> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >>>> >>>> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >>>> >>>> Brian O. >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From cjfields at illinois.edu Thu May 27 11:58:06 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 27 May 2010 10:58:06 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? 
In-Reply-To: References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Message-ID: On May 27, 2010, at 10:40 AM, Brian Osborne wrote: > Chris, > > Removed all erroneous references to Subversion except for these pages, which require detailed editing and/or a familiarity with Git: > > http://www.bioperl.org/wiki/Emacs_bioperl-mode > > http://www.bioperl.org/wiki/HOWTO:Wrappers > > http://www.bioperl.org/wiki/Making_a_BioPerl_release > > http://www.bioperl.org/w/index.php/HOWTO:BlastPlus Okay, looks good so far. I know the emacs mode stuff will be handled by Mark (I'm assuming the others will follow suit). I'll have to go in and clean up the 'making a release' page myself to update it. > One issue now is the references to pedigree, microarray, GUI, pipeline, and ext, which only exist in SVN. By 'only existing in svn', do you mean they are only found there? I moved everything over for archiving: http://github.com/bioperl/bioperl-gui http://github.com/bioperl/bioperl-microarray http://github.com/bioperl/bioperl-pedigree http://github.com/bioperl/bioperl-pipeline > Also GUI, pipeline, and microarray are unsupported, and have been unsupported for many years. Yet they are still listed in pages like: > > http://www.bioperl.org/wiki/Getting_BioPerl > > They shouldn't be listed alongside bioperl-live or -run, or they should not be listed at all. > > Should they be removed? or put into their own "unsupported" section? I think to an 'unsupported' or 'unmaintained' section; could add the corba and pise ones as well (just noticed that the pise repo was missing from github, so just added it for archiving). > Brian O. Thanks brian! 
chris From sdavis2 at mail.nih.gov Thu May 27 12:04:04 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 27 May 2010 12:04:04 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: On Thu, May 27, 2010 at 11:29 AM, Scott Cain wrote: > Hi All, > > Thanks for pointing out the links. It's weird: using curl on those > urls retrieves a "redirect" page, whereas LWP::Simple::mirror gets the > tarball. Anyway, the script works again :-) > > Hi, Scott. For curl, try: curl -L .... The -L follows redirects. Sean > > On Wed, May 26, 2010 at 4:09 PM, Dave Messina > wrote: > > Hi Scott, > > > > I think the URLs you want are these > > > > http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots > > > > snapshots of the current repository. > > > > > > If you want instead to grab a static version of a repository, say a > tagged revision, you can do like this: > > > > http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 > > > > (where "for_gmod_0_003" is the tag). > > > > > > By the way, I am getting these URLs on GitHub by: > > > > 1. going to the GitHub page for the relevant repository > > > > e.g. http://github.com/bioperl/bioperl-live > > > > 2. navigating to the tag or branch of interest using the "Switch > Branches" or "Switch Tags" pulldowns > > > > 3. clicking on the Download Source button > > > > 4. right-clicking on the big TAR icon to copy the link underlying it > > > > > > > > Dave > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From remi.planel at free.fr Fri May 28 06:29:50 2010 From: remi.planel at free.fr (Remi) Date: Fri, 28 May 2010 12:29:50 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult Message-ID: <4BFF9B1E.10500@free.fr> Hi all, I would like to get a clone of a Bio::Search::Result::GenericResult object and I'm not sure of what I'm doing ... I've tried something like: my $searchIn = Bio::SearchIO->new( -file => 'result.bls', -format => 'blastxml', ); my $result = $searchIn->next_result; my $result_copy = $result->new($result); It seems to work but I'm not sure I understand how. So I would like to know if I'll get in trouble using this code and if all the fields are copied one by one. Thank you, Rémi From David.Messina at sbc.su.se Fri May 28 07:32:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 28 May 2010 13:32:40 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <4BFF9B1E.10500@free.fr> References: <4BFF9B1E.10500@free.fr> Message-ID: Hi Rémi, As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). So I don't think the code you showed will work. However, there are modules such as Clone::More and Clone::Fast that can do it. http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal.
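As a rough illustration of what a deep copy buys you, here is a sketch that uses dclone from the core Storable module instead of the CPAN modules named above. The nested hash is a placeholder standing in for a parsed result, with made-up hit names and E-values. Whether dclone copes with a real Bio::Search::Result::GenericResult is an untested assumption: by default Storable refuses data that contains code references or filehandles.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Plain nested data standing in for a parsed result;
# the hit names and E-values are placeholders.
my $result = {
    query => 'query_1',
    hits  => [
        { name => 'hit_A', evalue => 1e-50 },
        { name => 'hit_B', evalue => 0.5   },
    ],
};

# Take a deep copy before filtering so the step can be undone later.
my $snapshot = dclone($result);

# Filter in place: keep only hits at or below the cutoff.
my $cutoff = 1e-5;
$result->{hits} = [ grep { $_->{evalue} <= $cutoff } @{ $result->{hits} } ];

printf "filtered: %d hit(s), snapshot: %d hit(s)\n",
    scalar @{ $result->{hits} }, scalar @{ $snapshot->{hits} };
```

Because the copy is deep, changes made to the filtered structure never reach the snapshot, which a shallow copy of the top-level hash would not guarantee.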
Dave

From remi.planel at free.fr Fri May 28 08:17:01 2010
From: remi.planel at free.fr (Remi)
Date: Fri, 28 May 2010 14:17:01 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To:
References: <4BFF9B1E.10500@free.fr>
Message-ID: <4BFFB43D.50409@free.fr>

You're right, it's not working: some fields are missing.

Actually, I'm writing a script that filters a Result object based on some criteria, and I want the script to be interactive, like this:

-Display Result object as HTML
-Ask for filter criteria
-Filter Result object
-Display filtered Result object as HTML
... etc.

I would like to make a copy of the Result object before each filtering step so that I can redo it.

I'll have a look at the modules you've mentioned, thanks.

Dave Messina wrote:
> Hi Rémi,
>
> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter).
>
> So I don't think the code you showed will work.
>
> However, there are modules such as Clone::More and Clone::Fast that can do it.
>
> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm
> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm
>
>
> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal.
>
> Dave
>
>
>

From cjfields at illinois.edu Fri May 28 09:25:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 28 May 2010 08:25:54 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4BFFB43D.50409@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr>
Message-ID: <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu>

Remi,

Using the constructor that way is not supported, and cloning is unnecessary here.

Are you using Bio::SearchIO::Writer::HTMLResultWriter? It filters results/hits/HSPs as it writes the HTML, so there is no need to clone. That in combination with GenericResult::rewind() should work.
You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need.

Something like the following should work (of course completely untested :)

    my $result = $in->next_result;

    # filter on HSP
    write_html('result1.html', $result, { 'HSP' => \&hsp_filter });

    # rewind the result to go back to the beginning
    $result->rewind;

    # write_html opens a new output file for the second report
    # filter on hit and HSP
    write_html('result2.html', $result, { 'HIT' => \&hit_filter,
                                          'HSP' => \&hsp_filter });

    # rewind the result to go back to the beginning
    $result->rewind;

    # and so on....

    sub write_html {
        my ($file, $result, $filters) = @_;
        # $filters is a hash ref mapping 'HIT'/'HSP' to filter callbacks
        my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new
            (-filters => $filters );

        # the leading '>' opens the file for writing
        my $out = Bio::SearchIO->new(-writer => $writer, -file => ">$file");
        $out->write_result($result);
    }

    # keep HSPs longer than 100 columns in total
    sub hsp_filter {
        my $hsp = shift;
        return 1 if $hsp->length('total') > 100;
    }

    # keep hits with significance below 1e-5
    sub hit_filter {
        my $hit = shift;
        return 1 if $hit->significance < 1e-5;
    }

chris

On May 28, 2010, at 7:17 AM, Remi wrote:

> You're right, it's not working there is some missing fields ...
>
> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like :
>
> -Display Result object as HTML
> -Ask for filter criteria
> -Filter Result object
> -Display filtered Result object as HTML.
> ... etc
>
> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it.
>
> I'll have a look to the modules you've mentioned, thanks.
>
>
>
>
> Dave Messina wrote:
>> Hi Rémi,
>>
>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter).
>>
>> So I don't think the code you showed will work.
>>
>> However, there are modules such as Clone::More and Clone::Fast that can do it.
>> >> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >> >> >> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >> >> Dave >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri May 28 10:34:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 28 May 2010 09:34:13 -0500 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <4BFFD3D5.2000409@free.fr> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> Message-ID: Let us know how it goes, and if you run into any bugs. chris On May 28, 2010, at 9:31 AM, Remi wrote: > Thank you very much !!!! > I'm gonna try it right away > > Chris Fields wrote: >> Remi, >> >> Using the constructor that way is not supported. But it's completely unnecessary. >> >> Are you using Bio::SearchIO::Writer::HTMLWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. 
>> >> Something like the following should work (of course completely untested :) >> >> my $result = $in->next_result; >> >> # filter on HSP >> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >> >> # rewind the result to go back to the beginning >> $result->rewind; >> >> # open a new filehandle here for second report output >> # filter on hit and HSP >> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >> 'HSP' => \&hsp_filter }); >> >> # rewind the result to go back to the beginning >> $result->rewind; >> >> # and so on.... >> >> sub write_html { >> my ($file, $result, $filters) = @_; >> # note that $filter is a hash ref above >> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >> (-filters => $filters ); >> >> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); >> $out->write_result($result); >> } >> >> sub hsp_filter { >> my $hsp = shift; >> return 1 if $hsp->length('total') > 100; >> } >> >> sub hit_filter { >> my $hit = shift; >> return 1 if $hit->significance < 1e-5; >> } >> >> chris >> >> >> On May 28, 2010, at 7:17 AM, Remi wrote: >> >> >> >>> You're right, it's not working there is some missing fields ... >>> >>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>> >>> -Display Result object as HTML >>> -Ask for filter criteria >>> -Filter Result object >>> -Display filtered Result object as HTML. >>> ... etc >>> >>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. >>> >>> I'll have a look to the modules you've mentioned, thanks. >>> >>> >>> >>> >>> Dave Messina wrote: >>> >>> >>>> Hi R?mi, >>>> >>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). >>>> >>>> So I don't think the code you showed will work. >>>> >>>> However, there are modules such as Clone::More and Clone::Fast that can do it. 
>>>> >>>> >>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >>>> >>>> >>>> >>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >>>> >>>> Dave >>>> >>>> >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> >> >> > From remi.planel at free.fr Fri May 28 10:31:49 2010 From: remi.planel at free.fr (Remi) Date: Fri, 28 May 2010 16:31:49 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> Message-ID: <4BFFD3D5.2000409@free.fr> An HTML attachment was scrubbed... URL: From fij at elte.hu Sun May 30 05:32:58 2010 From: fij at elte.hu (Farkas, Illes) Date: Sun, 30 May 2010 11:32:58 +0200 Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) Message-ID: Hi, I've ran across a relatively simple, but specific task. I would like to put interaction (, , ) data from many sources (databases) into a single list containing the following in each record: , , , . (I am aware that there will be some loss during the ID conversion.) I have found so far the following possibilities: (1) BioMart perl API. Seems to be much smarter (and more complex) than what I would need. Also, I would need to parse input and output just as much as with newly written subroutines/modules. (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and KEGG IDs, but I could not find them on the "From" list. (3) Synergizer. I cannot run it in remote batch mode. From what I would need I could not find BioGrid, ENSP and KEGG identifiers. 
(4) Writing it all with ID mapping files downloaded from each database and contributing it to BioPerl. How can I contribute? How do I find the best place within BioPerl to add a particular module? Whom do I need to ask for approval? Thanks in advance for any comments. Illes -- http://hal.elte.hu/fij From maj at fortinbras.us Sun May 30 09:42:50 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 30 May 2010 09:42:50 -0400 Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) In-Reply-To: References: Message-ID: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife> Illes-- no approval necessary (or, if you like, I approve). What you can do is describe what you want to do as an enhancement request at http://bugzilla.bioperl.org, and then attach your new code to that request. We can review it from there. cheers MAJ ----- Original Message ----- From: "Farkas, Illes" To: Sent: Sunday, May 30, 2010 5:32 AM Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) > Hi, > > I've ran across a relatively simple, but specific task. I would like to put > interaction (, , ) data from many sources > (databases) into a single list containing the following in each record: > , , , > . (I am aware that there will be some loss during the ID > conversion.) > > I have found so far the following possibilities: > > (1) BioMart perl API. Seems to be much smarter (and more complex) than what > I would need. Also, I would need to parse input and output just as much as > with newly written subroutines/modules. > > (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and > KEGG IDs, but I could not find them on the "From" list. > > (3) Synergizer. I cannot run it in remote batch mode. From what I would need > I could not find BioGrid, ENSP and KEGG identifiers. > > (4) Writing it all with ID mapping files downloaded from each database and > contributing it to BioPerl. How can I contribute? How do I find the best > place within BioPerl to add a particular module? 
Whom do I need to ask for
> approval?
>
> Thanks in advance for any comments.
> Illes
>
> --
> http://hal.elte.hu/fij
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Sun May 30 11:00:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 30 May 2010 10:00:09 -0500
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
References: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
Message-ID:

Another couple of options:

1) for code changes, fork the code on GitHub, add your code there, then make a pull request
2) for adding code, create a repo on GitHub with the code.

chris

On May 30, 2010, at 8:42 AM, Mark A. Jensen wrote:

> Illes-- no approval necessary (or, if you like, I approve). What you can do is describe what you want to do as an enhancement request at http://bugzilla.bioperl.org, and then attach your new code to that request. We can review it from there.
> cheers MAJ
> ----- Original Message ----- From: "Farkas, Illes"
> To:
> Sent: Sunday, May 30, 2010 5:32 AM
> Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
>
>
>> Hi,
>>
>> I've ran across a relatively simple, but specific task. I would like to put
>> interaction (, , ) data from many sources
>> (databases) into a single list containing the following in each record:
>> , , ,
>> . (I am aware that there will be some loss during the ID
>> conversion.)
>>
>> I have found so far the following possibilities:
>>
>> (1) BioMart perl API. Seems to be much smarter (and more complex) than what
>> I would need. Also, I would need to parse input and output just as much as
>> with newly written subroutines/modules.
>>
>> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and
>> KEGG IDs, but I could not find them on the "From" list.
>>
>> (3) Synergizer.
I cannot run it in remote batch mode. From what I would need >> I could not find BioGrid, ENSP and KEGG identifiers. >> >> (4) Writing it all with ID mapping files downloaded from each database and >> contributing it to BioPerl. How can I contribute? How do I find the best >> place within BioPerl to add a particular module? Whom do I need to ask for >> approval? >> >> Thanks in advance for any comments. >> Illes >> >> -- >> http://hal.elte.hu/fij >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun May 30 11:05:37 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 30 May 2010 10:05:37 -0500 Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) In-Reply-To: References: Message-ID: <84D300DB-C22D-494E-ABAF-EBC10FEE0E7C@illinois.edu> On May 30, 2010, at 4:32 AM, Farkas, Illes wrote: > Hi, > > I've ran across a relatively simple, but specific task. I would like to put > interaction (, , ) data from many sources > (databases) into a single list containing the following in each record: > , , , > . (I am aware that there will be some loss during the ID > conversion.) > > I have found so far the following possibilities: > > (1) BioMart perl API. Seems to be much smarter (and more complex) than what > I would need. Also, I would need to parse input and output just as much as > with newly written subroutines/modules. Or, wondering whether you could create a set of BioPerl<->BioMart bridge modules. > (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and > KEGG IDs, but I could not find them on the "From" list. I added an id_mapper to Bio::DB::SwissProt that calls to this. 
It hasn't been broadly tested yet, but you are welcome to add more to it. Might also be useful to have a DB wrapper around a locally-built ID mapping database, which would give you more flexibility than the web interface. > (3) Synergizer. I cannot run it in remote batch mode. From what I would need > I could not find BioGrid, ENSP and KEGG identifiers. > > (4) Writing it all with ID mapping files downloaded from each database and > contributing it to BioPerl. How can I contribute? How do I find the best > place within BioPerl to add a particular module? Whom do I need to ask for > approval? > > Thanks in advance for any comments. > Illes A generalized ID mapping interface would be nice. You could also incorporate some of NCBI's eutils stuff along these lines, or their gi2acc mappings. chris From maj at fortinbras.us Sun May 30 19:59:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 30 May 2010 19:59:38 -0400 Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) In-Reply-To: References: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife> Message-ID: <6553B9DFF86F472B8B2D0D8A72171056@NewLife> Yes, that's definitely the Way to Do It post-git- MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Farkas, Illes" ; Sent: Sunday, May 30, 2010 11:00 AM Subject: Re: [Bioperl-l] ID mapping (or: contributing to BioPerl) Another couple of options: 1) for code changes, fork the code on GitHub, add your code there, then make a push request 2) for adding code, create a repo on github with the code, chris On May 30, 2010, at 8:42 AM, Mark A. Jensen wrote: > Illes-- no approval necessary (or, if you like, I approve). What you can do is > describe what you want to do as an enhancement request at > http://bugzilla.bioperl.org, and then attach your new code to that request. We > can review it from there. 
> cheers MAJ > ----- Original Message ----- From: "Farkas, Illes" > To: > Sent: Sunday, May 30, 2010 5:32 AM > Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl) > > >> Hi, >> >> I've ran across a relatively simple, but specific task. I would like to put >> interaction (, , ) data from many sources >> (databases) into a single list containing the following in each record: >> , , , >> . (I am aware that there will be some loss during the ID >> conversion.) >> >> I have found so far the following possibilities: >> >> (1) BioMart perl API. Seems to be much smarter (and more complex) than what >> I would need. Also, I would need to parse input and output just as much as >> with newly written subroutines/modules. >> >> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and >> KEGG IDs, but I could not find them on the "From" list. >> >> (3) Synergizer. I cannot run it in remote batch mode. From what I would need >> I could not find BioGrid, ENSP and KEGG identifiers. >> >> (4) Writing it all with ID mapping files downloaded from each database and >> contributing it to BioPerl. How can I contribute? How do I find the best >> place within BioPerl to add a particular module? Whom do I need to ask for >> approval? >> >> Thanks in advance for any comments. 
>> Illes
>>
>> --
>> http://hal.elte.hu/fij
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon May 31 09:23:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 31 May 2010 08:23:13 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4C037F22.3090209@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr>
Message-ID: <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu>

That sounds like a bug. Does filtering at the hit level work around this?

    sub hit_filter {
        my $hit = shift;
        # filter hsps here; keep the hit only if at least one HSP passes
        my @passing_hsps = grep { hsp_filter($_) } $hit->hsps;
        return scalar @passing_hsps;
    }

    sub hsp_filter {
        # original filter
    }

chris

On May 31, 2010, at 4:19 AM, Remi wrote:

> Hi,
>
> Everything is working well, but there is still one point that is giving me some trouble.
> When I filter the hsps and all the hsps of a given hit are removed, the description line of the hit is still present in the HTML file.
> Is there a way to get rid of this description line?
> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLResultWriter and override the "to_string" method?
>
> Thanks,
>
> Rémi
>
>
> Chris Fields wrote:
>> Let us know how it goes, and if you run into any bugs.
>>
>> chris
>>
>> On May 28, 2010, at 9:31 AM, Remi wrote:
>>
>>
>>
>>> Thank you very much !!!!
>>> I'm gonna try it right away
>>>
>>> Chris Fields wrote:
>>>
>>>
>>>> Remi,
>>>>
>>>> Using the constructor that way is not supported. But it's completely unnecessary.
>>>>
>>>> Are you using Bio::SearchIO::Writer::HTMLResultWriter?
It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. >>>> >>>> Something like the following should work (of course completely untested :) >>>> >>>> my $result = $in->next_result; >>>> >>>> # filter on HSP >>>> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >>>> >>>> # rewind the result to go back to the beginning >>>> $result->rewind; >>>> >>>> # open a new filehandle here for second report output >>>> # filter on hit and HSP >>>> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >>>> 'HSP' => \&hsp_filter }); >>>> >>>> # rewind the result to go back to the beginning >>>> $result->rewind; >>>> >>>> # and so on.... >>>> >>>> sub write_html { >>>> my ($file, $result, $filters) = @_; >>>> # note that $filter is a hash ref above >>>> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >>>> (-filters => $filters ); >>>> >>>> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); >>>> $out->write_result($result); >>>> } >>>> >>>> sub hsp_filter { >>>> my $hsp = shift; >>>> return 1 if $hsp->length('total') > 100; >>>> } >>>> >>>> sub hit_filter { >>>> my $hit = shift; >>>> return 1 if $hit->significance < 1e-5; >>>> } >>>> >>>> chris >>>> >>>> >>>> On May 28, 2010, at 7:17 AM, Remi wrote: >>>> >>>> >>>> >>>> >>>> >>>>> You're right, it's not working there is some missing fields ... >>>>> >>>>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>>>> >>>>> -Display Result object as HTML >>>>> -Ask for filter criteria >>>>> -Filter Result object >>>>> -Display filtered Result object as HTML. >>>>> ... etc >>>>> >>>>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. 
>>>>> >>>>> I'll have a look to the modules you've mentioned, thanks. >>>>> >>>>> >>>>> >>>>> >>>>> Dave Messina wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> Hi R?mi, >>>>>> >>>>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). >>>>>> >>>>>> So I don't think the code you showed will work. >>>>>> >>>>>> However, there are modules such as Clone::More and Clone::Fast that can do it. >>>>>> >>>>>> >>>>>> >>>>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >>>>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >>>>>> >>>>>> Dave >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> >>>>> >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> >> >> >> > From remi.planel at free.fr Mon May 31 09:47:40 2010 From: remi.planel at free.fr (Remi) Date: Mon, 31 May 2010 15:47:40 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr> <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu> Message-ID: <4C03BDFC.5050109@free.fr> Yes, at the hit level everything works fine. Actually, at the hsp level, the alignment part is not written to the HTML file but the description before the alignment and the description of the hit at the beginning of the file are written. I had a quick look to the code and I'm not sure this is a bug. Chris Fields wrote: > That sounds like a bug. 
Does filtering at the hit level work around this? > > sub hit_filter { > my $hit = shift; > # filter hsps here > my @passing_hsps = grep { hsp_filter($_) } $hit->hsps; > @passing_hsps; > } > > sub hsp_filter { > # original filter > } > > chris > > On May 31, 2010, at 4:19 AM, Remi wrote: > > >> Hi, >> >> Everything is working well but there is still one point that giving me some trouble. >> When I filter the hsps and all the hsps of a given hit are removed, the description line of the hit is still present in the HTML file. >> Is there a way to get rid of this description line ? >> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLWriter and overriding the "to_string" method ? >> >> Thanks, >> >> R?mi >> >> >> Chris Fields wrote: >> >>> Let us know how it goes, and if you run into any bugs. >>> >>> chris >>> >>> On May 28, 2010, at 9:31 AM, Remi wrote: >>> >>> >>> >>> >>>> Thank you very much !!!! >>>> I'm gonna try it right away >>>> >>>> Chris Fields wrote: >>>> >>>> >>>> >>>>> Remi, >>>>> >>>>> Using the constructor that way is not supported. But it's completely unnecessary. >>>>> >>>>> Are you using Bio::SearchIO::Writer::HTMLWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. 
>>>>> >>>>> Something like the following should work (of course completely untested :) >>>>> >>>>> my $result = $in->next_result; >>>>> >>>>> # filter on HSP >>>>> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >>>>> >>>>> # rewind the result to go back to the beginning >>>>> $result->rewind; >>>>> >>>>> # open a new filehandle here for second report output >>>>> # filter on hit and HSP >>>>> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >>>>> 'HSP' => \&hsp_filter }); >>>>> >>>>> # rewind the result to go back to the beginning >>>>> $result->rewind; >>>>> >>>>> # and so on.... >>>>> >>>>> sub write_html { >>>>> my ($file, $result, $filters) = @_; >>>>> # note that $filter is a hash ref above >>>>> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >>>>> (-filters => $filters ); >>>>> >>>>> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); >>>>> $out->write_result($result); >>>>> } >>>>> >>>>> sub hsp_filter { >>>>> my $hsp = shift; >>>>> return 1 if $hsp->length('total') > 100; >>>>> } >>>>> >>>>> sub hit_filter { >>>>> my $hit = shift; >>>>> return 1 if $hit->significance < 1e-5; >>>>> } >>>>> >>>>> chris >>>>> >>>>> >>>>> On May 28, 2010, at 7:17 AM, Remi wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> You're right, it's not working there is some missing fields ... >>>>>> >>>>>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>>>>> >>>>>> -Display Result object as HTML >>>>>> -Ask for filter criteria >>>>>> -Filter Result object >>>>>> -Display filtered Result object as HTML. >>>>>> ... etc >>>>>> >>>>>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. >>>>>> >>>>>> I'll have a look to the modules you've mentioned, thanks. 
>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Dave Messina wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Hi R?mi, >>>>>>> >>>>>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). >>>>>>> >>>>>>> So I don't think the code you showed will work. >>>>>>> >>>>>>> However, there are modules such as Clone::More and Clone::Fast that can do it. >>>>>>> >>>>>>> >>>>>>> >>>>>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >>>>>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >>>>>>> >>>>>>> Dave >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> >>>>>> >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>> >>> >>> > > From cjfields at illinois.edu Mon May 31 09:54:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 31 May 2010 08:54:22 -0500 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <4C03BDFC.5050109@free.fr> References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr> <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu> <4C03BDFC.5050109@free.fr> Message-ID: <454FE98D-4EE5-4DFB-A877-6DE7822C4DA4@illinois.edu> My concern is to ensure we aren't filtering twice as much (one at the hit level, one pass at the HSP level). It should be one pass. chris On May 31, 2010, at 8:47 AM, Remi wrote: > Yes, at the hit level everything works fine. 
> Actually, at the hsp level, the alignment part is not written to the HTML file but the description before the alignment and the description of the hit at the beginning of the file are written. > > I had a quick look to the code and I'm not sure this is a bug. > > Chris Fields wrote: >> That sounds like a bug. Does filtering at the hit level work around this? >> >> sub hit_filter { >> my $hit = shift; >> # filter hsps here >> my @passing_hsps = grep { hsp_filter($_) } $hit->hsps; >> @passing_hsps; >> } >> >> sub hsp_filter { >> # original filter >> } >> >> chris >> >> On May 31, 2010, at 4:19 AM, Remi wrote: >> >> >>> Hi, >>> >>> Everything is working well but there is still one point that giving me some trouble. >>> When I filter the hsps and all the hsps of a given hit are removed, the description line of the hit is still present in the HTML file. >>> Is there a way to get rid of this description line ? >>> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLWriter and overriding the "to_string" method ? >>> >>> Thanks, >>> >>> R?mi >>> >>> >>> Chris Fields wrote: >>> >>>> Let us know how it goes, and if you run into any bugs. >>>> >>>> chris >>>> >>>> On May 28, 2010, at 9:31 AM, Remi wrote: >>>> >>>> >>>> >>>>> Thank you very much !!!! >>>>> I'm gonna try it right away >>>>> >>>>> Chris Fields wrote: >>>>> >>>>> >>>>>> Remi, >>>>>> >>>>>> Using the constructor that way is not supported. But it's completely unnecessary. >>>>>> Are you using Bio::SearchIO::Writer::HTMLWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work. You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need. 
>>>>>> Something like the following should work (of course completely untested :) >>>>>> >>>>>> my $result = $in->next_result; >>>>>> >>>>>> # filter on HSP >>>>>> write_html('result1.html', $result, { 'HSP' => \&hsp_filter }); >>>>>> >>>>>> # rewind the result to go back to the beginning >>>>>> $result->rewind; >>>>>> >>>>>> # open a new filehandle here for second report output >>>>>> # filter on hit and HSP >>>>>> write_html('result2.html', $result, { 'HIT' => \&hit_filter, >>>>>> 'HSP' => \&hsp_filter }); >>>>>> >>>>>> # rewind the result to go back to the beginning >>>>>> $result->rewind; >>>>>> >>>>>> # and so on.... >>>>>> >>>>>> sub write_html { >>>>>> my ($file, $result, $filters) = @_; >>>>>> # note that $filter is a hash ref above >>>>>> my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new >>>>>> (-filters => $filters ); >>>>>> >>>>>> my $out = Bio::SearchIO->new(-writer => $writer, -file => $file); $out->write_result($result); >>>>>> } >>>>>> >>>>>> sub hsp_filter { my $hsp = shift; >>>>>> return 1 if $hsp->length('total') > 100; >>>>>> } >>>>>> >>>>>> sub hit_filter { my $hit = shift; >>>>>> return 1 if $hit->significance < 1e-5; >>>>>> } >>>>>> >>>>>> chris >>>>>> >>>>>> >>>>>> On May 28, 2010, at 7:17 AM, Remi wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> You're right, it's not working there is some missing fields ... >>>>>>> >>>>>>> Actually, I'm writing a script that filter Result Object based on some criteria and I want the script to be kind of interactive like : >>>>>>> >>>>>>> -Display Result object as HTML >>>>>>> -Ask for filter criteria >>>>>>> -Filter Result object >>>>>>> -Display filtered Result object as HTML. >>>>>>> ... etc >>>>>>> >>>>>>> And I would like to make a copy of the Result object before each filtering step in order to be able to redo it. >>>>>>> >>>>>>> I'll have a look to the modules you've mentioned, thanks. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Dave Messina wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Hi R?mi, >>>>>>>> >>>>>>>> As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). >>>>>>>> >>>>>>>> So I don't think the code you showed will work. >>>>>>>> >>>>>>>> However, there are modules such as Clone::More and Clone::Fast that can do it. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm >>>>>>>> http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal. >>>>>>>> >>>>>>>> Dave >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> >>>>>>> >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>> >>>> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From remi.planel at free.fr Mon May 31 05:19:30 2010 From: remi.planel at free.fr (Remi) Date: Mon, 31 May 2010 11:19:30 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> Message-ID: <4C037F22.3090209@free.fr> An HTML attachment was scrubbed... URL: From aradwen at gmail.com Sat May 1 10:45:18 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Sat, 1 May 2010 12:45:18 +0200 Subject: [Bioperl-l] Pfam_Scan Message-ID: Hello everyone, I would like to know if there is a way to cluster the output of Pfam_Scan results. 
I mean is we can parse it and then output clusters containing sequences sharing the same domains or Pfams. This is a bit special since we could have multidomains proteins inside, which rule we have to follow in this case ? Rad -- R. ANIBA From David.Messina at sbc.su.se Sat May 1 22:28:48 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 2 May 2010 00:28:48 +0200 Subject: [Bioperl-l] Pfam_Scan In-Reply-To: References: Message-ID: <6CA3B4F2-CF3E-45DD-BE51-9F7218C5CEE9@sbc.su.se> Hi Rad, As far as I can tell the Pfam_Scan output is simply tab-delimited text (see details below), so you should be able to group sequences which share domains by sorting on the sixth column. I suspect that sequences with multiple domain hits will have multiple lines in the output, one per hit, so if you want to identify sequences which share the same _set_ of domains you will have to do the bookkeeping yourself. That being said, Pfam_Scan is not part of BioPerl ? it's distributed by the Pfam team ? so you may want to contact them directly for help (pfam-help at sanger.ac.uk). Dave [from the Pfam_Scan documentation] The output format is: Example output (with -pfamB, -as options): Q5NEL3.1 2 224 2 227 PB013481 Pfam-B_13481 Pfam-B 1 184 226 358.5 1.4e-107 NA NA O65039.1 38 93 38 93 PF08246 Inhibitor_I29 Domain 1 58 58 45.9 2.8e-12 1 No_clan O65039.1 126 342 126 342 PF00112 Peptidase_C1 Domain 1 216 216 296.0 1.1e-88 1 CL0125 predicted_active_site[150,285,307] From David.Messina at sbc.su.se Sun May 2 08:54:54 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 2 May 2010 10:54:54 +0200 Subject: [Bioperl-l] RFC: SNP::Inherit In-Reply-To: References: Message-ID: Hi Christopher, Looks good! The only recommendation I would make is to change the namespace to Bio::SNP::Inherit. 
The convention on CPAN is to minimize the number of new toplevel namespaces (which SNP would be), and although many of the Bio::* modules are part of BioPerl, that namespace is not restricted to BioPerl and there are plenty of non-BioPerl packages there. Dave On Apr 29, 2010, at 10:26 PM, Christopher Bottoms wrote: > Dear Bioperl community, > > I was thinking of uploading a module to CPAN that converts SNP genotype data > to parental allele designations. Below is the perldoc. This is not a > "BioPerl" module per se, so I'm not sure what namespace to put it under. > > I would be glad to send anyone the source if they are interested in checking > it out more. I just did not want to send everyone an unsolicited attachment. > > Thank you for your time, > Christopher Bottoms (molecules) > From David.Messina at sbc.su.se Sun May 2 09:59:07 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 2 May 2010 11:59:07 +0200 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <4BDA986D.3020302@bii.a-star.edu.sg> References: <4BDA986D.3020302@bii.a-star.edu.sg> Message-ID: <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> Hi Dimitar, The syntax you want is: # Build a Genewise alignment factory my $factory = Bio::Tools::Run::Genewise->new(); # turn on the quiet switch $factory->QUIET(1); # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects my @genes = $factory->run($protein_seq, $genomic_seq); This turns out be incorrectly documented on the man page, at least in part: > Available Params: > > NB: These should be passed without the '-' or they will be ignored, > except switches such as 'hmmer' (which have no corresponding value) > which should be set on the factory object using the AUTOLOADed methods > of the same name. 
>
> Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null]
> Alg [-kbyte,-alg]
> HMM [-hmmer]
> Output [-gff,-gener,-alb,-pal,-block,-divide]
> Standard [-help,-version,-silent,-quiet,-errorlog]

That is, these don't work as expected:

$factory->quiet;
$factory->quiet(1);

due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set.

So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker.

Dave

From maj at fortinbras.us Sun May 2 19:28:22 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 2 May 2010 15:28:22 -0400
Subject: [Bioperl-l] new core developers Rob Buels and Dave Messina
Message-ID:

Hi Folks,

On behalf of the core team, I am delighted to announce two new members: Rob Buels and Dave Messina. They are so, er, honored on the basis of their selfless work on the list, on IRC, in development of new modules and their active and sustained participation in BioPerl maintenance, design and promotion.

Welcome Rob and Dave!

MAJ and the BioPerl core developers

From skastu01 at students.poly.edu Mon May 3 02:41:04 2010
From: skastu01 at students.poly.edu (Lakshmi Kastury)
Date: Mon, 3 May 2010 02:41:04 +0000
Subject: [Bioperl-l] Using BIO::SEARCHIO
Message-ID:

I am attempting to use the BIO::SEARCHIO system to parse a Blast output file.

A new instance is created and the file is read through the following:

my $input = new BIO::SearchIO (-file =>'blast_report_0.txt', -format =>'blast');

When I run my program, I receive the following message:
"Can't locate object method "new" via package "BIO::SearchIO" (perhaps you forgot to load "BIO::SearchIO"?)"

Is this an optional module which needs to be installed separately?

Thanks,
Lakshmi Kastury

From maj at fortinbras.us Mon May 3 02:57:28 2010
From: maj at fortinbras.us (Mark A.
Jensen) Date: Sun, 2 May 2010 22:57:28 -0400 Subject: [Bioperl-l] Using BIO::SEARCHIO In-Reply-To: References: Message-ID: you need to say "Bio::SearchIO", and not "BIO::SearchIO" MAJ ----- Original Message ----- From: "Lakshmi Kastury" To: Sent: Sunday, May 02, 2010 10:41 PM Subject: [Bioperl-l] Using BIO::SEARCHIO > > > > > > > > > > > > I am attempting to use the BIO::SEARCHIO system to parse a Blast output file. > > A new instance is he file is read through the following: > my $input = new BIO::SearchIO (-file =>'blast_report_0.txt', -format > =>'blast'); > > When I run my program, I receive the following message: > "Can't locate object method "new" via package "BIO::SearchIO" (perhaps you > forgot to load "BIO::SearchIO"? > > Is this an optional module which needs to be installed separately? > > > > Thanks, > Lakshmi Kastury > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon May 3 04:22:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 2 May 2010 23:22:46 -0500 Subject: [Bioperl-l] Full bioperl-live github demo Message-ID: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> All, I have pushed a demo of the bioperl-live (all branches and tags) to github here: http://github.com/bioperl/bioperl-test This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. 
In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. chris From heikki.lehvaslaiho at gmail.com Mon May 3 11:45:10 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 3 May 2010 14:45:10 +0300 Subject: [Bioperl-l] BLAST parsing broken Message-ID: Chris, latest additions to Bio::SearchIO::blast.pm broke the parsing of normal blast output. $result->query_name returns now undef. (Using the anonymous git now). This change still works: commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 Author: cjfields Date: Sun Dec 20 04:39:58 2009 +0000 Robson's patch for buggy blastpgp output But this does not: commit 9a89c3434597104dd50553e3562983d78d14a544 Author: cjfields Date: Thu Apr 15 04:21:17 2010 +0000 [bug 3031] patches for catching algorithm ref, courtesy Razi Khaja. 
That makes it easy to find the diffs:

$ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
index 378023a..6f7eeeb 100644
--- a/Bio/SearchIO/blast.pm
+++ b/Bio/SearchIO/blast.pm
@@ -209,6 +209,7 @@ BEGIN {
 
         'BlastOutput_program' => 'RESULT-algorithm_name',
         'BlastOutput_version' => 'RESULT-algorithm_version',
+        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
         'BlastOutput_query-def' => 'RESULT-query_name',
         'BlastOutput_query-len' => 'RESULT-query_length',
         'BlastOutput_query-acc' => 'RESULT-query_accession',
@@ -504,6 +505,26 @@ sub next_result {
                 }
             );
         }
+        # parse the BLAST algorithm reference
+        elsif(/^Reference:\s+(.*)$/) {
+            # want to preserve newlines for the BLAST algorithm reference
+            my $algorithm_reference = "$1\n";
+            $_ = $self->_readline;
+            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
+            # algorithm_reference, append it to what we parsed so far
+            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
+                $algorithm_reference .= "$_";
+                $_ = $self->_readline;
+            }
+            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
+            $self->_pushback($_);
+            $self->element(
+                {
+                    'Name' => 'BlastOutput_algorithm-reference',
+                    'Data' => $algorithm_reference
+                }
+            );
+        }
         # added Windows workaround for bug 1985
         elsif (/^(Searching|Results from round)/) {
             next unless $1 =~ /Results from round/;

I am not sure why reference parsing messes things up. Maybe it eats too many lines from the result file.
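
One way to probe that guess without a full BLAST report is a small stand-alone model of the new loop. The sketch below is only an imitation, not the code in Bio::SearchIO::blast: read_reference and query_swallowed are hypothetical helpers, the report lines are invented, and a plain array stands in for $self->_readline and $self->_pushback. It suggests that if the separator line after the reference carries stray whitespace, /^$/ no longer matches it and the Query= line is appended to the reference text. Whether that is what actually happens with the failing report is unverified.

```perl
use strict;
use warnings;

# Hypothetical model of the "Reference:" branch: takes the remaining report
# lines, returns the collected reference text followed by the unread lines.
sub read_reference {
    my @lines = @_;
    my $first = shift @lines;
    return unless defined $first && $first =~ /^Reference:\s+(.*)$/;
    my $reference = "$1\n";

    my $line = shift @lines;    # stands in for $self->_readline
    while ( defined $line
        && $line !~ /^$/
        && $line !~ /^RID:/
        && $line !~ /^Database:/ )
    {
        $reference .= $line;
        $line = shift @lines;
    }
    unshift @lines, $line if defined $line;    # stands in for _pushback
    return ( $reference, @lines );
}

# True if the Query= line ended up inside the reference text.
sub query_swallowed {
    my ($reference) = read_reference(@_);
    return $reference =~ /^Query=/m ? 1 : 0;
}

# Invented report fragments: the only difference is the separator line,
# which is truly empty in @clean and holds two spaces in @padded.
my @clean  = ( "Reference: A paper,\n", "second line.\n", "\n",
               "Query= seq1\n", "\n", "Database: nr\n" );
my @padded = ( "Reference: A paper,\n", "second line.\n", "  \n",
               "Query= seq1\n", "\n", "Database: nr\n" );

printf "clean report:  Query= line %s\n",
    query_swallowed(@clean)  ? 'swallowed' : 'kept';
printf "padded report: Query= line %s\n",
    query_swallowed(@padded) ? 'swallowed' : 'kept';
```

If the real report shows the same pattern, testing the separator with /^\s*$/ instead of /^$/ would be one thing to try, again as a guess rather than a confirmed fix.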
Yours, -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia From cjfields at illinois.edu Mon May 3 12:08:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 07:08:01 -0500 Subject: [Bioperl-l] BLAST parsing broken In-Reply-To: References: Message-ID: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Odd, I ran tests on that prior to commit. I'll work on fixing that (in svn of course, until the migration is complete). chris On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > Chris, > > latest additions to Bio::SearchIO::blast.pm broke the parsing of normal > blast output. $result->query_name returns now undef. > > (Using the anonymous git now). This change still works: > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > Author: cjfields > Date: Sun Dec 20 04:39:58 2009 +0000 > > Robson's patch for buggy blastpgp output > > But this does not: > > commit 9a89c3434597104dd50553e3562983d78d14a544 > Author: cjfields > Date: Thu Apr 15 04:21:17 2010 +0000 > > [bug 3031] > > patches for catching algorithm ref, courtesy Razi Khaja. 
> > That makes it easy to find the diffs: > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > index 378023a..6f7eeeb 100644 > --- a/Bio/SearchIO/blast.pm > +++ b/Bio/SearchIO/blast.pm > @@ -209,6 +209,7 @@ BEGIN { > > 'BlastOutput_program' => 'RESULT-algorithm_name', > 'BlastOutput_version' => 'RESULT-algorithm_version', > + 'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference', > 'BlastOutput_query-def' => 'RESULT-query_name', > 'BlastOutput_query-len' => 'RESULT-query_length', > 'BlastOutput_query-acc' => 'RESULT-query_accession', > @@ -504,6 +505,26 @@ sub next_result { > } > ); > } > + # parse the BLAST algorithm reference > + elsif(/^Reference:\s+(.*)$/) { > + # want to preserve newlines for the BLAST algorithm reference > + my $algorithm_reference = "$1\n"; > + $_ = $self->_readline; > + # while the current line, does not match an empty line, a RID:, > or a Database:, we are still looking at the > + # algorithm_reference, append it to what we parsed so far > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > + $algorithm_reference .= "$_"; > + $_ = $self->_readline; > + } > + # if we exited the while loop, we saw an empty line, a RID:, or > a Database:, so push it back > + $self->_pushback($_); > + $self->element( > + { > + 'Name' => 'BlastOutput_algorithm-reference', > + 'Data' => $algorithm_reference > + } > + ); > + } > # added Windows workaround for bug 1985 > elsif (/^(Searching|Results from round)/) { > next unless $1 =~ /Results from round/; > > > I am not sure why reference parsing messes things up. Maybe it eats too many > lines from the result file. 
> > Yours, > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon May 3 12:25:10 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 3 May 2010 08:25:10 -0400 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> Message-ID: Hi Chris, I attempted a clone and got the following. Is this my problem? thanks MAJ $ git clone http://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ Getting alternates list for http://github.com/bioperl/bioperl-test.git Getting pack list for http://github.com/bioperl/bioperl-test.git Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Monday, May 03, 2010 12:22 AM Subject: [Bioperl-l] Full bioperl-live github demo > All, > > I have pushed a demo of the bioperl-live (all branches and tags) to github > here: > > http://github.com/bioperl/bioperl-test > > This is separate from the 'bioperl-live' repo at the 
same github account for > the time being. The conversion was performed using svn2git (the gitorious > C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), > using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and > rerun can be performed very quickly. The actual conversion of the entire > bioperl repo took very little time, actually (less than 3 minutes). I think, > with some additional small work using the svn2git rules pretty much everything > is ready for migration. > > In this run, all subversion tags are converted to git tags (branches remain > git branches as expected). Just in case I'm missing something, I would like > everyone to take a look at this, though. In particular, I would like to make > sure tags and branches are as they are expected. So far I haven't seen > anything that stands out as odd. > > chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon May 3 13:07:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 08:07:46 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> Message-ID: <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): cjfields$ git clone git://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ remote: Counting objects: 86737, done. remote: Compressing objects: 100% (22309/22309), done. remote: Total 86737 (delta 64759), reused 85957 (delta 63979) Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. Resolving deltas: 100% (64759/64759), done. For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. 
Do you have a github acct set up? chris On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > Hi Chris, > I attempted a clone and got the following. Is this my problem? > thanks MAJ > > $ git clone http://github.com/bioperl/bioperl-test.git > > Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ > Getting alternates list for http://github.com/bioperl/bioperl-test.git > Getting pack list for http://github.com/bioperl/bioperl-test.git > Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c > Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 > Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c > which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f > error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile > fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed > > > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, May 03, 2010 12:22 AM > Subject: [Bioperl-l] Full bioperl-live github demo > > >> All, >> >> I have pushed a demo of the bioperl-live (all branches and tags) to github here: >> >> http://github.com/bioperl/bioperl-test >> >> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >> >> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). 
Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 3 13:19:17 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 08:19:17 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <8796492301724F2CA132F97AE57C2700@NewLife> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> Message-ID: Added you in. SSH access should work with any ssh keys you have set in github. We can play around with this for the time being (try post commit hooks, etc), but obviously can't make any serious commits to it until we are ready for complete migration; everything will still need to go to dev svn until then. Also noticed that we are topping the account out at the moment, but removing the old read-only repos should help. May need to think about that in the long-term. chris On May 3, 2010, at 8:13 AM, Mark A. Jensen wrote: > That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with > majensen > cheers Chris- MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. 
Jensen" > Cc: "BioPerl List" > Sent: Monday, May 03, 2010 9:07 AM > Subject: Re: [Bioperl-l] Full bioperl-live github demo > > > This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): > > cjfields$ git clone git://github.com/bioperl/bioperl-test.git > Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ > remote: Counting objects: 86737, done. > remote: Compressing objects: 100% (22309/22309), done. > remote: Total 86737 (delta 64759), reused 85957 (delta 63979) > Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. > Resolving deltas: 100% (64759/64759), done. > > For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? > > chris > > On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > >> Hi Chris, >> I attempted a clone and got the following. Is this my problem? >> thanks MAJ >> >> $ git clone http://github.com/bioperl/bioperl-test.git >> >> Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ >> Getting alternates list for http://github.com/bioperl/bioperl-test.git >> Getting pack list for http://github.com/bioperl/bioperl-test.git >> Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 >> Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f >> error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile >> fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed >> >> >> ----- Original Message ----- From: "Chris Fields" >> To: "BioPerl List" >> Sent: Monday, May 03, 2010 12:22 AM >> Subject: [Bioperl-l] Full bioperl-live github demo >> >> >>> All, >>> >>> I have pushed a demo of the bioperl-live (all 
branches and tags) to github here: >>> >>> http://github.com/bioperl/bioperl-test >>> >>> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >>> >>> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. So far I haven't seen anything that stands out as odd. >>> >>> chris >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From maj at fortinbras.us Mon May 3 13:13:27 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 3 May 2010 09:13:27 -0400 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> Message-ID: <8796492301724F2CA132F97AE57C2700@NewLife> That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with majensen cheers Chris- MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. 
Jensen" Cc: "BioPerl List" Sent: Monday, May 03, 2010 9:07 AM Subject: Re: [Bioperl-l] Full bioperl-live github demo This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): cjfields$ git clone git://github.com/bioperl/bioperl-test.git Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ remote: Counting objects: 86737, done. remote: Compressing objects: 100% (22309/22309), done. remote: Total 86737 (delta 64759), reused 85957 (delta 63979) Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. Resolving deltas: 100% (64759/64759), done. For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? chris On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > Hi Chris, > I attempted a clone and got the following. Is this my problem? > thanks MAJ > > $ git clone http://github.com/bioperl/bioperl-test.git > > Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ > Getting alternates list for http://github.com/bioperl/bioperl-test.git > Getting pack list for http://github.com/bioperl/bioperl-test.git > Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c > Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 > Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c > which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f > error: file > /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack > is not a GIT packfile > fatal: packfile > /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack > cannot be accessed > > > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, May 03, 2010 12:22 AM > Subject: [Bioperl-l] Full bioperl-live github demo > > >> All, >> >> I have pushed a demo of the bioperl-live (all branches and tags) to github >> here: >> >> 
http://github.com/bioperl/bioperl-test >> >> This is separate from the 'bioperl-live' repo at the same github account for >> the time being. The conversion was performed using svn2git (the gitorious >> C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), >> using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and >> rerun can be performed very quickly. The actual conversion of the entire >> bioperl repo took very little time, actually (less than 3 minutes). I think, >> with some additional small work using the svn2git rules pretty much >> everything is ready for migration. >> >> In this run, all subversion tags are converted to git tags (branches remain >> git branches as expected). Just in case I'm missing something, I would like >> everyone to take a look at this, though. In particular, I would like to make >> sure tags and branches are as they are expected. So far I haven't seen >> anything that stands out as odd. >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 3 14:04:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 3 May 2010 09:04:16 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <8796492301724F2CA132F97AE57C2700@NewLife> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> Message-ID: <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> I like this: http://github.com/bioperl/bioperl-test/graphs/impact Kinda cool yet scary. chris On May 3, 2010, at 8:13 AM, Mark A. 
Jensen wrote: > That's it-- the github site sez http, but that must be for a simple copy (however you do that...) I'm on github with > majensen > cheers Chris- MAJ > ----- Original Message ----- From: "Chris Fields" > To: "Mark A. Jensen" > Cc: "BioPerl List" > Sent: Monday, May 03, 2010 9:07 AM > Subject: Re: [Bioperl-l] Full bioperl-live github demo > > > This worked for me (note the URL is git, not http; on Mac OS X 10.6, git 1.6.4.1): > > cjfields$ git clone git://github.com/bioperl/bioperl-test.git > Initialized empty Git repository in /Users/cjfields/gittest/bioperl-test/.git/ > remote: Counting objects: 86737, done. > remote: Compressing objects: 100% (22309/22309), done. > remote: Total 86737 (delta 64759), reused 85957 (delta 63979) > Receiving objects: 100% (86737/86737), 143.24 MiB | 1536 KiB/s, done. > Resolving deltas: 100% (64759/64759), done. > > For dev access (ssh or https) we need to set up collaborators within the github bioperl account. I'll add a few. Do you have a github acct set up? > > chris > > On May 3, 2010, at 7:25 AM, Mark A. Jensen wrote: > >> Hi Chris, >> I attempted a clone and got the following. Is this my problem? 
>> thanks MAJ >> >> $ git clone http://github.com/bioperl/bioperl-test.git >> >> Initialized empty Git repository in /...../bioperl/github/bioperl-test/.git/ >> Getting alternates list for http://github.com/bioperl/bioperl-test.git >> Getting pack list for http://github.com/bioperl/bioperl-test.git >> Getting index for pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> Getting index for pack 9530b04c1b4f494c2c9163775fdc00eab975caa6 >> Getting pack 809561bb87edbf2bef183164ceb96cd6099ee06c >> which contains 5ac325fa636b50ca1163d79feceb00eff1fa738f >> error: file /...../github/bioperl-test/.git/objects/pack/pack-809561bb87edbf2bef183164ceb96cd6099ee06c.pack is not a GIT packfile >> fatal: packfile /...../github/bioperl-test/.git/objects/pack/pack809561bb87edbf2bef183164ceb96cd6099ee06c.pack cannot be accessed >> >> >> ----- Original Message ----- From: "Chris Fields" >> To: "BioPerl List" >> Sent: Monday, May 03, 2010 12:22 AM >> Subject: [Bioperl-l] Full bioperl-live github demo >> >> >>> All, >>> >>> I have pushed a demo of the bioperl-live (all branches and tags) to github here: >>> >>> http://github.com/bioperl/bioperl-test >>> >>> This is separate from the 'bioperl-live' repo at the same github account for the time being. The conversion was performed using svn2git (the gitorious C++/Qt version from the KDE project migration, Jonathan Leto's suggestion), using the rsync'ed svn repo via ssh from dev.open-bio.org, so an update and rerun can be performed very quickly. The actual conversion of the entire bioperl repo took very little time, actually (less than 3 minutes). I think, with some additional small work using the svn2git rules pretty much everything is ready for migration. >>> >>> In this run, all subversion tags are converted to git tags (branches remain git branches as expected). Just in case I'm missing something, I would like everyone to take a look at this, though. In particular, I would like to make sure tags and branches are as they are expected. 
So far I haven't seen anything that stands out as odd. >>> >>> chris >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mnrusimh at gmail.com Mon May 3 22:42:41 2010 From: mnrusimh at gmail.com (Ram Podicheti) Date: Mon, 03 May 2010 18:42:41 -0400 Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID Message-ID: <4BDF5161.4030209@gmail.com> Is there a way to obtain the Ensembl Gene ID from an Entrez Gene ID? In other words, I am hoping to get 'ENSMUSG00000029372' as the output when I supply 57349. Many thanks, Ram Podicheti From sdavis2 at mail.nih.gov Mon May 3 23:14:58 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 3 May 2010 19:14:58 -0400 Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID In-Reply-To: <4BDF5161.4030209@gmail.com> References: <4BDF5161.4030209@gmail.com> Message-ID: On Mon, May 3, 2010 at 6:42 PM, Ram Podicheti wrote: > Is there a way to obtain the Ensembl Gene ID from an Entrez Gene ID? In > other words, I am hoping to get 'ENSMUSG00000029372' as the output when > I supply 57349. > Check out the Biomart interface to Ensembl. You can supply any type of ID as a filter and get back gene information, including the ID, that map to that ID. I believe there is a perl interface to biomart, but I haven't used it to comment directly. There is also an R/Bioconductor interface. 
Sean From mnrusimh at gmail.com Tue May 4 00:42:49 2010 From: mnrusimh at gmail.com (Ram Podicheti) Date: Mon, 03 May 2010 20:42:49 -0400 Subject: [Bioperl-l] Mapping Entrez Gene ID to Ensembl Gene ID In-Reply-To: References: <4BDF5161.4030209@gmail.com> Message-ID: <4BDF6D89.2000408@gmail.com> Thanks Sean, that definitely helped. Ram Sean Davis wrote: > > > On Mon, May 3, 2010 at 6:42 PM, Ram Podicheti > wrote: > > Is there a way to obtain the Ensembl Gene ID from an Entrez Gene > ID? In > other words, I am hoping to get 'ENSMUSG00000029372' as the output > when > I supply 57349. > > > Check out the Biomart interface to Ensembl. You can supply any type > of ID as a filter and get back gene information, including the ID, > that map to that ID. I believe there is a perl interface to biomart, > but I haven't used it to comment directly. There is also an > R/Bioconductor interface. > > Sean > From razi.khaja at gmail.com Tue May 4 17:55:00 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Tue, 4 May 2010 13:55:00 -0400 Subject: [Bioperl-l] BLAST parsing broken In-Reply-To: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: That is odd. Heikki, do you have a blast output file that produces this error? Could you attach the file and either send to the list or myself (if the list does not accept attachments). Thanks, Razi On Mon, May 3, 2010 at 8:08 AM, Chris Fields wrote: > Odd, I ran tests on that prior to commit. I'll work on fixing that (in svn > of course, until the migration is complete). > > chris > > On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > > > Chris, > > > > latest additions to Bio::SearchIO::blast.pm broke the parsing of normal > > blast output. $result->query_name returns now undef. > > > > (Using the anonymous git now). 
This change still works: > > > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > Author: cjfields > > Date: Sun Dec 20 04:39:58 2009 +0000 > > > > Robson's patch for buggy blastpgp output > > > > But this does not: > > > > commit 9a89c3434597104dd50553e3562983d78d14a544 > > Author: cjfields > > Date: Thu Apr 15 04:21:17 2010 +0000 > > > > [bug 3031] > > > > patches for catching algorithm ref, courtesy Razi Khaja. > > > > That makes it easy to find the diffs: > > > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > > index 378023a..6f7eeeb 100644 > > --- a/Bio/SearchIO/blast.pm > > +++ b/Bio/SearchIO/blast.pm > > @@ -209,6 +209,7 @@ BEGIN { > > > > 'BlastOutput_program' => 'RESULT-algorithm_name', > > 'BlastOutput_version' => 'RESULT-algorithm_version', > > + 'BlastOutput_algorithm-reference' => > 'RESULT-algorithm_reference', > > 'BlastOutput_query-def' => 'RESULT-query_name', > > 'BlastOutput_query-len' => 'RESULT-query_length', > > 'BlastOutput_query-acc' => 'RESULT-query_accession', > > @@ -504,6 +505,26 @@ sub next_result { > > } > > ); > > } > > + # parse the BLAST algorithm reference > > + elsif(/^Reference:\s+(.*)$/) { > > + # want to preserve newlines for the BLAST algorithm > reference > > + my $algorithm_reference = "$1\n"; > > + $_ = $self->_readline; > > + # while the current line, does not match an empty line, a > RID:, > > or a Database:, we are still looking at the > > + # algorithm_reference, append it to what we parsed so far > > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > > + $algorithm_reference .= "$_"; > > + $_ = $self->_readline; > > + } > > + # if we exited the while loop, we saw an empty line, a RID:, > or > > a Database:, so push it back > > + $self->_pushback($_); > > + $self->element( > > + { > > + 'Name' => 'BlastOutput_algorithm-reference', > > + 'Data' => $algorithm_reference > > + 
}
> > + );
> > + }
> > # added Windows workaround for bug 1985
> > elsif (/^(Searching|Results from round)/) {
> > next unless $1 =~ /Results from round/;
> >
> >
> > I am not sure why reference parsing messes things up. Maybe it eats too
> many
> > lines from the result file.
> >
> > Yours,
> >
> > -Heikki
> >
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +966 545 595 849 office: +966 2 808 2429
> >
> > Computational Bioscience Research Centre (CBRC), Building #2, Office
> #4216
> > 4700 King Abdullah University of Science and Technology (KAUST)
> > Thuwal 23955-6900, Kingdom of Saudi Arabia
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From shalabh.sharma7 at gmail.com Tue May 4 18:18:02 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 4 May 2010 14:18:02 -0400
Subject: [Bioperl-l] parsing GenBank file
Message-ID:

Hi All,
I have a huge GenBank file (downloaded from RDP, containing all bacterial 16s). I just want to parse the RDP id (in LOCUS) and the organism's lineage (in ORGANISM).
I wrote a simple script for this:

#!/usr/bin/perl -w
use Bio::SeqIO;

my $seqio_object = Bio::SeqIO->new(-file => "$ARGV[0]");
while(my $seq_object = $seqio_object->next_seq){
    my $id = $seq_object->id;
    print "$id\t";
    my $species_object = $seq_object->species;
    my @classification = $seq_object->species->classification;
    foreach my $val (@classification){print "$val\t";}
    print "\n";
}

I am getting output like:

S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root
S000148973 uncultured Geothrix sp.
Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root S000431649 uncultured Acidobacteria bacterium Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root .. .. This is the exact output i want, but i am missing lot of records (they are there in the genbank file but not in my output). I also got a warning during parsing: --------------------- WARNING --------------------- MSG: Unbalanced quote in: /db_xref="taxon:35783" /germline" /mol_type="genomic DNA" /organism="Enterococcus sp." /strain="LMG12316"No further qualifiers will be added for this feature --------------------------------------------------- So i was just wondering that is this warning message causing that problem or i am doing something wrong? Thanks Shalabh From jay at jays.net Wed May 5 03:30:25 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 4 May 2010 22:30:25 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? Message-ID: $work[0] wants me to fire up Buildbot + Smolder to know when and who broke our tests, and how quickly (or not) our test count is growing over time. Then #moose asked me if I could also host the same for Moose and Class::MOP. And $work[1] uses the heck out of BioPerl. So I'm wondering if I can leverage all my synergies somehow and also host for BioPerl. http://buildbot.net/trac http://sourceforge.net/projects/smolder/ Has anything happened since this 2008 thread?: Subject: Test coverage for BioPerl now available http://article.gmane.org/gmane.comp.lang.perl.bio.general/17731/match=smolder If this would be a Good Thing for BioPerl I could try to try... :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Wed May 5 04:24:51 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 4 May 2010 23:24:51 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? 
In-Reply-To: References: Message-ID: On May 4, 2010, at 10:30 PM, Jay Hannah wrote: > http://sourceforge.net/projects/smolder/ Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) http://search.cpan.org/perldoc?Smolder http://github.com/mpeters/smolder Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From dimitark at bii.a-star.edu.sg Wed May 5 06:58:21 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Wed, 05 May 2010 14:58:21 +0800 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> References: <4BDA986D.3020302@bii.a-star.edu.sg> <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> Message-ID: <4BE1170D.8040108@bii.a-star.edu.sg> Hi Dave, thank you for the tip. Now it works like a charm :) Greetings Dimitar On 05/02/2010 05:59 PM, Dave Messina wrote: > Hi Dimitar, > > The syntax you want is: > > # Build a Genewise alignment factory > my $factory = Bio::Tools::Run::Genewise->new(); > > # turn on the quiet switch > $factory->QUIET(1); > > # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects > my @genes = $factory->run($protein_seq, $genomic_seq); > > > This turns out be incorrectly documented on the man page, at least in part: > >> Available Params: >> >> NB: These should be passed without the '-' or they will be ignored, >> except switches such as 'hmmer' (which have no corresponding value) >> which should be set on the factory object using the AUTOLOADed methods >> of the same name. >> >> Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null] >> Alg [-kbyte,-alg] >> HMM [-hmmer] >> Output [-gff,-gener,-alb,-pal,-block,-divide] >> Standard [-help,-version,-silent,-quiet,-errorlog] >> > > That is, these don't work as expected: > > $factory->quiet; > $factory->quiet(1); > > due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. 
> > And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set. > > > So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker. > > > Dave > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore email: dimitark at bii.a-star.edu.sg tel: +65 6478 8514 From dimitark at bii.a-star.edu.sg Wed May 5 07:06:04 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Wed, 05 May 2010 15:06:04 +0800 Subject: [Bioperl-l] about gene "boundaries" In-Reply-To: References: <4BD8357B.5030804@bii.a-star.edu.sg> <24714E9B-B3E5-4703-92F8-64483FA59AFC@illinois.edu> <4BD90F94.4040608@bii.a-star.edu.sg> Message-ID: <4BE118DC.7000806@bii.a-star.edu.sg> Hi Malcolm, thank you very much for that information. Didnt even know such program existed :) I now use 'blastdbcmd' for extraction of DNA sequence from my DB. I only had to reformat my DB with 'parse seqids' parameter in order to be able to give the 'entry' parameter to 'blastdbcmd'. Now my script is working. Thanx again. Cheers Dimitar On 04/30/2010 10:16 PM, Cook, Malcolm wrote: > Dimitar, > > Since you have indexed your database with makeblastdb, you might simply use `blastdbcmd` to extract, in fasta format, sub-sequences from the indexed database using identifiers and integer ranges > > blastdbcmd is included in the blast+ suite of programs, which also included makeblastdb which you report you have running. 
> > see: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/user_maual.pdf > > I've not (yet) used the blast+ suite (still using the old blast) so I've not tested this myself yet, but I think something like the following will work for you: > > blastdbcmd -db yourBlastDatabase -entry chr2 -range 100-300 -outformat fasta > > will extract chr2:100-300 from yourBlastDatabase > > Good Luck > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dimitar Kenanov > Sent: Wednesday, April 28, 2010 11:48 PM > To: Chris Fields; bioperl-l at bioperl.org; scott at scottcain.net; hrh at fmi.ch > Subject: Re: [Bioperl-l] about gene "boundaries" > > Hi guys, > today with rested head and after some reading i found the solution to my problem in BioPerl. Its Bio::DB::Fasta. It does what i want sufficiently well. > Thank you again for the help and im sorry for the trouble caused. > > Cheers > Dimitar > > On 04/28/2010 11:10 PM, Chris Fields wrote: > >> By local DB, do you mean a BioPerl-based local DB? Or is it something else? This is a bit vague. >> >> On the BioPerl side I suggest looking into Bio::DB::SeqFeature::Store for storing and querying genome information (it does exactly what you want if the proper information is loaded), or maybe the Ensembl Perl API, which can be used with a local or remote Ensembl setup. Beyond that you'll need to be more specific. >> >> chris >> >> On Apr 28, 2010, at 8:17 AM, Dimitar Kenanov wrote: >> >> >> >>> Hello guys, >>> i have a question about gene "boundaries". Is there some module in BioPerl which can help me extract the DNA sequence from a genomic DB (from specific chromosome). I have my human genome in a local DB and some "from-to" data sets corresponding to different chromosomes. So i want to get the DNA seqs for these from-to's. 
I know i can do that the normal way but if there is a way to do it with BioPerl it will be more consistent with the rest of the code. >>> >>> Thanks for any tips :) >>> >>> Cheers >>> Dimitar >>> >>> -- >>> Dimitar Kenanov >>> Postdoctoral research fellow >>> Protein Sequence Analysis Group >>> Bioinformatics Institute >>> A*STAR, Singapore >>> email: dimitark at bii.a-star.edu.sg >>> tel: +65 6478 8514 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> > > -- > Dimitar Kenanov > Postdoctoral research fellow > Protein Sequence Analysis Group > Bioinformatics Institute > A*STAR, Singapore > email: dimitark at bii.a-star.edu.sg > tel: +65 6478 8514 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore email: dimitark at bii.a-star.edu.sg tel: +65 6478 8514 From David.Messina at sbc.su.se Wed May 5 07:46:17 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 5 May 2010 09:46:17 +0200 Subject: [Bioperl-l] question about Bio::Tools::Run::Genewise In-Reply-To: <4BE1170D.8040108@bii.a-star.edu.sg> References: <4BDA986D.3020302@bii.a-star.edu.sg> <93826AEB-FD69-4741-B99D-16778F6F3C89@sbc.su.se> <4BE1170D.8040108@bii.a-star.edu.sg> Message-ID: <9F2DC6C9-7707-4C4A-8DE1-0B37387F7F8A@sbc.su.se> Great, glad to hear that. Thanks for letting us know about the problem! Dave On May 5, 2010, at 8:58, Dimitar Kenanov wrote: > Hi Dave, > thank you for the tip. 
Now it works like a charm :) > > Greetings > Dimitar > > > On 05/02/2010 05:59 PM, Dave Messina wrote: >> Hi Dimitar, >> >> The syntax you want is: >> >> # Build a Genewise alignment factory >> my $factory = Bio::Tools::Run::Genewise->new(); >> >> # turn on the quiet switch >> $factory->QUIET(1); >> >> # @genes is an array of Bio::SeqFeature::Gene::GeneStructure objects >> my @genes = $factory->run($protein_seq, $genomic_seq); >> >> >> This turns out be incorrectly documented on the man page, at least in part: >> >>> Available Params: >>> >>> NB: These should be passed without the '-' or they will be ignored, >>> except switches such as 'hmmer' (which have no corresponding value) >>> which should be set on the factory object using the AUTOLOADed methods >>> of the same name. >>> >>> Model [-codon,-gene,-cfreq,-splice,-subs,-indel,-intron,-null] >>> Alg [-kbyte,-alg] >>> HMM [-hmmer] >>> Output [-gff,-gener,-alb,-pal,-block,-divide] >>> Standard [-help,-version,-silent,-quiet,-errorlog] >>> >> >> That is, these don't work as expected: >> >> $factory->quiet; >> $factory->quiet(1); >> >> due to a conflict with the quiet() method inherited from Bio::Tools::Run::WrapperBase. >> >> And passing a true value such as 1 is in fact necessary or the switch-type parameters won't be set. >> >> >> So it looks like the parameter passing system in B::T::R::Genewise might benefit from some revision. I'll put this on the bugtracker. >> >> >> Dave >> >> > > > -- > Dimitar Kenanov > Postdoctoral research fellow > Protein Sequence Analysis Group > Bioinformatics Institute > A*STAR, Singapore > email: dimitark at bii.a-star.edu.sg > tel: +65 6478 8514 > From torsten.seemann at infotech.monash.edu.au Wed May 5 07:48:55 2010 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 5 May 2010 17:48:55 +1000 Subject: [Bioperl-l] parsing GenBank file In-Reply-To: References: Message-ID: > ? ? 
i have a huge GenBank file ( downloaded from RDP containing all
> bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's lineage (in ORGANISM).
> I am getting the output like:
> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae
> Holophagales Holophagae "Acidobacteria" Bacteria Root
> This is the exact output i want, but i am missing lot of records (they are
> there in the genbank file but not in my output).
> I also got a warning during parsing:
> --------------------- WARNING ---------------------
> MSG: Unbalanced quote in:
> /db_xref="taxon:35783" /germline"
> /mol_type="genomic DNA"
> /organism="Enterococcus sp."
> /strain="LMG12316"No further qualifiers will be added for this feature
> ---------------------------------------------------
> So i was just wondering that is this warning message causing that problem or
> i am doing something wrong?

"Unbalanced quote" means there is not an even number (multiple of 2) of double-quote (") symbols around the tag's value. I can see that the first line (below) looks problematic:

YOU HAVE:

/db_xref="taxon:35783" /germline"

SHOULD BE:

/db_xref="taxon:35783"
/germline

I suspect there is a problem either with RDP's GenBank producer, or BioPerl is having a problem with the "germline" qualifier, which is a 'null valued' qualifier like /pseudo: it takes no ="value" string. (I think in BioPerl this is handled by setting the value to "_no_value"?)
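
As a rough illustration of the quote-counting rule described above, here is a minimal sketch in plain Perl. The sub name has_balanced_quotes and the sample qualifier list are made up for this example; this is not BioPerl code, and BioPerl's own GenBank parser may apply a different check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): returns true when a
# feature-table qualifier holds an even number of double quotes.
sub has_balanced_quotes {
    my ($line) = @_;
    my $count = () = $line =~ /"/g;    # count every " in the string
    return $count % 2 == 0;
}

# Sample qualifiers, loosely based on the record in the warning.
my @qualifiers = (
    '/db_xref="taxon:35783"',
    '/germline"',                # stray trailing quote, as in the RDP record
    '/germline',                 # value-less form of the qualifier
    '/mol_type="genomic DNA"',
);

for my $q (@qualifiers) {
    my $status = has_balanced_quotes($q) ? 'ok' : 'unbalanced quote';
    printf "%-28s %s\n", $q, $status;
}
```

Running something like this over the qualifier lines of a suspect record is one way to spot stray quotes before handing the file to a parser.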
http://www.ncbi.nlm.nih.gov/collab/FT/ Qualifier /germline Definition the sequence presented in the entry has not undergone somatic rearrangement as part of an adaptive immune response; it is the unrearranged sequence that was inherited from the parental germline Value format none Example /germline Comment /germline should not be used to indicate that the source of the sequence is a gamete or germ cell; /germline and /rearranged cannot be used in the same source feature; /germline and /rearranged should only be used for molecules that can undergo somatic rearrangements as part of an adaptive immune response; these are the T-cell receptor (TCR) and immunoglobulin loci in the jawed vertebrates, and the unrelated variable lymphocyte receptor (VLR) locus in the jawless fish (lampreys and hagfish); /germline and /rearranged should not be used outside of the Craniata (taxid=89593) --Torsten Seemann --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA From cjfields at illinois.edu Wed May 5 12:12:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 07:12:30 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: Message-ID: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> On May 4, 2010, at 11:24 PM, Jay Hannah wrote: > On May 4, 2010, at 10:30 PM, Jay Hannah wrote: >> http://sourceforge.net/projects/smolder/ > > Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) > > http://search.cpan.org/perldoc?Smolder > http://github.com/mpeters/smolder > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)? 
chris From cjfields at illinois.edu Wed May 5 12:30:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 07:30:30 -0500 Subject: [Bioperl-l] using default string values for undef/empty, was Re: parsing GenBank file In-Reply-To: References: Message-ID: On May 5, 2010, at 2:48 AM, Torsten Seemann wrote: >> i have a huge GenBank file ( downloaded from RDP containing all >> bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's linage (in ORGANISM). >> I am getting the output like: >> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae >> Holophagales Holophagae "Acidobacteria" Bacteria Root >> This is the exact output i want, but i am missing lot of records (they are >> there in the genbank file but not in my output). >> I also got a warning during parsing: >> --------------------- WARNING --------------------- >> MSG: Unbalanced quote in: >> /db_xref="taxon:35783" /germline" >> /mol_type="genomic DNA" >> /organism="Enterococcus sp." >> /strain="LMG12316"No further qualifiers will be added for this feature >> --------------------------------------------------- >> So i was just wondering that is this warning message causing that problem or >> i am doing something wrong? > > "Unbalanced quote" means there is not an even number (multiple of 2) > double-quote (") symbols around the tag's value. I can see that the > first line (below) looks problematic: > > YOU HAVE: > > /db_xref="taxon:35783" /germline" > > SHOULD BE: > > /db_xref="taxon:35783" > /germline > > I suspect there is a problem either with RDP's genbank producer, or > Bioperl is having problem with the "germline" qualifier which is a > 'null valued' qualifier like /pseudo - it takes no ="value" string. (I > think in Bioperl this is handled by setting the value to "_no_value" > ?) > ... > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA Ugh, didn't notice the '_no_value' bit. 
Probably my opinion, but I don't like stubs like that as they tend to be brittle and run into issues (like this one, for instance). I would prefer we just leave that as undef and only quote defined values (with the exceptions in %FTQUAL_NO_QUOTE). Any reason for this behavior (is it related to ORM-related stuff like bioperl-db)? Can we change that to something a bit more realistic? chris From David.Messina at sbc.su.se Wed May 5 13:00:39 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 5 May 2010 15:00:39 +0200 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> Message-ID: <252790EC-6A2D-4DFA-B2A0-8D0F8E169E30@sbc.su.se> Yeah, absolutely, Jay! it would be wonderful to have this for BioPerl. Dave On May 5, 2010, at 14:12, Chris Fields wrote: > On May 4, 2010, at 11:24 PM, Jay Hannah wrote: > >> On May 4, 2010, at 10:30 PM, Jay Hannah wrote: >>> http://sourceforge.net/projects/smolder/ >> >> Correction: Smolder abandoned Sourceforge, and barely left a forwarding address. :) >> >> http://search.cpan.org/perldoc?Smolder >> http://github.com/mpeters/smolder >> >> Jay Hannah >> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)? > > chris From cjfields at illinois.edu Wed May 5 14:46:23 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 09:46:23 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub Message-ID: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> All, I would like to finalize moving over to git/github very soon. We're sort of in limbo on this, so it needs to progress forward. We'll need to do some initial cleanup after the move (Heikki is already doing a few things on the test repo, which we'll need to diff over to the new one). 
So with that in mind, here are my thoughts. This is copied over to this wiki page, in case you don't want to reply here:

http://www.bioperl.org/wiki/From_SVN_to_Git (thanks Mark!)

1) Timeline

When? Sooner the better (weeks as opposed to months). Our anon. svn is down, likely permanently (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live).

2) Migration strategy

Now mainly worked out using svn2git, which is very fast. We would need to make the svn repo on dev read-only during this transition. My guess is it would take very little time.

Do we want to retain the git-SVN metadata on commits? This is viewable with our current read-only mirror on github:

http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca

3) Developers

Not everyone has a github account. Recent ones who I couldn't find on github:

dmessina, fangly

The current authors file used for mapping commit authors to emails used their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I think, once one has signed up with github, you can add that same address to your current ones, and it should map to your github account.

If we use dev.open-bio.org as our central git repo, we won't need to go through with that, but we will need a viewable version of dev available somehow (mirrored on github or otherwise). Speaking of...

4) Development strategy

Are we sticking with a single centralized repo (SVN-like)? Will that be github, or will github be a downstream repo to our work on dev? We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain).

Git makes it very easy to make branches and merge in code to trunk. With that in mind, I would highly suggest we start working on branches for almost everything and merge over to trunk.
There is very little to no overhead in doing so with git. I like this strategy (Mark Jensen pointed this out):

http://nvie.com/git-model

Also, several points were raised in a related project (Parrot) considering a move to git/github from svn. One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds.

5) Encouraging outside contributors

Do we want to adopt a policy similar to Moose?

http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod

This is easy with github and forks.

6) SVN Read/Write to GitHub

It was recently announced that one can access a github repo using subversion as read-only, and just yesterday experimental write to github is allowed:

http://github.com/blog/644-subversion-write-support

I can see allowing read-only svn, but write support is still experimental. Do we want to allow that?

7) Others?

chris

From shalabh.sharma7 at gmail.com Wed May 5 14:46:19 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 10:46:19 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To:
References:
Message-ID:

Hi Torsten,
Thanks for pointing that out. But this is just a warning; it will not break the script. I found the point where the script is breaking.
It breaks and gives this message:
Can't call method "classification" on an undefined value at parseGB.pl line 9, line 10067733.

So the script breaks when it comes to this record:

LOCUS S001198291 1521 bp rRNA linear BCT 02-Feb-2009
DEFINITION Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2.
ACCESSION AP010656 REGION: 61786..63306
PROJECT GenomeProject:29025
SOURCE Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
ORGANISM Candidatus Azobacteroides pseudotrichonymphae genomovar.
CFP2 Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales"; "Porphyromonadaceae"; unclassified_"Porphyromonadaceae". REFERENCE 1 (bases 1 to 1521) AUTHORS Toyoda A., Hongoh Y., Toh H., Hattori M., Ohkuma M., Sakaki Y.; TITLE ; JOURNAL Submitted (21-MAR-2008) to the EMBL/GenBank/DDBJ databases. Contact:Atsushi Toyoda National Institute of Genetics, Comparative Genomics Laboratory; Yata 1111, Mishima, Shizuoka 411-8540, Japan REFERENCE 2 AUTHORS Hongoh Y., Sharma V.K., Prakash T., Noda S., Toh H., Taylor T.D., Kudo T., Sakaki Y., Toyoda A., Hattori M., Ohkuma M.; It is unable to parse this record, but i don't understand why it is doing so? The only reason i can think of is the organism's name which is very long as compared to others. Thanks Shalabh On Wed, May 5, 2010 at 3:48 AM, Torsten Seemann < torsten.seemann at infotech.monash.edu.au> wrote: > > i have a huge GenBank file ( downloaded from RDP containing all > > bacterial 16s). I just want to parse RDP id (in LOCUS) and organism's > linage (in ORGANISM). > > I am getting the output like: > > S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae > > Holophagales Holophagae "Acidobacteria" Bacteria Root > > This is the exact output i want, but i am missing lot of records (they > are > > there in the genbank file but not in my output). > > I also got a warning during parsing: > > --------------------- WARNING --------------------- > > MSG: Unbalanced quote in: > > /db_xref="taxon:35783" /germline" > > /mol_type="genomic DNA" > > /organism="Enterococcus sp." > > /strain="LMG12316"No further qualifiers will be added for this feature > > --------------------------------------------------- > > So i was just wondering that is this warning message causing that problem > or > > i am doing something wrong? > > "Unbalanced quote" means there is not an even number (multiple of 2) > double-quote (") symbols around the tag's value. 
I can see that the > first line (below) looks problematic: > > YOU HAVE: > > /db_xref="taxon:35783" /germline" > > SHOULD BE: > > /db_xref="taxon:35783" > /germline > > I suspect there is a problem either with RDP's genbank producer, or > Bioperl is having problem with the "germline" qualifier which is a > 'null valued' qualifier like /pseudo - it takes no ="value" string. (I > think in Bioperl this is handled by setting the value to "_no_value" > ?) > > http://www.ncbi.nlm.nih.gov/collab/FT/ > > Qualifier /germline > Definition the sequence presented in the entry has not undergone > somatic > rearrangement as part of an adaptive immune response; it is > the > unrearranged sequence that was inherited from the parental > germline > Value format none > Example /germline > Comment /germline should not be used to indicate that the source of > the sequence is a gamete or germ cell; > /germline and /rearranged cannot be used in the same source > feature; > /germline and /rearranged should only be used for molecules > that > can undergo somatic rearrangements as part of an > adaptive immune > response; these are the T-cell receptor (TCR) and > immunoglobulin > loci in the jawed vertebrates, and the unrelated variable > lymphocyte receptor (VLR) locus in the jawless fish > (lampreys > and hagfish); > /germline and /rearranged should not be used outside of the > Craniata (taxid=89593) > > > --Torsten Seemann > --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash > University, AUSTRALIA > From cjfields at illinois.edu Wed May 5 15:32:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 10:32:41 -0500 Subject: [Bioperl-l] parsing GenBank file In-Reply-To: References: Message-ID: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu> Shalabh, What is the source of this file? 
It's not from GenBank; if I look up the parent sequence using Bio::DB::GenBank it works fine:

use Modern::Perl;
use Bio::DB::GenBank;

my $id = 'AP010656';

my $gb = Bio::DB::GenBank->new();

my $seq = $gb->get_Seq_by_acc($id);

say join(',',$seq->species->classification);

chris

On May 5, 2010, at 9:46 AM, shalabh sharma wrote:

> Hi Torsten,
> Thanks for pointing that out. But this is just a warning,
> it will not break the script. i found the the point where script is
> breaking.
> Its breaking and giving this message:
> Can't call method "classification" on an undefined value at parseGB.pl line
> 9, line 10067733.
>
> So the script is breaking when its coming to this record:
>
> LOCUS       S001198291 1521 bp rRNA linear BCT 02-Feb-2009
> DEFINITION  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2.
> ACCESSION   AP010656 REGION: 61786..63306
> PROJECT     GenomeProject:29025
> SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
>   ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
>             Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
>             "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".
> [...]

From shalabh.sharma7 at gmail.com Wed May 5 15:38:11 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 11:38:11 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
Message-ID:

Hi Chris,
I downloaded this file from RDP, it contain all bacterial 16s.

Thanks
Shalabh

On Wed, May 5, 2010 at 11:32 AM, Chris Fields wrote:

> Shalabh,
>
> What is the source of this file?
> It's not from GenBank; if I look up the parent sequence using
> Bio::DB::GenBank it works fine:
>
> [...]

From cjfields at illinois.edu Wed May 5 16:01:55 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 May 2010 11:01:55 -0500
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To:
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu>
Message-ID: <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>

Shalabh,

There are several problems with this file that make it somewhat problematic and somewhat non-GenBank like. It does parse (it has seq data) but doesn't catch the SOURCE/ORGANISM b/c of the somewhat non-canonical way of displaying the classification:

SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
            Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
            "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".

It's different enough from the NCBI version (from here: http://www.ncbi.nlm.nih.gov/nuccore/212548595) that it's probably breaking the parser:

SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
            Bacteria; Bacteroidetes; Bacteroidia; Bacteroidales; Candidatus
            Azobacteroides.

Please file this as a bug, we can take a look at it. It's a bit non-standard so I can't promise it'll be fixed unless it's fairly easy to do.

chris

On May 5, 2010, at 10:38 AM, shalabh sharma wrote:

> Hi Chris,
> I downloaded this file from RDP, it contain all bacterial 16s.
>
> Thanks
> Shalabh
>
> [...]

From shalabh.sharma7 at gmail.com Wed May 5 16:10:33 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 May 2010 12:10:33 -0400
Subject: [Bioperl-l] parsing GenBank file
In-Reply-To: <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>
References: <3E1D9CB2-7FB3-4D21-A4D2-8251D003B520@illinois.edu> <3FA7D465-4F5A-4F7B-A551-118236A8D209@illinois.edu>
Message-ID:

Hi Chris,
I will do that, so how i can solve my problem, do you have any suggestion?
I am thinking of taking all the accessions from the file i have and use Bio::DB::GenBank to get classification.

Thanks
shalabh

On Wed, May 5, 2010 at 12:01 PM, Chris Fields wrote:

> Shalabh,
>
> There are several problems with this file that make it somewhat problematic
> and somewhat non-GenBank like. It does parse (it has seq data) but doesn't
> catch the SOURCE/ORGANISM b/c of the somewhat non-canonical way of
> displaying the classification:
>
> SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
>   ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
>             Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
>             "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".
>
> It's different enough from the NCBI version (from here:
> http://www.ncbi.nlm.nih.gov/nuccore/212548595) that it's probably breaking
> the parser:
>
> SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
>   ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
>             Bacteria; Bacteroidetes; Bacteroidia; Bacteroidales; Candidatus
>             Azobacteroides.
>
> Please file this as a bug, we can take a look at it. It's a bit
> non-standard so I can't promise it'll be fixed unless it's fairly easy to
> do.
>
> chris
>
> [...]

From jay at jays.net Wed May 5 16:28:10 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 5 May 2010 11:28:10 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
Message-ID: <512A88E4-85A0-4841-B6A7-9915FE0800BA@jays.net>

On May 5, 2010, at 10:59 AM, Jay Hannah wrote:
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

Oops. Should have checked Smolder before sending that email...

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

$ prove -v t/email_signatures.t
t/email_signatures.t ..
1..7
ok 1 - $work->[0]->{Outlook} email signatures up to date
ok 2 - $work->[0]->{Netmail} email signatures up to date
ok 3 - $work->[1]->{Lotus_Notes} email signatures up to date
not ok 4 - $home->[0]->{MacMini_Mail.app} email signatures up to date
ok 5 - $home->[0]->{MacMini_Entourage.app} email signatures up to date
ok 6 - $home->[0]->{laptop_Mail.app} email signatures up to date
ok 7 - $home->[0]->{laptop_Entourage.app} email signatures up to date

# Failed test '$home->[0]->{MacMini_Mail.app} email signatures up to date'
# at t/email_signatures.t line 5.
# Looks like you failed 1 test of 7.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/7 subtests

Test Summary Report
-------------------
t/email_signatures.t (Wstat: 256 Tests: 7 Failed: 1)
  Failed test: 4
  Non-zero exit status: 1
Files=1, Tests=7, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.03 cusr 0.00 csys = 0.07 CPU)
Result: FAIL

From jay at jays.net Wed May 5 15:59:37 2010
From: jay at jays.net (Jay Hannah)
Date: Wed, 5 May 2010 10:59:37 -0500
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu>
Message-ID: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>

On May 5, 2010, at 7:12 AM, Chris Fields wrote:
> I think this would be great! Would you stick with trunk only, or other bioperl dists (run, for example)?

I would definitely start with trunk and see how it goes.

Last night I tried to smoke all our old $work[0] tags and failed impressively. Our tests were (and probably still are) too reliant on 3rd party black boxes being online and responsive, and servers tend to move and get reconfigured over the years. Presumably BioPerl and Moose are more self-contained (unless external deps are explicitly enabled), so perhaps historical smoking would work fairly well.

In Moose land the request is that I smoke not only Moose, but everything on CPAN that *depends on Moose*:

export MOOSE_TEST_MD=1; prove xt/test-my-dependents.t

Which should be ... educational. :)

While exciting, I don't think that concept translates to the BioPerl monolith.

If I'm the only one smoking, you'll get a very limited number of architecture + perl version combinations reported. Which begs the question of how to harness a broader tester pool.
It's great that 342 systems smoked our latest CPAN upload:

http://static.cpantesters.org/distro/B/bioperl.html

But the crazy I'm embarking on would mean several smokes each day (every svn/git commit?), compared to the cpantesters who haven't had a new CPAN release to smoke since Sep 2009 (1.6.1). Maybe I'd just do one or two a day or something?

Whoever wanted to could report into our central Smolder server using their architectures + perl versions. A volunteer would just install Smolder from CPAN and run this in their bioperl-live directory:

prove -I . --recurse --archive test_run.tar.gz
smolder_smoke_signal --server smolder.jays.net \
   --username MyUserName --password MyPass \
   --file test_run.tar.gz --project bioperl-live --tags trunk

Deep ponderings,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From David.Messina at sbc.su.se Wed May 5 21:27:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 23:27:24 +0200
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu>
Message-ID: <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se>

> Do we want to retain the git-SVN metadata on commits? What are the tradeoffs with this?

From the little reading I've done, it seems that space and clutter are the chief drawbacks, but that it's easy to strip this metadata out later. Does that jibe with your impression?

> Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly

My github account name is: DaveMessina

Do I have an @bioperl.org address? I tried sending mail to a few likely permutations without success. In any case, I added dave_messina -at- bioperl.org as an email address on my github account.

> Are we sticking with a single centralized repo (SVN-like)?
I am a total git novice, but it's my understanding that it's still a good idea, particularly with a big many-author project like BioPerl, to have a primary, official repo.

But I'd be interested in hearing more discussion on this. We're at a good place to make large-ish changes to how we do things, I think.

> Will that be github, or will github be a downstream repo to our work on dev?

My only concern with github being primary is in case something happens to github. Not likely, I know, but it seems prudent to maintain a certain amount of control over our destiny. So I'm inclined to make dev be primary and github downstream, with the assumption that it'd be trivial to abandon dev and make github primary in the future if we want.

Or would it be enough to auto-mirror to dev.open-bio.org, which could serve as a fallback in case github goes offline, temporarily or permanently?

> We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain).

Are there any git-familiar folks out there who could comment on the pros and cons of this? Perhaps some of the other Bio* projects who have switched to git could advise.

Right now, without further technical details, I think it'd be better to have one true primary just because it's less confusing and easier to manage, particularly if we're to follow a model like the one mentioned just below:

> I would highly suggest we start working on branches for almost everything and merge over to trunk.
> [...]
> I like this strategy (Mark Jensen pointed this out): http://nvie.com/git-model

Yep, that looks good to me, too.

> One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds.
We should try to make sure we have this sorted before going "live".

> 5) Encouraging outside contributors
>
> Do we want to adopt a policy similar to Moose?

Yes! We want more people to jump in; one of the benefits of git and github is that they encourage this.

> 6) SVN Read/Write to GitHub
>
> I can see allowing read-only svn, but write support is still experimental. Do we want to allow that?

Read-only for sure: that seems harmless, and we want to give people lots of ways to get BioPerl.

Write: let's play with it a bit, making a few test commits to bioperl-test, and see what happens. It would be nice if we don't force everyone who contributes to BioPerl to have to switch over to git immediately. Me included. :)

> 7) Others?

What happens when we start splitting up bioperl into separate distros? Do we put them each into a separate repo?

Dave

From David.Messina at sbc.su.se Wed May 5 21:40:46 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 5 May 2010 23:40:46 +0200
Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ?
In-Reply-To: <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net>
Message-ID: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se>

> Presumably BioPerl and Moose are more self-contained (unless external deps are explicitly enabled), so perhaps historical smoking would work fairly well.

Very few of BioPerl's tests rely on outside servers, and those that do have to be turned on explicitly with a network-tests flag. So hopefully that won't be an issue.

> In Moose land the request is that I smoke not only Moose, but everything on CPAN that *depends on Moose*:
> [...]
> While exciting, I don't think that concept translates to the BioPerl monolith.

Agreed, not really. Except for some of the GMOD stuff. And anyway this could always be done later if desired. Probably much later.
:) > Whoever wanted to could report into our central Smolder server using their architectures + perl versions. A volunteer would just install Smolder from CPAN and run this in their bioperl-live directory: > > prove -I . --recurse --archive test_run.tar.gz > smolder_smoke_signal --server smolder.jays.net \ > --username MyUserName --password MyPass \ > --file test_run.tar.gz --project bioperl-live --tags trunk Would the reporter need to have any special setup to do this? Could this kind of reporting be written into the BioPerl Build.PL as a user-settable option (just like the options for installing scripts or running network tests)? If so, then we could get lots of feedback on trunk (master) commits and not just releases. Dave From jason at bioperl.org Wed May 5 22:45:41 2010 From: jason at bioperl.org (Jason Stajich) Date: Wed, 05 May 2010 15:45:41 -0700 Subject: [Bioperl-l] Modules in Bio:Tree In-Reply-To: <4BE1D0E2.9010500@mail.mcgill.ca> References: <4BE1D0E2.9010500@mail.mcgill.ca> Message-ID: <4BE1F515.7090604@bioperl.org> Please use the mailing list for questions. The nodes are objects not strings you print - as it shows in http://bioperl.org/wiki/HOWTO:Trees#Example_Code you access information from them with the object methods like 'id' so print $leaf->id, "\n" would probably accomplish what you are looking for right now. -jason Sudeep Mehrotra wrote, On 5/5/10 1:11 PM: > Hello Jason, > I am using the Bio:Tree modules to get a list of all the leaves in > their respective clusters. I looked at the examples and followed the > functions of various modules but I am not able to get the desired result. > > My input looks as follows: > ((((Candidatus_Korarchaeum)Korarchaeota,((((Cenarchaeum_symbiosum)Cenarchaeum)Cenarchaeaceae)Cenarchaeales,((((Nitrosopumilus_maritimus)Nitrosopumilus)Nitrosopumilaceae)Nitrosopumilales)marine_archaeal_group_1)Thaumarchaeota,(((((Archaeoglobus_fulgidus)Archaeoglobus)Archaeoglobaceae)Archaeoglobales)Archaeoglobi, > > and so on.... 
> > Code is like this: > $input = new Bio::TreeIO(-file =>"$file1",-format => "newick"); > $tree = $input->next_tree; > @leaves = $tree->get_leaf_nodes(); > foreach $leaf (@leaves) > { > print "$leaf\n"; > } > The ouput I get is: > Bio::Tree::Node=HASH(0xa783e0) > Bio::Tree::Node=HASH(0xa78710) > Bio::Tree::Node=HASH(0xa78ab0) > > Not sure what I am doing wrong. > > Objective is to get a cluster of all the leaves. > > Thanks From florent.angly at gmail.com Thu May 6 00:16:05 2010 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 06 May 2010 10:16:05 +1000 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <4BE20A45.5090206@gmail.com> Hi Chris, On 06/05/10 00:46, Chris Fields wrote: > 3) Developers > > Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I think, once one has signed up with github, you can add that same address to your current ones, and it should map to your github account. If we use dev.open-bio.org as our central git repo, we won't need to go through with that, but we will need a viewable version of dev available somehow (mirrored on github or otherwise). Speaking of... > I have a GitHub account, fangly, on which I just added the email address fangly at bioperl.org . Thanks for your efforts working on the Git migration. Florent From jay at jays.net Thu May 6 03:18:47 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:18:47 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? 
In-Reply-To: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: I smoked trunk a few times. Check out all the pretty buttons and graphs and such: http://biobase2.ist.unomaha.edu:8080/app/projects/smoke_reports/1 How you too can submit smoke results: http://jays.net/wiki/Smolder Neat? Not? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Thu May 6 03:31:05 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:31:05 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: On May 5, 2010, at 4:40 PM, Dave Messina wrote: > Very few of BioPerl's tests rely on outside servers, and those that do have to be turned on explicitly with a network-tests flag. So hopefully that won't be an issue. I said "no" to the network tests for my smoke runs. Haven't really examined the results enough to know if the failures are my fault or what. Since I always use bioperl-live out of SVN (soon git) I may not be following the ./Build.PL procedure correctly. > Agreed, not really. Except for some of the GMOD stuff. And anyway this could always be done later if desired. Probably much later. :) Ya. Some day http://smolder.open-bio.org hosting jillions of projects would be dreamy! :) Any open-bio.org projects using TAP other than BioPerl? Smolder can host anything TAP, and TAP producers are available in at least 17 languages: http://testanything.org/wiki/index.php/TAP_Producers > Would the reporter need to have any special setup to do this? 
LWP::UserAgent or Smolder's smolder_smoke_signal are the two methods I've successfully executed so far: http://jays.net/wiki/Smolder > Could this kind of reporting be written into the BioPerl Build.PL as a user-settable option (just like the options for installing scripts or running network tests)? > > If so, then we could get lots of feedback on trunk (master) commits and not just releases. Ya, wow. I've never built BioPerl "the right way" (I'm an SVN/git junkie) so I'm not sure how this would get put into Build.PL. Would you prompt the user, something like "Since you just installed BioPerl, we'd like to connect to the Internet and report in your test results. Is this ok? [yes] " ? It would be very cool to collect and trend thousands of reports, assuming it can be 100% automated for the user. Thanks for the feedback! :) Time to putter my motorcycle home before it gets too cold. G'night, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Thu May 6 03:43:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 May 2010 22:43:14 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. chris On May 5, 2010, at 10:18 PM, Jay Hannah wrote: > I smoked trunk a few times. Check out all the pretty buttons and graphs and such: > > http://biobase2.ist.unomaha.edu:8080/app/projects/smoke_reports/1 > > How you too can submit smoke results: > > http://jays.net/wiki/Smolder > > Neat? Not? 
> > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Thu May 6 03:55:40 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 5 May 2010 22:55:40 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: On May 5, 2010, at 10:43 PM, Chris Fields wrote: > Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. Ya, seems like the way to go. LWP is all over inside BioPerl already, whereas Smolder itself has 147 dependencies, most of which probably aren't relevant to most BioPerl users. :) http://deps.cpantesters.org/?module=Smolder;perl=latest So a stand-alone script that could be run whenever, plus (eventually) a prompt in Build.PL asking about running it? Not sure if Build.PL can somehow use the "prove --archive" hook to store the results during the normal installation run through all the tests... Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From lincoln.stein at gmail.com Thu May 6 12:01:09 2010 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 6 May 2010 08:01:09 -0400 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: My github username is lstein and I've just added lstein at bioperl.org to my linked email addresses. I hope I have a bioperl.org address; I never use it! Lincoln On Wed, May 5, 2010 at 10:46 AM, Chris Fields wrote: > All, > > I would like to finalize moving over to git/github very soon. 
We're sort > of in limbo on this, so it needs to progress forward. We'll need to do some > initial cleanup after the move (Heikki is already doing a few things on the > test repo, which we'll need to diff over to the new one). > > So with that in mind, here are my thoughts. This is copied over to this > wiki page, in case you don't want to reply here: > > http://www.bioperl.org/wiki/From_SVN_to_Git > > (thanks Mark!) > > 1) Timeline > > When? Sooner the better (weeks as opposed to months). Our anon. svn is > down, likely permanently ( > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > > 2) Migration strategy > > Now mainly worked out using svn2git, which is very fast. We would need to > make the svn repo on dev read-only during this transition. My guess is it > would take very little time. Do we want to retain the git-SVN metadata on > commits? This is viewable with our current read-only mirror on github: > > > http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca > > 3) Developers > > Not everyone has a github account. Recent ones who I couldn't find on > github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used > their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I > think, once one has signed up with github, you can add that same address to > your current ones, and it should map to your github account. If we use > dev.open-bio.org as our central git repo, we won't need to go through with > that, but we will need a viewable version of dev available somehow (mirrored > on github or otherwise). Speaking of... > > 4) Development strategy > > Are we sticking with a single centralized repo (SVN-like)? Will that be > github, or will github be a downstream repo to our work on dev? 
We could > feasibly have github be an active, forkable repo that could be > bidirectionally synced with dev, but I'm not sure of the logistics on this > (this popped up before with svn migration and was rejected b/c it was > considered too difficult to maintain). > > Git makes it very easy to make branches and merge in code to trunk. With > that in mind, I would highly suggest we start working on branches for almost > everything and merge over to trunk. There is very little to no overhead in > doing so with git. > > I like this strategy (Mark Jensen pointed this out): > http://nvie.com/git-model > > Also, several points were raised in a related project (Parrot) considering > a move to git/github from svn. One in particular was that git allows > destructive commits. Jonathan Leto indicated we can set up specific > branches that don't allow this, using commit hooks, so my guess is the > master branch and release branches wouldn't allow rewinds. > > 5) Encouraging outside contributors > > Do we want to adopt a policy similar to Moose? > > http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod > > This is easy with github and forks. > > 6) SVN Read/Write to GitHub > > It was recently announced that one can access a github repo using > subversion as read-only, and just yesterday experimental write to github is > allowed: > > http://github.com/blog/644-subversion-write-support > > I can see allowing read-only svn, but write support is still experimental. > Do we want to allow that? > > 7) Others? > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Thu May 6 13:01:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 08:01:56 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> Message-ID: <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> (comments interspersed below) On May 5, 2010, at 4:27 PM, Dave Messina wrote: >> Do we want to retain the git-SVN metadata on commits? > > What are the tradeoffs with this? > > From the little reading I've done, it seems that space and clutter are the chief drawbacks, but that it's easy to strip this metadata out later. Does that jibe with your impression? I don't really see much use for it personally, beyond retaining the SVN commit #. >> Not everyone has a github account. Recent ones who I couldn't find on github: dmessina, fangly > > My github account name is: DaveMessina > > Do I have an @bioperl.org address? I tried sending mail to a few likely permutations without success. In any case, I added dave_messina -at- bioperl.org as an email address on my github account. I think if you have a bioperl dev account you should have a bioperl.org set up. That's one thing I'm not absolutely sure of. >> Are we sticking with a single centralized repo (SVN-like)? > > I am a total git novice, but it's my understanding that it's still a good idea, particularly with a big many-author project like BioPerl, to have a primary, official repo. But I'd be interested in hearing more discussion on this. We're at a good place to make large-ish changes to how we do things, I think. > > >> Will that be github, or will github be a downstream repo to our work on dev? 
> > My only concern with github being primary is in case something happens to github. Not likely, I know, but it seems prudent to maintain a certain amount of control over our destiny. > > So I'm inclined to make dev be primary and github downstream, with the assumption that it'd be trivial to abandon dev and make github primary in the future if we want. > > Or would it be enough to auto-mirror to dev.open-bio.org, which could serve as a fallback in case github goes offline, temporarily or permanently? Well, the nice thing about git is that essentially everyone who pulls has a copy of the repo. It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. We could also use alternate mirrors for github besides dev. http://repo.or.cz/w is one example. >> We could feasibly have github be an active, forkable repo that could be bidirectionally synced with dev, but I'm not sure of the logistics on this (this popped up before with svn migration and was rejected b/c it was considered too difficult to maintain). > > Are there any git-familiar folks out there who could comment on the pros and cons of this? Perhaps some of the other Bio* projects who have switched to git could advise. > > Right now, without further technical details, I think it'd be better to have one true primary just because it's less confusing and easier to manage, particularly if we're to follow a model like the one mentioned just below: We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. >> I would highly suggest we start working on branches for almost everything and merge over to trunk. >> [...] >> I like this strategy (Mark Jensen pointed this out): http://nvie.com/git-model > > Yep, that looks good to me, too.
> > > >> One in particular was that git allows destructive commits. Jonathan Leto indicated we can set up specific branches that don't allow this, using commit hooks, so my guess is the master branch and release branches wouldn't allow rewinds. > > We should try to make sure we have this sorted before going "live". That would mean adding a pre-commit hook to disallow this. I'll look into it. >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? > > Yes! > > We want more people to jump in - one of the benefits of git and github is that they encourage this. > > > >> 6) SVN Read/Write to GitHub >> >> I can see allowing read-only svn, but write support is still experimental. Do we want to allow that? > > Read-only for sure - that seems harmless, and we want to give people lots of ways to get BioPerl. > > Write - let's play with it a bit, making a few test commits to bioperl-test, and see what happens. It would be nice if we don't force everyone who contributes to BioPerl to have to switch over to git immediately. Me included. :) Sounds good to me. >> 7) Others? > > What happens when we start splitting up bioperl into separate distros? Do we put them each into a separate repo? Yes. > Dave Thanks! chris From cjfields at illinois.edu Thu May 6 14:19:06 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 09:19:06 -0500 Subject: [Bioperl-l] Smoke test aggregator - Buildbot + Smolder ? In-Reply-To: References: <5499C846-E927-42C1-8F8B-C08D8F4F512A@illinois.edu> <4BFF05F0-A734-4856-AF2F-4F441E1E6307@jays.net> <556A667A-A3EE-4902-8665-673C07A57112@sbc.su.se> Message-ID: <3E35F38F-29A0-4419-AE24-AD25A0D6A6A1@illinois.edu> prove generally is just a perl script frontend for Test::Harness and App::Prove, correct? It is included in core from perl 5 on. Here is the code for 'prove' on my local setup: use strict; use App::Prove; my $app = App::Prove->new; $app->process_args(@ARGV); exit( $app->run ?
0 : 1 ); We could add a 'Build smoke' or somesuch that does this internally. I'm tending to shift away from Bio::Root::Build for such things at the moment, but maybe add something there? chris On May 5, 2010, at 10:55 PM, Jay Hannah wrote: > On May 5, 2010, at 10:43 PM, Chris Fields wrote: >> Nice! Maybe we should wrap the LWP version for posting into a script? We could place this with the distribution. > > Ya, seems like the way to go. LWP is all over inside BioPerl already, whereas Smolder itself has 147 dependencies, most of which probably aren't relevant to most BioPerl users. :) > > http://deps.cpantesters.org/?module=Smolder;perl=latest > > So a stand-alone script that could be run whenever, plus (eventually) a prompt in Build.PL asking about running it? Not sure if Build.PL can somehow use the "prove --archive" hook to store the results during the normal installation run through all the tests... > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Thu May 6 14:50:42 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 6 May 2010 09:50:42 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> Message-ID: <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> Chris, I added 'jhannah at bioperl.org' to my github list of email addresses. Can you add jhannah to the list of github committers in case github becomes the master repo? I need to clean up branches 'jhannah' and 'yapc10hackathon' whenever the transition is official and the master repo is declared (github or open-bio.org). 
Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From jay at jays.net Thu May 6 14:56:25 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 6 May 2010 09:56:25 -0500 Subject: [Bioperl-l] new core developers Rob Buels and Dave Messina In-Reply-To: References: Message-ID: On May 2, 2010, at 2:28 PM, Mark A. Jensen wrote: > On behalf of the core team, I am delighted to announce two new members: Rob Buels and Dave Messina. Woot! Congrats! Suddenly we WILL have a core dev at YAPC::NA for the hackathon! I'm now expecting great things from us. :) http://bioperl.org/wiki/YAPC Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Thu May 6 15:02:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 10:02:36 -0500 Subject: [Bioperl-l] Full bioperl-live github demo In-Reply-To: <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> References: <5D3A676B-8119-4712-B43C-FE0AA342B8ED@illinois.edu> <2656305E-2850-4DAB-8B08-3FDCE34EC1DE@illinois.edu> <8796492301724F2CA132F97AE57C2700@NewLife> <9D47D0A0-3524-446A-B360-3AEFE4A67F90@illinois.edu> <535B2F60-8CF8-4855-95A1-6E292B2A84DB@jays.net> Message-ID: Done. I think, unless there are a terrible number of objections, we'll push this in the next week or two. Need to look into the pre-commit hook setup for non-destructive commits, post-commit hook for posting commits to bioperl-guts, etc. chris On May 6, 2010, at 9:50 AM, Jay Hannah wrote: > Chris, > > I added 'jhannah at bioperl.org' to my github list of email addresses. Can you add jhannah to the list of github committers in case github becomes the master repo? > > I need to clean up branches 'jhannah' and 'yapc10hackathon' whenever the transition is official and the master repo is declared (github or open-bio.org). 
> > Thanks, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki.lehvaslaiho at gmail.com Thu May 6 17:26:48 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Thu, 6 May 2010 20:26:48 +0300 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: On 5 May 2010 17:46, Chris Fields wrote: > All, > > I would like to finalize moving over to git/github very soon. We're sort > of in limbo on this, so it needs to progress forward. We'll need to do some > initial cleanup after the move (Heikki is already doing a few things on the > test repo, which we'll need to diff over to the new one). > Do not worry about those, I'll move them into the final repo once it is there. I am just making sure everything works. > So with that in mind, here are my thoughts. This is copied over to this > wiki page, in case you don't want to reply here: > > http://www.bioperl.org/wiki/From_SVN_to_Git > > (thanks Mark!) > > 1) Timeline > > When? Sooner the better (weeks as opposed to months). Our anon. svn is > down, likely permanently ( > http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > ASAP. > 2) Migration strategy > > Now mainly worked out using svn2git, which is very fast. We would need to > make the svn repo on dev read-only during this transition. My guess is it > would take very little time. Do we want to retain the git-SVN metadata on > commits? This is viewable with our current read-only mirror on github: > > > http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca > > Keep it. It does no harm. > 3) Developers > > Not everyone has a github account. 
Recent ones who I couldn't find on > github: dmessina, fangly > > The current authors file used for mapping commit authors to emails used > their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I > think, once one has signed up with github, you can add that same address to > your current ones, and it should map to your github account. If we use > dev.open-bio.org as our central git repo, we won't need to go through with > that, but we will need a viewable version of dev available somehow (mirrored > on github or otherwise). Speaking of... > Let's go for github as the main repo. It adds visibility and has the coolness factor that helps. > 4) Development strategy > > Are we sticking with a single centralized repo (SVN-like)? Will that be > github, or will github be a downstream repo to our work on dev? We could > feasibly have github be an active, forkable repo that could be > bidirectionally synced with dev, but I'm not sure of the logistics on this > (this popped up before with svn migration and was rejected b/c it was > considered too difficult to maintain). > > Git makes it very easy to make branches and merge in code to trunk. With > that in mind, I would highly suggest we start working on branches for almost > everything and merge over to trunk. There is very little to no overhead in > doing so with git. > > I like this strategy (Mark Jensen pointed this out): > http://nvie.com/git-model > Let's try to follow this strategy. I do not think moving away from svn and going decentralized in one go would work at all. > Also, several points were raised in a related project (Parrot) considering > a move to git/github from svn. One in particular was that git allows > destructive commits. Jonathan Leto indicated we can set up specific > branches that don't allow this, using commit hooks, so my guess is the > master branch and release branches wouldn't allow rewinds. > I would not worry too much about that.
With git we'll have dozens if not hundreds of full copies of the repo as a backup. > 5) Encouraging outside contributors > > Do we want to adopt a policy similar to Moose? > > http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod > Interesting and educational document. Let's learn as much as we can from it. > This is easy with github and forks. The more the merrier. BTW, I can see Moose using ShipIt, http://search.cpan.org/~bradfitz/ShipIt-0.55/ that might be worth using in BioPerl. > 6) SVN Read/Write to GitHub > > It was recently announced that one can access a github repo using > subversion as read-only, and just yesterday experimental write to github is > allowed: > > http://github.com/blog/644-subversion-write-support > > I can see allowing read-only svn, but write support is still experimental. > Do we want to allow that? > Why not, if someone insists on using it? Once people get over the initial problems of moving to a different mindset in git, very few will want to use svn. There might be situations when git does not work, however, so let's allow for svn usage. > > 7) Others? > > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Thu May 6 18:35:55 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 6 May 2010 20:35:55 +0200 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> Message-ID: [ git-SVN metadata ] > I don't really see much use for it personally, beyond retaining the SVN commit #. Oh well heck, in that case we may as well ditch it.
If there's some way we could easily keep an inactive, archived version with the SVN to github commit # mapping, that would be a nice safety measure, but if it's too much trouble we needn't bother. [ github or dev as primary ] > It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. Great, okay, sounds like there won't be any problem there. [ single repo? ] > We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. Sounds like a plan. I'm pretty swamped until late next week, but if there's anything I can do to help at that time, just holler... Dave From cseligman at earthlink.net Thu May 6 19:23:40 2010 From: cseligman at earthlink.net (Chet Seligman) Date: Thu, 6 May 2010 12:23:40 -0700 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 Message-ID: <001b01caed51$a2e745c0$e8b5d140$@net> I need some help in installing this as it is not in the Active-perl repository. Here's what I have done: 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz 2. Extracted it into an empty directory IN 3. Planned to install by specifying the ppd file directly: ppm install c:\IN\whatever module-name.ppd However, there is no .ppd file extracted. I'd appreciate it if someone would explain how to get Bio::Graphics installed? Chet From scott at scottcain.net Thu May 6 19:44:04 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 6 May 2010 15:44:04 -0400 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 In-Reply-To: <001b01caed51$a2e745c0$e8b5d140$@net> References: <001b01caed51$a2e745c0$e8b5d140$@net> Message-ID: Hi Chet, Install it via the cpan shell: $ cpan cpan> install Bio::Graphics Scott On Thu, May 6, 2010 at 3:23 PM, Chet Seligman wrote: > I need some help in installing this as it is not in the Active-perl > repository. 
Here's what I have done: > 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz > 2. Extracted it into an empty directory IN > 3. Planned to install by specifying the ppd file directly: > ppm install c:\IN\whatever module-name.ppd > > However, there is no .ppd file extracted. > > I'd appreciate it if someone would explain how to get Bio::Graphics > installed? > > Chet > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Thu May 6 19:57:03 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 6 May 2010 15:57:03 -0400 Subject: [Bioperl-l] Installing Bio-Graphics-2.06 In-Reply-To: <002301caed55$53bfc400$fb3f4c00$@net> References: <001b01caed51$a2e745c0$e8b5d140$@net> <002301caed55$53bfc400$fb3f4c00$@net> Message-ID: Hi Chet, Please keep your responses on the bioperl mailing list. As long as you install BioPerl and GD before you try to install Bio::Graphics from cpan, yes, it is perfectly doable. You need to do that in the cmd shell. GD needs to be installed from ppm because it requires compiled code. Scott On Thu, May 6, 2010 at 3:50 PM, Chet Seligman wrote: > Hey Scott: > Is your suggestion doable in Windows? > > How? 
> > Chet > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Scott Cain > Sent: Thursday, May 06, 2010 12:44 PM > To: Chet Seligman > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Installing Bio-Graphics-2.06 > > Hi Chet, > > Install it via the cpan shell: > > $ cpan > cpan> install Bio::Graphics > > Scott > > > On Thu, May 6, 2010 at 3:23 PM, Chet Seligman > wrote: >> I need some help in installing this as it is not in the Active-perl >> repository. Here's what I have done: >> 1. Went to CPAN and downloaded Bio-Graphics-2.06.tar.gz >> 2. Extracted it into an empty directory IN >> 3. Planned to install by specifying the ppd file directly: >> ppm install c:\IN\whatever module-name.ppd >> >> However, there is no .ppd file extracted. >> >> I'd appreciate it if someone would explain how to get Bio::Graphics >> installed? >> >> Chet >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Thu May 6 20:04:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 15:04:39 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <06093F7F-7E3F-49AA-A440-21DBB93607A0@sbc.su.se> <7678DAD9-0D14-435C-829D-113727CC2D86@illinois.edu> Message-ID: <48C987D6-A7F2-4FBC-AB75-38F0B234961C@illinois.edu> On May 6, 2010, at 1:35 PM, Dave Messina wrote: > [ git-SVN metadata ] > >> I don't really see much use for it personally, beyond retaining the SVN commit #. > > Oh well heck, in that case we may as well ditch it. > > If there's some way we could easily keep an inactive, archived version with the SVN to github commit # mapping, that would be a nice safety measure, but if it's too much trouble we needn't bother. I think we'll keep it in for the SVN commits. Better to have it just in case. > [ github or dev as primary ] > >> It's fairly easy to set up multiple remote repos and push to them, so one could easily just push a local one elsewhere. > > Great, okay, sounds like there won't be any problem there. > > > [ single repo? ] > >> We can always start with a single repo and a read-only mirror. If we follow the Moose policy, one could fork from either the public Moose git or github and make changes, then post them back to the main github repo for review by the devs. > > Sounds like a plan. > > > I'm pretty swamped until late next week, but if there's anything I can do to help at that time, just holler... > > > Dave Okay, will prep another email for the final push over to git. 
chris From cjfields at illinois.edu Thu May 6 20:13:44 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 15:13:44 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> On May 6, 2010, at 12:26 PM, Heikki Lehvaslaiho wrote: > ... >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? >> >> http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod >> > > Interesting and educational document. Let's learn as much a we can from it. > > This is easy with github and forks. >> > > The more the merrier. > > BTW, I can see Moose using Shipit, > http://search.cpan.org/~bradfitz/ShipIt-0.55/ > that might be worth using in BioPerl. I agree. Have thought about that, primarily for easier releases down the road. >> 6) SVN Read/Write to GitHub >> >> It was recently announced that one can access a github repo using >> subversion as read-only, and just yesterday experimental write to github is >> allowed: >> >> http://github.com/blog/644-subversion-write-support >> >> I can see allowing read-only svn, but write support is still experimental. >> Do we want to allow that? >> > > Why not is someone insists on using it. Once people get over the initial > problems of moving to a different mind set in git, very few will want to use > svn. There might be situtations when git does not work, however, so lets > allow for svn usage. Nothing really stopping it, unless we add something to a pre-commit hook that prevents it somehow. I'm thinking a move in the next 5 days, maybe starting Monday? I'll try getting a post out on it. 
chris From rmb32 at cornell.edu Thu May 6 21:09:03 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 06 May 2010 14:09:03 -0700 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> Message-ID: <4BE32FEF.6080707@cornell.edu> The branching model at http://nvie.com/git-model is a good one, but the diagram might be a little intimidating for devs that are new to git. Note that the only branches that most devs will need to be concerned with are the feature branches (sometimes called topic branches), and the main development branch. The other branches are mostly concerned with making releases. To weigh in on other issues on this thread: * Might as well keep the svn metadata, it doesn't hurt and could help in any situations that call for historical digging around. * I don't think we should allow any svn write support. Anybody that truly cannot get over the hump can send patches to the list. Thanks so much for heading this up Chris. Rob From cjfields at illinois.edu Thu May 6 21:28:25 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 May 2010 16:28:25 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <4BE32FEF.6080707@cornell.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> Message-ID: <9676F5A9-A778-4440-95EF-14282DF72454@illinois.edu> On May 6, 2010, at 4:09 PM, Robert Buels wrote: > The branching model at http://nvie.com/git-model is a good one, but the diagram might be a little intimidating for devs that are new to git. > > Note that the only branches that most devs will need to be concerned with are the feature branches (sometimes called topic branches), and the main development branch. The other branches are mostly concerned with making releases. 
> > To weigh in on other issues on this thread: > > * Might as well keep the svn metadata, it doesn't hurt and could help in > any situations that call for historical digging around. > * I don't think we should allow any svn write support. Anybody that > truly cannot get over the hump can send patches to the list. > > Thanks so much for heading this up Chris. > > Rob One stumbling block that I'm seeing is there is a current lack of pre-commit hook support in github (to prevent destructive or history-changing commits). I don't think this will be a problem, but it's worth noting. post-commit is fine. chris From David.Messina at sbc.su.se Thu May 6 21:59:56 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 6 May 2010 23:59:56 +0200 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <4BE32FEF.6080707@cornell.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> Message-ID: <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> > * I don't think we should allow any svn write support. Anybody that > truly cannot get over the hump can send patches to the list. Unless svn commits are somehow problematic, is there another reason to disallow it? We're switching to git soon and with little advance notice. We'd be asking all the devs to make the move on our schedule. Dave From dimitark at bii.a-star.edu.sg Fri May 7 02:25:23 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Fri, 07 May 2010 10:25:23 +0800 Subject: [Bioperl-l] about Genewise Message-ID: <4BE37A13.6010309@bii.a-star.edu.sg> Hi guys, i have a question about Genewise. Is it possible to get the percent identity between query and target? I am now trying to figure that out. I found no such method so i suppose i should calculate it myself. Thank you for your time and help. 
Greetings Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From dimitark at bii.a-star.edu.sg Fri May 7 05:03:58 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Fri, 07 May 2010 13:03:58 +0800 Subject: [Bioperl-l] more genewise Message-ID: <4BE39F3E.4090204@bii.a-star.edu.sg> Hi guys, another question about genewise. Is it possible to get the query seq and the protein translation of the target seq somehow? So, up to now i could not find a way to get the percent identity between query and target(the protein translation) :( I spent some time on CPAN and perldoc and even checked the code of several modules but still no solution. Then i decided to extract the sequences out of the output file and compare them somehow but i could not find a way and for that. I found that the module 'Bio::Tools::Run::Genewise' is creating internal temp output file which i cant access so i can parse it myself and extract whatever. Because with current implementation i cant access that temp output i hacked a bit 'Bio::Tools::Run::Genewise' so i can pass my output file to the constructor, like that: my $factory = Bio::Tools::Run::Genewise->new( output => $tmpout); #not "-output" cos the module currently doesnt like it I modified the BEGIN section and the '_run' subroutine. 
My lines and the originals are marked : -------------- BEGIN { @GENEWISE_PARAMS = qw( DYMEM CODON GENE CFREQ SPLICE GENESTATS INIT SUBS INDEL INTRON NULL INSERT SPLICE_MAX_COLLAR SPLICE_MIN_COLLAR GW_EDGEQUERY GW_EDGETARGET GW_SPLICESPREAD KBYTE HNAME ALG BLOCK DIVIDE GENER U V S T G E M); @GENEWISE_SWITCHES = qw(HELP SILENT QUIET ERROROFFSTD TREV PSEUDO NOSPLICE_GTAG SPLICE_GTAG NOGWHSP GWHSP TFOR TABS BOTH HMMER ); $OK_FIELD{OUTPUT}++; *#dimitar * # Authorize attribute fields foreach my $attr ( @GENEWISE_PARAMS, @GENEWISE_SWITCHES, @OTHER_SWITCHES) { $OK_FIELD{$attr}++; } } ----------------------- ----------------------- my ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir); $self->debug("genewise command = $commandstring"); my $outfile2=$self->output; *#dimitar* # my $status = system("$commandstring > $outfile1"); *#original* my $status = system("$commandstring > $outfile2 "); *#dimitar* $self->throw("Genewies call $commandstring crashed: $? \n") unless $status==0; # my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile1); *#original* my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile2); *#dimitar* ----------------------- More the method 'cds' from 'Bio::SeqFeature::Gene::Exon/I' gives nothing back it doesnt matter what i tried. And i tried a lot :) Fortunately for me i dont need that for now. But tried and didnt work so had to say. Cheers Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From O.Niehuis.zfmk at uni-bonn.de Fri May 7 06:34:54 2010 From: O.Niehuis.zfmk at uni-bonn.de (Dr. Oliver Niehuis) Date: Fri, 7 May 2010 08:34:54 +0200 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifying alignment parameters Message-ID: Hi, I have a question about how to specify parameters for the alignment program MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. 
I would like to run MAFFT with the following alignment parameters:

--maxiterate 1000 --localpair

Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module
before, I specified the MAFFT run parameters as follows:

@params = ('localpair', 'maxiterate' => 1000);
$factory = Bio::Tools::Run::Alignment::MAFFT->new(@params);

Unfortunately, this code causes an exception error:

------------- EXCEPTION -------------
MSG: Unallowed parameter: LOCALPAIR !
STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:211
STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:196
STACK toplevel /Users/Oliver/Desktop/Orthologs/Generate_FASTA_files_of_orthologs.pl:55
-------------------------------------

I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT
module, but only when leaving the @params array empty; MAFFT then runs with
the default parameters.

Has anyone an idea how I can specify run parameters for MAFFT via the
Bio::Tools::Run::Alignment::MAFFT module?

Any help is much appreciated!

Best wishes,
Oliver

From biopython at maubp.freeserve.co.uk Fri May 7 08:51:38 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 May 2010 09:51:38 +0100
Subject: [Bioperl-l] BioPerl Migration to Git/GitHub
In-Reply-To: <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se>
References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se>
Message-ID:

On Thu, May 6, 2010 at 10:59 PM, Dave Messina wrote:
>> * I don't think we should allow any svn write support. Anybody that
>> truly cannot get over the hump can send patches to the list.
>
> Unless svn commits are somehow problematic, is there another reason to disallow it?

From my reading of the github blog post, svn merges are potentially problematic.
http://github.com/blog/644-subversion-write-support Peter From maj at fortinbras.us Fri May 7 11:53:55 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 07:53:55 -0400 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters In-Reply-To: References: Message-ID: Hi Oliver, This module looks like it needs some updating. Here's a hack that should make it work (or at least prevent that exception); put the following lines before the new() call: push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_PARAMS, 'MAXITERATE'; push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, 'LOCALPAIR'; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; HTH, Mark ----- Original Message ----- From: "Dr. Oliver Niehuis" To: Sent: Friday, May 07, 2010 2:34 AM Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters > Hi, > > I have a question about how to specify parameters for the alignment program > MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. I would like to run > MAFFT with the following alignment parameters: > > --maxiterate 1000 --localpair > > Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module > before, I specified the MAFFT run parameters as follows: > > @params = ('localpair', 'maxiterate' => 1000); > $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); > > Unfortunately, this code causes an exception error: > > ------------- EXCEPTION ------------- > MSG: Unallowed parameter: LOCALPAIR ! 
> STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/ > Bio/Tools/Run/Alignment/MAFFT.pm:211 > STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/ > Tools/Run/Alignment/MAFFT.pm:196 > STACK toplevel /Users/Oliver/Desktop/Orthologs/ > Generate_FASTA_files_of_orthologs.pl:55 > ------------------------------------- > > I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT > module, but only when leaving the @params array empty; MAFFT then runs with > the default parameters. > > Has anyone an idea how I can specify run parameters for MAFFT via the > Bio::Tools::Run::Alignment::MAFFT module? > > Any help is much appreciated! > > Best wishes, > Oliver > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Fri May 7 12:12:05 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 7 May 2010 07:12:05 -0500 Subject: [Bioperl-l] more genewise In-Reply-To: <4BE39F3E.4090204@bii.a-star.edu.sg> References: <4BE39F3E.4090204@bii.a-star.edu.sg> Message-ID: <4899F495-FA46-4030-B984-EEFF81579C27@illinois.edu> Dimitar, It would be better if you could create a bug report describing the problem (with minimal example data and code) and provide a diff file or patch. This gives us a chance to do some code review and commit the patch if it passes tests. Here's a HOWTO on this: http://www.bioperl.org/wiki/HOWTO:SubmitPatch Let us know when it's submitted and we can take a look. chris On May 7, 2010, at 12:03 AM, Dimitar Kenanov wrote: > Hi guys, > another question about genewise. Is it possible to get the query seq and the protein translation of the target seq somehow? 
> > So, up to now i could not find a way to get the percent identity between query and target(the protein translation) :( I spent some time on CPAN and perldoc and even checked the code of several modules but still no solution. Then i decided to extract the sequences out of the output file and compare them somehow but i could not find a way and for that. I found that the module 'Bio::Tools::Run::Genewise' is creating internal temp output file which i cant access so i can parse it myself and extract whatever. > > Because with current implementation i cant access that temp output i hacked a bit 'Bio::Tools::Run::Genewise' so i can pass my output file to the constructor, like that: > > my $factory = Bio::Tools::Run::Genewise->new( output => $tmpout); #not "-output" cos the module currently doesnt like it > > I modified the BEGIN section and the '_run' subroutine. My lines and the originals are marked : > -------------- > BEGIN { > @GENEWISE_PARAMS = qw( DYMEM CODON GENE CFREQ SPLICE GENESTATS INIT > SUBS INDEL INTRON NULL INSERT SPLICE_MAX_COLLAR SPLICE_MIN_COLLAR > GW_EDGEQUERY GW_EDGETARGET GW_SPLICESPREAD > KBYTE HNAME ALG BLOCK DIVIDE GENER U V S T G E M); > > @GENEWISE_SWITCHES = qw(HELP SILENT QUIET ERROROFFSTD TREV PSEUDO NOSPLICE_GTAG > SPLICE_GTAG NOGWHSP GWHSP > TFOR TABS BOTH HMMER ); > > $OK_FIELD{OUTPUT}++; *#dimitar > * # Authorize attribute fields > foreach my $attr ( @GENEWISE_PARAMS, @GENEWISE_SWITCHES, > @OTHER_SWITCHES) { $OK_FIELD{$attr}++; } > } > ----------------------- > ----------------------- > my ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir); > $self->debug("genewise command = $commandstring"); > my $outfile2=$self->output; *#dimitar* > # my $status = system("$commandstring > $outfile1"); *#original* > my $status = system("$commandstring > $outfile2 "); *#dimitar* > $self->throw("Genewies call $commandstring crashed: $? 
\n") unless $status==0;
>
> # my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile1); *#original*
> my $genewiseParser = Bio::Tools::Genewise->new(-file=> $outfile2); *#dimitar*
> -----------------------
>
> More the method 'cds' from 'Bio::SeqFeature::Gene::Exon/I' gives nothing back it doesnt matter what i tried. And i tried a lot :) Fortunately for me i dont need that for now. But tried and didnt work so had to say.
>
> Cheers
> Dimitar
>
> --
> Dimitar Kenanov
> Postdoctoral research fellow
> Protein Sequence Analysis Group
> Bioinformatics Institute
> A*STAR, Singapore
> tel: +65 6478 8514
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Fri May 7 15:34:09 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 7 May 2010 11:34:09 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters
In-Reply-To: <332A01DD-64DA-41EC-B5CE-2BC74BE78038@uni-bonn.de>
References: <332A01DD-64DA-41EC-B5CE-2BC74BE78038@uni-bonn.de>
Message-ID: <9764564B5CC44A89883498C6309DA045@NewLife>

Hi Oliver,

I think so, looking at the module again. Instead of the lines in the previous post, put

push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, ('LOCALPAIR', 'MAXITERATE');
$Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1;
$Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1;

and create your @params array with

@params = ('localpair' => 1, 'maxiterate' => 1000);

The switches need to be set with something that returns true, I believe. I *think* this should work for you. But if you would, please submit your original problem as a bug at http://bugzilla.bioperl.org. The module definitely needs some tender loving care.

Thanks
Mark

----- Original Message -----
From: Dr. Oliver Niehuis
To: Mark A.
Jensen Sent: Friday, May 07, 2010 11:07 AM Subject: Re: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters Dear Mark, Thanks for your quick reply and the MAFFT module hack. I added your code to my script and it seems to works, except that I can't specify the number of iterations (at least, I don't know how). I can specify my @params = ('localpair', 'maxiterate'); but when I assign 1000 to 'maxiterate' (i.e. 'maxiterate' => 1000), I get again an exception error, complaining about 1000 being an unallowed parameter. ------------- EXCEPTION ------------- MSG: Unallowed parameter: 1000 ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/Generate_FASTA_files_of_orthologs.pl:61 ------------------------------------- Do you know how to fix this? Best wishes, Oliver Am 07.05.2010 um 13:53 schrieb Mark A. Jensen: Hi Oliver, This module looks like it needs some updating. Here's a hack that should make it work (or at least prevent that exception); put the following lines before the new() call: push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_PARAMS, 'MAXITERATE'; push @Bio::Tools::Run::Alignment::MAFFT::MAFFT_SWITCHES, 'LOCALPAIR'; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{MAXITERATE} = 1; $Bio::Tools::Run::Alignment::MAFFT::OK_FIELD{LOCALPAIR} = 1; HTH, Mark ----- Original Message ----- From: "Dr. Oliver Niehuis" To: Sent: Friday, May 07, 2010 2:34 AM Subject: [Bioperl-l] Bio::Tools::Run::Alignment::MAFFT - specifyingalignment parameters Hi, I have a question about how to specify parameters for the alignment program MAFFT via the Bio::Tools::Run::Alignment::MAFFT module. 
I would like to run MAFFT with the following alignment parameters: --maxiterate 1000 --localpair Having used TCOFFEE and the Bio::Tools::Run::Alignment::Tcoffee module before, I specified the MAFFT run parameters as follows: @params = ('localpair', 'maxiterate' => 1000); $factory = Bio::Tools::Run::Alignment::MAFFT->new(@params); Unfortunately, this code causes an exception error: ------------- EXCEPTION ------------- MSG: Unallowed parameter: LOCALPAIR ! STACK Bio::Tools::Run::Alignment::MAFFT::AUTOLOAD /sw/lib/perl5/5.8.8/ Bio/Tools/Run/Alignment/MAFFT.pm:211 STACK Bio::Tools::Run::Alignment::MAFFT::new /sw/lib/perl5/5.8.8/Bio/ Tools/Run/Alignment/MAFFT.pm:196 STACK toplevel /Users/Oliver/Desktop/Orthologs/ Generate_FASTA_files_of_orthologs.pl:55 ------------------------------------- I can align sequences with MAFFT via Bio::Tools::Run::Alignment::MAFFT module, but only when leaving the @params array empty; MAFFT then runs with the default parameters. Has anyone an idea how I can specify run parameters for MAFFT via the Bio::Tools::Run::Alignment::MAFFT module? Any help is much appreciated! Best wishes, Oliver _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Fri May 7 16:42:38 2010 From: hartzell at alerce.com (George Hartzell) Date: Fri, 7 May 2010 09:42:38 -0700 Subject: [Bioperl-l] [job] Contract programmer in Bioinformatics at Genentech. Message-ID: <19428.17150.181595.755965@gargle.gargle.HOWL> Genentech's Bioinformatics department seeks an experienced software engineer for a six month contract. Modern Perl (or enlightened, or ..., just not circa 1998) style is required. We build tools to support our Research labs, collecting, storing, massaging, and presenting information to computer-philes and -phobes. We have more to do than we can handle, you'll be pitching in. 
Exactly what you'd be doing will be a function of your skills and our
needs, and will probably vary a bit over the six month period.

You write tests, sometimes even before you write code. You're not
afraid of a little SQL and are comfortable collaborating with folks who
were born speaking it. You're familiar with things like Moose,
Rose::DB::Object, CGI::Application, NYTProf, and their ilk (or
brethren) and more importantly are excited about learning more about
them and using them in real-world work.

Smoothing out our in-house DPAN, setting up an automated build/smoke
system (we have Hudson handling Java builds already) and helping with
some other infrastructure stuff is also on the table.

You'll be working more-or-less full time in South San Francisco;
there's the potential for a bit of telecommuting once things get
running smoothly but the bulk of the job is onsite.

Things that you should be comfortable with include:

  Perl ("modern")
  SQL, object relational mappers
  Web application (CGI::Application, or similar)
  CPAN, Module::Build, Dist::Zilla, etc....
  Linux
  Software engineering in a professional environment.

Experience in bioinformatics, biology, or supporting scientists would
be helpful but is not required.

Please send cover letters and resumes to my work address: georgewh at
gene.com (the ability to follow directions is important). Bonus points
for easy formats (PDF is great!), demerits for sending me stuff in DOS
specific archive formats.

g.

From qqq2395 at gmail.com Thu May 6 18:51:13 2010
From: qqq2395 at gmail.com (visitor555)
Date: Thu, 6 May 2010 11:51:13 -0700 (PDT)
Subject: [Bioperl-l] Bio::Align - alignment by position?
Message-ID: <28478022.post@talk.nabble.com>

Hi,

I have a list of alignment positions and I want to get each of these
columns from the alignment. If I slice the alignment, the sequences with
gaps in these positions disappear. I can iterate over each seq and then
split the sequence. Is there a better way to go over the alignment
position by position?
thanks ! -- View this message in context: http://old.nabble.com/Bio%3A%3AAlign---alignment-by-position--tp28478022p28478022.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jillianrowe91286 at gmail.com Mon May 3 12:42:56 2010 From: jillianrowe91286 at gmail.com (mindlessbrain) Date: Mon, 3 May 2010 05:42:56 -0700 (PDT) Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall Message-ID: <28434717.post@talk.nabble.com> Hey all, I'm trying to run some code for StandAloneBLast in Windows Vista: [code] #!/usr/bin/perl use Bio::DB::SwissProt; use Bio::Tools::Run::StandAloneBlast; BEGIN { $ENV{PATH}="D:/blast-2.2.23+/bin/:"; } my $database = new Bio::DB::SwissProt; my $query = $database->get_Seq_by_id('TAUD_ECOLI'); my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastp', 'database' => 'swissprot', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); my $result = $blast_report->next_result; while( my $hit = $result->next_hit()) { print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n"; } [/code] I installed BLAST from the NCBI website. I get this when I run dir on the bin: D:\blast-2.2.23+\bin>dir Volume in drive D has no label. Volume Serial Number is 224C-0190 Directory of D:\blast-2.2.23+\bin 05/03/2010 03:02 PM . 05/03/2010 03:02 PM .. 
03/08/2010 11:09 PM 2,789,376 blastdbcheck.exe 03/08/2010 11:09 PM 4,009,984 blastdbcmd.exe 03/08/2010 11:09 PM 1,810,432 blastdb_aliastool.exe 03/08/2010 11:09 PM 6,225,920 blastn.exe 03/08/2010 11:09 PM 6,221,824 blastp.exe 03/08/2010 11:09 PM 6,213,632 blastx.exe 03/08/2010 11:09 PM 5,316,608 blast_formatter.exe 03/08/2010 11:09 PM 3,215,360 convert2blastmask.exe 03/08/2010 11:09 PM 3,211,264 dustmasker.exe 03/08/2010 11:09 PM 51,178 legacy_blast.pl 03/08/2010 11:09 PM 3,866,624 makeblastdb.exe 03/08/2010 11:09 PM 3,612,672 makembindex.exe 03/08/2010 11:09 PM 6,344,704 psiblast.exe 03/08/2010 11:09 PM 6,201,344 rpsblast.exe 03/08/2010 11:09 PM 6,205,440 rpstblastn.exe 03/08/2010 11:09 PM 3,608,576 segmasker.exe 03/08/2010 11:09 PM 6,320,128 tblastn.exe 03/08/2010 11:09 PM 6,209,536 tblastx.exe 03/08/2010 11:09 PM 10,010 update_blastdb.pl 03/08/2010 11:09 PM 3,530,752 windowmasker.exe 20 File(s) 84,975,364 bytes 2 Dir(s) 122,390,626,304 bytes free I have an ncbi.ini file in my windows directory that contains: [NCBI] DATA=D:\blast-2.2.23+\data [BLAST] BLASTDB=D:\blast-2.2.23+\db Here's what my environmental variables looks like: http://old.nabble.com/file/p28434717/environmental%2Bvariables.jpg Help would be very, very appreciated! -- View this message in context: http://old.nabble.com/Bio%3A%3ATools%3A%3ARun%3A%3AStandAloneBlast-can%27t-find-path-to-blastall-tp28434717p28434717.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Fri May 7 20:07:58 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 7 May 2010 16:07:58 -0400 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall In-Reply-To: <28434717.post@talk.nabble.com> References: <28434717.post@talk.nabble.com> Message-ID: <670B2E492D9E4D158618EC4750C595AF@NewLife> You've got blast+, so have a look at Bio::Tools::Run::StandAloneBlastPlus, should solve it. 
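A minimal, untested sketch of what that might look like with the BLAST+ wrapper follows. It assumes BioPerl-Run is installed and that a protein database has already been formatted with makeblastdb. The helper name blastp_hits and the directories in the usage note are made up for illustration, and -db_name, -db_dir and -prog_dir are the constructor arguments as I read the module's documentation, so check them against your installed version.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run blastp through the BLAST+
# wrapper and return [ hit name, significance ] pairs for the first result.
sub blastp_hits {
    my ( $query, $db_name, $db_dir, $prog_dir ) = @_;

    # Loaded at call time, so this file still compiles without BioPerl-Run.
    require Bio::Tools::Run::StandAloneBlastPlus;

    my $factory = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name  => $db_name,     # name of an already formatted database
        -db_dir   => $db_dir,      # directory holding the database files
        -prog_dir => $prog_dir,    # directory holding blastp(.exe)
    );

    # $query may be a Bio::Seq object or the name of a FASTA file.
    my $result = $factory->blastp( -query => $query );
    return unless $result;         # nothing to report if BLAST gave no result

    my @hits;
    while ( my $hit = $result->next_hit ) {
        push @hits, [ $hit->name, $hit->significance ];
    }
    $factory->cleanup;             # remove temporary files made by the factory
    return @hits;
}
```

With the layout from the original post, a call might look like blastp_hits($query, 'swissprot', 'D:/blast-2.2.23+/db', 'D:/blast-2.2.23+/bin'), where $query is the Bio::Seq object fetched from Bio::DB::SwissProt; those paths are only guesses based on the directory listing.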
MAJ ----- Original Message ----- From: "mindlessbrain" To: Sent: Monday, May 03, 2010 8:42 AM Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlast can't find path to blastall > > Hey all, > > I'm trying to run some code for StandAloneBLast in Windows Vista: > > [code] > #!/usr/bin/perl > > use Bio::DB::SwissProt; > use Bio::Tools::Run::StandAloneBlast; > > BEGIN > { > $ENV{PATH}="D:/blast-2.2.23+/bin/:"; > } > > my $database = new Bio::DB::SwissProt; > my $query = $database->get_Seq_by_id('TAUD_ECOLI'); > > my $factory = Bio::Tools::Run::StandAloneBlast->new( > 'program' => 'blastp', > 'database' => 'swissprot', > _READMETHOD => "Blast" > ); > my $blast_report = $factory->blastall($query); > my $result = $blast_report->next_result; > while( my $hit = $result->next_hit()) { > print "\thit name: ", $hit->name(), > " significance: ", $hit->significance(), "\n"; > } > [/code] > > I installed BLAST from the NCBI website. I get this when I run dir on the > bin: > > D:\blast-2.2.23+\bin>dir > Volume in drive D has no label. > Volume Serial Number is 224C-0190 > > Directory of D:\blast-2.2.23+\bin > > 05/03/2010 03:02 PM . > 05/03/2010 03:02 PM .. 
> 03/08/2010 11:09 PM 2,789,376 blastdbcheck.exe > 03/08/2010 11:09 PM 4,009,984 blastdbcmd.exe > 03/08/2010 11:09 PM 1,810,432 blastdb_aliastool.exe > 03/08/2010 11:09 PM 6,225,920 blastn.exe > 03/08/2010 11:09 PM 6,221,824 blastp.exe > 03/08/2010 11:09 PM 6,213,632 blastx.exe > 03/08/2010 11:09 PM 5,316,608 blast_formatter.exe > 03/08/2010 11:09 PM 3,215,360 convert2blastmask.exe > 03/08/2010 11:09 PM 3,211,264 dustmasker.exe > 03/08/2010 11:09 PM 51,178 legacy_blast.pl > 03/08/2010 11:09 PM 3,866,624 makeblastdb.exe > 03/08/2010 11:09 PM 3,612,672 makembindex.exe > 03/08/2010 11:09 PM 6,344,704 psiblast.exe > 03/08/2010 11:09 PM 6,201,344 rpsblast.exe > 03/08/2010 11:09 PM 6,205,440 rpstblastn.exe > 03/08/2010 11:09 PM 3,608,576 segmasker.exe > 03/08/2010 11:09 PM 6,320,128 tblastn.exe > 03/08/2010 11:09 PM 6,209,536 tblastx.exe > 03/08/2010 11:09 PM 10,010 update_blastdb.pl > 03/08/2010 11:09 PM 3,530,752 windowmasker.exe > 20 File(s) 84,975,364 bytes > 2 Dir(s) 122,390,626,304 bytes free > > I have an ncbi.ini file in my windows directory that contains: > [NCBI] > DATA=D:\blast-2.2.23+\data > [BLAST] > BLASTDB=D:\blast-2.2.23+\db > > Here's what my environmental variables looks like: > > http://old.nabble.com/file/p28434717/environmental%2Bvariables.jpg > > Help would be very, very appreciated! > > > -- > View this message in context: > http://old.nabble.com/Bio%3A%3ATools%3A%3ARun%3A%3AStandAloneBlast-can%27t-find-path-to-blastall-tp28434717p28434717.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From manchunjohn-ma at uiowa.edu Fri May 7 20:17:52 2010
From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John)
Date: Fri, 7 May 2010 15:17:52 -0500
Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast
Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu>

Hi,

Right now I'm migrating some of my bioperl scripts from remote to
stand-alone BLAST, and stumbled at how RemoteBlast->submit_blast and the
StandAloneNCBIBlast->blastall deal with an array parameter.

Common code for both versions:

my $p3_machine=Bio::Tools::Run::Primer3->new(@p3_parameters);
[...]
my $primer3_results=$p3_machine->run($seq);
my $p3_results=$primer3_results->next_primer();
my @temp_primer_info=$p3_results->get_primer;
my %primer_info;
$primer_info{primer}[0]=$temp_primer_info[0]->seq;
$primer_info{primer}[1]=$temp_primer_info[1]->seq;
$primer_info{primer}[0]->display_id('F');
$primer_info{primer}[1]->display_id('R');

Code using RemoteBlast:

my $remote_blast_machine=Bio::Tools::Run::RemoteBlast->new(@remote_blast_params);
[Parameter setting skipped]
my $r=$remote_blast_machine->submit_blast(@primer_info{primer});
[etc, etc for iteration]

Using this code, I have been able to put both sequences forth to the
NCBI server and obtain results accordingly; each result object contains
hits from an input sequence.

However, when I switched to StandAloneBlast this way:

my $Stand_alone_blast_machine=Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params);
my $blast_report=$Stand_alone_blast_machine->blastall(@primer_info{primer});
while (my $result=$blast_report->next_result()){
[etc, etc for iteration]
}

There is only one result object, for sequence "F", and even so the loop
went through twice. I suspect I made a mistake somewhere, but where?
John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 From sumanth41277 at yahoo.com Fri May 7 21:34:53 2010 From: sumanth41277 at yahoo.com (polsum) Date: Fri, 7 May 2010 14:34:53 -0700 (PDT) Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU Message-ID: <28491725.post@talk.nabble.com> Hi - We have a pretty powerful computer with dual quad-core Intel Xeon W5580 processors and 24 GB of RAM. When I use BioPerl programs for routine operations like blastn and BLAST parsing, the programs don't seem to use the computer's power to the fullest: they use just one of the 8 cores and only 8 GB of RAM. Is there a way to ask Perl to use all the available power? I have 64-bit Windows and 64-bit Ubuntu; Ubuntu is definitely faster, but it still doesn't use all the cores of the CPU. Thanks in advance -- View this message in context: http://old.nabble.com/Bio-Perl-and-multiple-cores-of-CPU-tp28491725p28491725.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Fri May 7 21:46:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 7 May 2010 16:46:24 -0500 Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU In-Reply-To: <28491725.post@talk.nabble.com> References: <28491725.post@talk.nabble.com> Message-ID: You can specify the number of processors to use. With legacy BLAST this is -a 8; with BLAST+ I think this is -num_threads 8 (with the explicit caveat that I haven't tried the latter much, so no guarantees; we're not liable for explosions and such). chris On May 7, 2010, at 4:34 PM, polsum wrote: > Hi - We have a pretty powerful computer with dual quad-core Intel Xeon W5580 > processors and 24 GB of RAM. When I use BioPerl programs for routine operations > like blastn and BLAST parsing, the programs don't seem to use the > computer's power to the fullest: they use just one of the 8 cores and > only 8 GB of RAM.
Is there a way to ask Perl to use all the available power? > I have 64-bit Windows and 64-bit Ubuntu; Ubuntu is definitely faster, but > it still doesn't use all the cores of the CPU. > > Thanks in advance > -- > View this message in context: http://old.nabble.com/Bio-Perl-and-multiple-cores-of-CPU-tp28491725p28491725.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri May 7 22:14:24 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 00:14:24 +0200 Subject: [Bioperl-l] Bio-Perl and multiple cores of CPU In-Reply-To: References: <28491725.post@talk.nabble.com> Message-ID: On May 7, 2010, at 11:46 PM, Chris Fields wrote: > With legacy BLAST this is -a 8, with BLAST+ I think this is -num_threads 8 (with the explicit caveat I haven't tried the latter much, so no guarantees, we're not liable for explosions and such). One other caveat if you use BLAST+: be sure you have the latest version, 2.2.23. In my informal testing, the num_threads option wasn't working correctly in 2.2.22. BLAST parsing will still be single-threaded, by the way. BioPerl programs, like everything else, unfortunately need to explicitly spawn multiple threads or forks to take advantage of multiple cores.
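To make the "spawn forks explicitly" point concrete, here is a minimal sketch using only core Perl (fork and waitpid). It is an illustration under stated assumptions, not something taken from BioPerl: the helper names split_jobs and run_in_parallel, the report file names, and the empty worker body are all hypothetical placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: deal the jobs out round-robin into at most $n chunks.
sub split_jobs {
    my ($n, @jobs) = @_;
    my @chunks;
    my $i = 0;
    push @{ $chunks[ $i++ % $n ] }, $_ for @jobs;
    return @chunks;
}

# Hypothetical helper: fork one child per chunk; each child calls
# $worker->($job) for every job in its chunk and then exits.
# Returns the number of children that exited with status 0.
sub run_in_parallel {
    my ($n, $worker, @jobs) = @_;
    my @pids;
    for my $chunk (split_jobs($n, @jobs)) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {            # child process
            $worker->($_) for @$chunk;
            exit 0;
        }
        push @pids, $pid;           # parent keeps track of its children
    }
    my $ok = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);           # wait for each child in turn
        $ok++ if $? == 0;
    }
    return $ok;
}

# Placeholder usage: the file names and the worker body are assumptions.
my @reports = map { "report_$_.blast" } 1 .. 8;
my $done = run_in_parallel(4, sub {
    my ($file) = @_;
    # parse $file here (for example with Bio::SearchIO) and write results to disk
}, @reports);
print "$done of 4 workers finished cleanly\n";
```

Each child gets its own copy of the program's memory, so results have to go back to the parent through files or pipes rather than through shared variables; that is the main design cost of forking compared with letting BLAST itself use several threads.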
While I've never done it myself, I ran across this post which may be helpful in case you want to try it: http://computationalbiologynews.blogspot.com/2008/07/harnessing-power-of-multicore.html Dave From David.Messina at sbc.su.se Fri May 7 22:34:10 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 00:34:10 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> Message-ID: <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> Hi John, You're right that passing parameters should work similarly for both RemoteBlast and StandAloneBlast, but without seeing exactly the parameter array you're passing, it's not possible to identify the problem. Could you perhaps post a small, but complete test program that demonstrates the problem? Dave PS - is this the actual code you ran? > My $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_blast_params); > My $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); > While (my $result=$blast_report->next_result()){ > [etc, etc for iteration] > } I'm guessing you were paraphrasing, but I ask because My, with a capital "M", will generate an error, you're calling Tools::Run::StandAloneBlast instead of Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), i.e. it should be: my $Stand_alone_blast_machine = Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); From florent.angly at gmail.com Sat May 8 04:42:18 2010 From: florent.angly at gmail.com (Florent Angly) Date: Sat, 08 May 2010 14:42:18 +1000 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: References: <28491725.post@talk.nabble.com> Message-ID: <4BE4EBAA.5010709@gmail.com> Hi all, I am working on updating some of the Bio::Assembly::* modules right now.
I need to sort a list of IDs. These IDs could be numbers, "words" or a mix of the two, for example: @arr = ('singlet1', 'contig10', 'contig2', '101', '3'); I cannot sort them with the numerical sort: sort { $a <=> $b } @arr This would generate warnings because some of the IDs (e.g. 'singlet1') are not numbers. I cannot sort them lexically: sort @arr Lexical sorting would not take numbers into account properly and would result in: 101 3 contig10 contig2 singlet1 So, what I really need is natural sorting, which is not in any core function of Perl. I'd like to use the CPAN module Sort::Naturally for this purpose: nsort @arr The results would be what we expect, i.e.: 3 101 contig2 contig10 singlet1 Can I add this module as an additional dependency of BioPerl? I imagine that some other modules might want to use this. On the assembly side, it would be used by the writing methods of Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around my problem that doesn't require any external module? Florent From manchunjohn-ma at uiowa.edu Sat May 8 21:37:13 2010 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Sat, 8 May 2010 16:37:13 -0500 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> Hi, And that's my problem here: I checked the BLAST output, and the two sequences did get aligned. It's just that SearchIO, in whatever flavour (I tried blast, blasttable and blastxml), didn't seem to move on to the next result when next_result() is called. It knows there are two results, but it still returns the first result on the second call.
Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 -----Original Message----- From: Dave Messina [mailto:David.Messina at sbc.su.se] Sent: Saturday, May 08, 2010 4:33 PM To: Ma, Man Chun John Cc: BioPerl List Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast Hi John, Please remember to keep Cc'ing the mailing list so that everyone can participate in the discussion. If I understand your question correctly, yes, you can iterate through the blast results in a report called $blast_report using next_result. If you haven't already, you may want to look at the SearchIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SearchIO (although the BioPerl website appears to be temporarily offline, so check back a little later.) Dave On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > Hi, > > I have did some more investigation and found that the issue is > probably that of SearchIO rather than StandAloneBlast--in case I made > a mistake, so if I parsed a standard @array of Bio::Seq objects into > StandAloneBlast (blastn with SearchIO output), the result for each of > the seqs in the array can be assessed by $blast_report->next_result, > right? > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, May 07, 2010 5:34 PM > To: Ma, Man Chun John > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Array Handling Differences between > RemoteBlast and StandAloneBlast > > Hi John, > > You're right that passing parameters should work similarly for both > RemoteBlast and StandAloneBlast, but without seeing exactly the > parameter array you're passing, it's not possible to identify the > problem. 
> > Could you perhaps post a small, but complete test program that > demonstrates the problem? > > > Dave > > > PS - is this the actual code you ran? > >> My >> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_bl >> a >> st_params); My >> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >> While (my $result=$blast_report->next_result()){ >> [etc, etc for iteration] >> } > > I'm guessing you were paraphrasing, but I ask because My, with a > capital "M", will generate an error, you're calling > Tools::Run::StandAloneBlast instead of > Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), i.e. it should be: > > my $Stand_alone_blast_machine = > Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); > > > > __________ Information from ESET NOD32 Antivirus, version of virus > signature database 5095 (20100507) __________ > > The message was checked by ESET NOD32 Antivirus. > > http://www.eset.com > From David.Messina at sbc.su.se Sat May 8 21:32:42 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 23:32:42 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> Message-ID: <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> Hi John, Please remember to keep Cc'ing the mailing list so that everyone can participate in the discussion. If I understand your question correctly, yes, you can iterate through the blast results in a report called $blast_report using next_result. 
If you haven't already, you may want to look at the SearchIO HOWTO: http://www.bioperl.org/wiki/HOWTO:SearchIO (although the BioPerl website appears to be temporarily offline, so check back a little later.) Dave On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > Hi, > > I have did some more investigation and found that the issue is probably > that of SearchIO rather than StandAloneBlast--in case I made a mistake, > so if I parsed a standard @array of Bio::Seq objects into > StandAloneBlast (blastn with SearchIO output), the result for each of > the seqs in the array can be assessed by $blast_report->next_result, > right? > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Friday, May 07, 2010 5:34 PM > To: Ma, Man Chun John > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast > and StandAloneBlast > > Hi John, > > You're right that passing parameters should work similarly for both > RemoteBlast and StandAloneBlast, but without seeing exactly the > parameter array you're passing, it's not possible to identify the > problem. > > Could you perhaps post a small, but complete test program that > demonstrates the problem? > > > Dave > > > PS - is this the actual code you ran? > >> My >> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_bla >> st_params); My >> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >> While (my $result=$blast_report->next_result()){ >> [etc, etc for iteration] >> } > > I'm guessing you were paraphrasing, but I ask because My, with a capital > "M", will generate an error, you're calling Tools::Run::StandAloneBlast > instead of Bio::Tools::Run::StandAloneBlast, and there's no method call > to new(), i.e. 
it should be: > > my $Stand_alone_blast_machine = > Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); > > > > __________ Information from ESET NOD32 Antivirus, version of virus > signature database 5095 (20100507) __________ > > The message was checked by ESET NOD32 Antivirus. > > http://www.eset.com > From cjfields at illinois.edu Sat May 8 19:41:58 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 14:41:58 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> Message-ID: <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu> Lincoln, Just an update, I've added you, as well as Dave and Florent. Still not sure about the bioperl.org address myself, but it seems to work for Dave and others. We posted to root-l and Chris D. to make sure that's correct or if we should be using open-bio.org instead, but I believe it is. chris On May 6, 2010, at 7:01 AM, Lincoln Stein wrote: > My github username is lstein and I've just added lstein at bioperl.org to my > linked email addresses. I hope I have a bioperl.org address; I never use it! > > Lincoln > > On Wed, May 5, 2010 at 10:46 AM, Chris Fields wrote: > >> All, >> >> I would like to finalize moving over to git/github very soon. We're sort >> of in limbo on this, so it needs to progress forward. We'll need to do some >> initial cleanup after the move (Heikki is already doing a few things on the >> test repo, which we'll need to diff over to the new one). >> >> So with that in mind, here are my thoughts. This is copied over to this >> wiki page, in case you don't want to reply here: >> >> http://www.bioperl.org/wiki/From_SVN_to_Git >> >> (thanks Mark!) >> >> 1) Timeline >> >> When? Sooner the better (weeks as opposed to months). Our anon. svn is >> down, likely permanently ( >> http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). 
>> >> 2) Migration strategy >> >> Now mainly worked out using svn2git, which is very fast. We would need to >> make the svn repo on dev read-only during this transition. My guess is it >> would take very little time. Do we want to retain the git-SVN metadata on >> commits? This is viewable with our current read-only mirror on github: >> >> >> http://github.com/bioperl/bioperl-live/commit/7090e24f3916346b11a6bf960371f1d903d241ca >> >> 3) Developers >> >> Not everyone has a github account. Recent ones who I couldn't find on >> github: dmessina, fangly >> >> The current authors file used for mapping commit authors to emails used >> their respective bioperl.org addresses (DEVNAME -at- bioperl.org). I >> think, once one has signed up with github, you can add that same address to >> your current ones, and it should map to your github account. If we use >> dev.open-bio.org as our central git repo, we won't need to go through with >> that, but we will need a viewable version of dev available somehow (mirrored >> on github or otherwise). Speaking of... >> >> 4) Development strategy >> >> Are we sticking with a single centralized repo (SVN-like)? Will that be >> github, or will github be a downstream repo to our work on dev? We could >> feasibly have github be an active, forkable repo that could be >> bidirectionally synced with dev, but I'm not sure of the logistics on this >> (this popped up before with svn migration and was rejected b/c it was >> considered too difficult to maintain). >> >> Git makes it very easy to make branches and merge in code to trunk. With >> that in mind, I would highly suggest we start working on branches for almost >> everything and merge over to trunk. There is very little to no overhead in >> doing so with git. >> >> I like this strategy (Mark Jensen pointed this out): >> http://nvie.com/git-model >> >> Also, several points were raised in a related project (Parrot) considering >> a move to git/github from svn. 
One in particular was that git allows >> destructive commits. Jonathan Leto indicated we can set up specific >> branches that don't allow this, using commit hooks, so my guess is the >> master branch and release branches wouldn't allow rewinds. >> >> 5) Encouraging outside contributors >> >> Do we want to adopt a policy similar to Moose? >> >> http://search.cpan.org/dist/Moose/lib/Moose/Manual/Contributing.pod >> >> This is easy with github and forks. >> >> 6) SVN Read/Write to GitHub >> >> It was recently announced that one can access a github repo using >> subversion as read-only, and just yesterday experimental write to github is >> allowed: >> >> http://github.com/blog/644-subversion-write-support >> >> I can see allowing read-only svn, but write support is still experimental. >> Do we want to allow that? >> >> 7) Others? >> >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat May 8 19:23:35 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 14:23:35 -0500 Subject: [Bioperl-l] GitHub migration Wednesday Message-ID: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> Seems like we're all pretty much in agreement that this needs to happen sooner than later. So, I'm scheduling the git/github migration aggressively, for this Wednesday. Key steps: 1) Notify the list prior to locking the svn repo and/or making it read-only. 
2) We need to set up post-commit hooks to forward commit messages on to bioperl-guts and elsewhere. I have tried this out off github and so far it's a little problematic (not working off bioperl-test, but working off my own github commits). 3) The current bioperl github repos will all be replaced with their live counterparts (branches and all), generated off the latest SVN via svn2git (including metadata). I'll have to reinstate collaborators at that time, but the author mapping should be the same as before (DEVACCOUNT at bioperl.org, where DEVACCOUNT is one's user name on dev.open-bio.org). 4) Update the wiki pages as needed to point to the github repo instead of the code.open-bio.org one. Also, I'm sure this will catch many devs not paying attention to the list by surprise, so we'll need a developer migration page set up. Anything else? chris From cjfields at illinois.edu Sat May 8 20:33:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 15:33:36 -0500 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE4EBAA.5010709@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> Message-ID: <7EC12A62-249D-4816-9FDD-6D321095AA4B@illinois.edu> I don't have a problem with this personally, seeing how complex the code can get for natural sorting. It would become a recommended module, though, not a full dependency. chris On May 7, 2010, at 11:42 PM, Florent Angly wrote: > Hi all, > > I am working on updating some of the Bio::Assembly::* modules right now. > I need to sort a list of IDs. These IDs could be numbers, "words" or a mix of the two, for example: @arr = ('singlet1', 'contig10', 'contig2', '101', '3'); > > I cannot sort them with the numerical sort: sort { $a <=> $b } @array > This would generates warnings because some of'singlet1' the IDs are numbers. 
> > I cannot sort them lexically: sort @array > Lexical sorting would not take into account numbers properly and result in: > singlet1 contig10 contig2 3 101 > > So, what I really need is natural sorting, which is not in any core function of Perl. I'd like to use the CPAN module Sort::Naturally for this purpose: nsort @arr > The results would be what we expect, i.e.: > 3 101 contig2 contig10 singlet1 > > Can I add this module as an additional dependency of BioPerl? I imagine that some other modules might want to use this. On the assembly side, it would be used by the writing methods of Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around my problem that doesn't require any external module? > > Florent > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Sat May 8 21:47:07 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 23:47:07 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> Message-ID: <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> There was a report last week of a possible problem with BLAST parsing introduced in the last few days. I don't know what the status of that is, but it's possible that it's related. In any case, if you post your code and the blast report you're parsing, we might be able to diagnose the problem. Also, what version of BioPerl are you using? 
Dave On May 8, 2010, at 11:37 PM, Ma, Man Chun John wrote: > Hi, > > And that's my problem here: I checked the BLAST output, and the two > sequences did get aligned-- just that SearchIO, in whatever flavour (I > tried blast, blasttable and blastxml) didn't see to do to the next > result when next_result() is called. It knows there're two results, but > still getting the first result on the second call. > > Cheers, > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Saturday, May 08, 2010 4:33 PM > To: Ma, Man Chun John > Cc: BioPerl List > Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast > and StandAloneBlast > > Hi John, > > Please remember to keep Cc'ing the mailing list so that everyone can > participate in the discussion. > > If I understand your question correctly, yes, you can iterate through > the blast results in a report called $blast_report using next_result. > > If you haven't already, you may want to look at the SearchIO HOWTO: > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > (although the BioPerl website appears to be temporarily offline, so > check back a little later.) > > > Dave > > > > On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > >> Hi, >> >> I have did some more investigation and found that the issue is >> probably that of SearchIO rather than StandAloneBlast--in case I made >> a mistake, so if I parsed a standard @array of Bio::Seq objects into >> StandAloneBlast (blastn with SearchIO output), the result for each of >> the seqs in the array can be assessed by $blast_report->next_result, >> right? 
>> >> >> John MC Ma >> Graduate Assistant >> Kwitek Lab >> Department of Internal Medicine >> 3125E MERF >> 375 Newton Road >> Iowa City IA 52242 >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Friday, May 07, 2010 5:34 PM >> To: Ma, Man Chun John >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Array Handling Differences between >> RemoteBlast and StandAloneBlast >> >> Hi John, >> >> You're right that passing parameters should work similarly for both >> RemoteBlast and StandAloneBlast, but without seeing exactly the >> parameter array you're passing, it's not possible to identify the >> problem. >> >> Could you perhaps post a small, but complete test program that >> demonstrates the problem? >> >> >> Dave >> >> >> PS - is this the actual code you ran? >> >>> My >>> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_bl >>> a >>> st_params); My >>> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >>> While (my $result=$blast_report->next_result()){ >>> [etc, etc for iteration] >>> } >> >> I'm guessing you were paraphrasing, but I ask because My, with a >> capital "M", will generate an error, you're calling >> Tools::Run::StandAloneBlast instead of >> Bio::Tools::Run::StandAloneBlast, and there's no method call to new(), > i.e. it should be: >> >> my $Stand_alone_blast_machine = >> Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); >> >> >> >> __________ Information from ESET NOD32 Antivirus, version of virus >> signature database 5095 (20100507) __________ >> >> The message was checked by ESET NOD32 Antivirus. 
>> >> http://www.eset.com >> > From cjfields at illinois.edu Sat May 8 18:59:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 13:59:13 -0500 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <9B1436B1-BAF6-4330-8470-76EB35CEDD9B@illinois.edu> <4BE32FEF.6080707@cornell.edu> <703927FD-CCB7-4C59-9407-AA3ECB23B499@sbc.su.se> Message-ID: <73BDDA86-F487-484F-A87C-1DF37CDEA7D8@illinois.edu> On May 7, 2010, at 3:51 AM, Peter wrote: > On Thu, May 6, 2010 at 10:59 PM, Dave Messina wrote: >>> * I don't think we should allow any svn write support. Anybody that >>> truly cannot get over the hump can send patches to the list. >> >> Unless svn commits are somehow problematic, is there another reason to disallow it? > >> From my reading of the github blog post, svn merges are potentially problematic. > http://github.com/blog/644-subversion-write-support > > Peter Yes, they're still working out the kinks. I think we would only support read until the bugs get worked out of write. chris From David.Messina at sbc.su.se Sat May 8 21:33:53 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 23:33:53 +0200 Subject: [Bioperl-l] wiki offline? Message-ID: <064068F0-FF78-4557-9356-54CB1DB1783B@sbc.su.se> Hi, The BioPerl website appears to be down, at least from my spot on the net ? could someone please look into it? Thanks, Dave From David.Messina at sbc.su.se Sat May 8 20:07:02 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 8 May 2010 22:07:02 +0200 Subject: [Bioperl-l] BioPerl Migration to Git/GitHub In-Reply-To: <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu> References: <3F557969-CB5A-482C-AAF4-69DD7C77CFEF@illinois.edu> <2390DAC7-0D04-4471-88E2-7CB08CA73246@illinois.edu> Message-ID: <9A27A797-027E-445D-A8C3-6A7B6FBF4F13@sbc.su.se> Thanks, Chris. 
It took a few days for github to "notice" my @bioperl.org address and connect it to my commits. Since Lincoln added his @bioperl.org email to github a little later than I did, it may just be still trickling through the github pipes. Dave From florent.angly at gmail.com Sat May 8 11:34:15 2010 From: florent.angly at gmail.com (Florent Angly) Date: Sat, 08 May 2010 21:34:15 +1000 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE4EBAA.5010709@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> Message-ID: <4BE54C37.7020304@gmail.com> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. It looks like the Bio::SeqIO modules tests could use it as well. Cheers, Florent From David.Messina at sbc.su.se Sat May 8 22:40:22 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 9 May 2010 00:40:22 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> Message-ID: Hi John, Your blast report works fine for me with the following code taken from the Bio::SearchIO HOWTO:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;
# 'blastout' is the BLAST report file being parsed
my $in = Bio::SearchIO->new('-file' => 'blastout', '-format' => 'blast');
# one result per query, then each hit and each HSP within it
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print "Query=", $result->query_name, " Hit=", $hit->name, " Length=", $hsp->length('total'), " Percent_id=", $hsp->percent_identity, "\n";
        }
    }
}
## Here is the output:
Query=F Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100
Query=F Hit=ref|NC_005117.2|NC_005117 Length=18 Percent_id=100
Query=F Hit=ref|NC_005105.2|NC_005105 Length=18 Percent_id=100
Query=R Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100
Dave From manchunjohn-ma at uiowa.edu Sat May 8 22:43:11 2010 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Sat, 8 May 2010 17:43:11 -0500 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu> Hi Dave, Yes, I tried to write a separate script to parse all those files, and they came out fine. It just happens when I run the entire target script; and if I replace the StandAloneBlast part with the standard RemoteBlast code, it's fine, too.
Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 -----Original Message----- From: Dave Messina [mailto:David.Messina at sbc.su.se] Sent: Saturday, May 08, 2010 5:40 PM To: Ma, Man Chun John Cc: BioPerl List Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast Hi John, Your blast report works fine for me with the following code taken from the Bio::SearchIO HOWTO: #!usr/bin/perl use strict; use warnings; use Bio::SearchIO; my $in = Bio::SearchIO->new('-file' => 'blastout', '-format' => 'blast'); while(my $result = $in->next_result) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp) { print "Query=", $result->query_name, " Hit=", $hit->name, " Length=", $hsp->length('total'), " Percent_id=", $hsp->percent_identity, "\n"; } } } ## Here is the output: Query=F Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100 Query=F Hit=ref|NC_005117.2|NC_005117 Length=18 Percent_id=100 Query=F Hit=ref|NC_005105.2|NC_005105 Length=18 Percent_id=100 Query=R Hit=ref|NC_005116.2|NC_005116 Length=27 Percent_id=100 Dave From David.Messina at sbc.su.se Sat May 8 22:58:41 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 9 May 2010 00:58:41 +0200 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> <8C486C1ABCFEEF439190FA27D3101EEB048AE771@HC-MAIL11.healthcare.uiowa.edu> 
Message-ID: <41281436-08D3-46F9-BDD0-A8D5306DB412@sbc.su.se> I cannot help you without seeing the code. It sounds like you've already tested the parsing part in a script by itself and that works. If you haven't already, you can test the running Blast part in its own script and see if that works. If both parts work separately, then there's something wrong with the way they have been put together. Dave From jason at bioperl.org Sat May 8 16:06:28 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 08 May 2010 09:06:28 -0700 Subject: [Bioperl-l] Bio::Align - alignment by position? In-Reply-To: <28478022.post@talk.nabble.com> References: <28478022.post@talk.nabble.com> Message-ID: <4BE58C04.8090901@bioperl.org> Not clear what you want to make. You want a new alignment that only contains the columns in your list or You want to extract each column in your list one by one? visitor555 wrote, On 5/6/10 11:51 AM: > Hi, > > I have a list alignment positions and I want to get each column them from > the alignment. If I slice the alignment the sequence with gaps in these > positions disappear. I can rotate on each seq and then split the sequence. > Is there better way to go over the alignment position by position? > > thanks ! > From jason at bioperl.org Sat May 8 16:12:26 2010 From: jason at bioperl.org (Jason Stajich) Date: Sat, 08 May 2010 09:12:26 -0700 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE4EBAA.5010709@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> Message-ID: <4BE58D6A.9080601@bioperl.org> Unless necessary I don't know if adding yet another dependency is warranted here. I don't know how complicated the words will be but can't you just strip out the numbers and do this in a schwartzian transformation? 
#!/usr/bin/perl -w
use strict;
my @arr = qw(single1 contig10 101 contig2 3);
# Schwartzian transform: pair each ID with its first run of digits,
# sort numerically on that number, then unwrap the original ID.
my @sorted = map  { $_->[1] }
             sort { $a->[0] <=> $b->[0] }
             map  { [ /(\d+)/, $_ ] } @arr;
print join("\n", @sorted), "\n";

But I'm not sure how you want to sort 10 vs contig10 vs singlet10 reliably?

-jason

Florent Angly wrote, On 5/7/10 9:42 PM:
> Hi all,
>
> I am working on updating some of the Bio::Assembly::* modules right now.
> I need to sort a list of IDs. These IDs could be numbers, "words" or a
> mix of the two, for example: @arr = ('singlet1', 'contig10',
> 'contig2', '101', '3');
>
> I cannot sort them with the numerical sort: sort { $a <=> $b } @array
> This would generate warnings because some of the IDs (e.g. 'singlet1')
> are not numbers.
>
> I cannot sort them lexically: sort @array
> Lexical sorting would not take into account numbers properly and
> result in:
> singlet1 contig10 contig2 3 101
>
> So, what I really need is natural sorting, which is not in any core
> function of Perl. I'd like to use the CPAN module Sort::Naturally for
> this purpose: nsort @arr
> The results would be what we expect, i.e.:
> 3 101 contig2 contig10 singlet1
>
> Can I add this module as an additional dependency of BioPerl? I
> imagine that some other modules might want to use this. On the
> assembly side, it would be used by the writing methods of
> Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around
> my problem that doesn't require any external module?
>
> Florent
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Sat May 8 23:47:58 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 May 2010 18:47:58 -0500
Subject: [Bioperl-l] New Bioperl dependency?
Sort::Naturally In-Reply-To: <4BE54C37.7020304@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> Message-ID: To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. chris On May 8, 2010, at 6:34 AM, Florent Angly wrote: > Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. > > It looks like the Bio::SeqIO modules tests could use it as well. > > Cheers, > > Florent > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun May 9 00:02:28 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 May 2010 19:02:28 -0500 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> Message-ID: <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. chris On May 8, 2010, at 6:47 PM, Chris Fields wrote: > To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. 
None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. > > chris > > On May 8, 2010, at 6:34 AM, Florent Angly wrote: > >> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm ). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. >> >> It looks like the Bio::SeqIO modules tests could use it as well. >> >> Cheers, >> >> Florent >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sat May 8 23:30:48 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 8 May 2010 19:30:48 -0400 Subject: [Bioperl-l] GitHub migration Wednesday In-Reply-To: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> References: <8A4C0AFC-4C3D-45C7-87D5-96DA02A2E0C4@illinois.edu> Message-ID: <9B5043D308B942AEB4F9AA199470812B@NewLife> Sail on, great Ship of State. ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Saturday, May 08, 2010 3:23 PM Subject: [Bioperl-l] GitHub migration Wednesday > Seems like we're all pretty much in agreement that this needs to happen sooner > than later. So, I'm scheduling the git/github migration aggressively, for > this Wednesday. Key steps: > > 1) Notify the list prior to locking the svn repo and/or making it read-only. > > 2) We need to set up post-commit hooks to forward commit messages on to > bioperl-guts and elsewhere. 
I have tried this out off github and so far it's > a little problematic (not working off bioperl-test, but working off my own > github commits). > > 3) The current bioperl github repos will all be replaced with their live > counterparts (branches and all), generated off the latest SVN via svn2git > (including metadata). I'll have to reinstate collaborators at that time, but > the author mapping should be the same as before (DEVACCOUNT at bioperl.org, where > DEVACCOUNT is one's user name on dev.open-bio.org). > > 4) Update the wiki pages as needed to point to the github repo instead of the > code.open-bio.org one. Also, I'm sure this will catch many devs not paying > attention to the list by surprise, so we'll need a developer migration page > set up. > > Anything else? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From manchunjohn-ma at uiowa.edu Sat May 8 21:59:08 2010 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Sat, 8 May 2010 16:59:08 -0500 Subject: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast In-Reply-To: <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> References: <8C486C1ABCFEEF439190FA27D3101EEB048AE703@HC-MAIL11.healthcare.uiowa.edu> <681337E2-F579-40EC-84F7-C30839398C22@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76E@HC-MAIL11.healthcare.uiowa.edu> <01DCE954-9F12-4AA2-A181-98E843FF48E3@sbc.su.se> <8C486C1ABCFEEF439190FA27D3101EEB048AE76F@HC-MAIL11.healthcare.uiowa.edu> <39C42774-9F9C-4CF9-90F0-4AD8F738D335@sbc.su.se> Message-ID: <8C486C1ABCFEEF439190FA27D3101EEB048AE770@HC-MAIL11.healthcare.uiowa.edu> Hi, I use bioperl-live 16950 with blast 2.2.23 I haven't been able to put together a simplier script with problem at this time, so I'd put the BLASTn outputs (in blast, blasttable and blastxml formats) here-- they look perfectly normal except that look like 2 separate output 
files appended together. Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Internal Medicine 3125E MERF 375 Newton Road Iowa City IA 52242 -----Original Message----- From: Dave Messina [mailto:David.Messina at sbc.su.se] Sent: Saturday, May 08, 2010 4:47 PM To: Ma, Man Chun John Cc: BioPerl List Subject: Re: [Bioperl-l] Array Handling Differences between RemoteBlast and StandAloneBlast There was a report last week of a possible problem with BLAST parsing introduced in the last few days. I don't know what the status of that is, but it's possible that it's related. In any case, if you post your code and the blast report you're parsing, we might be able to diagnose the problem. Also, what version of BioPerl are you using? Dave On May 8, 2010, at 11:37 PM, Ma, Man Chun John wrote: > Hi, > > And that's my problem here: I checked the BLAST output, and the two > sequences did get aligned-- just that SearchIO, in whatever flavour (I > tried blast, blasttable and blastxml) didn't see to do to the next > result when next_result() is called. It knows there're two results, > but still getting the first result on the second call. > > Cheers, > > > John MC Ma > Graduate Assistant > Kwitek Lab > Department of Internal Medicine > 3125E MERF > 375 Newton Road > Iowa City IA 52242 > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: Saturday, May 08, 2010 4:33 PM > To: Ma, Man Chun John > Cc: BioPerl List > Subject: Re: [Bioperl-l] Array Handling Differences between > RemoteBlast and StandAloneBlast > > Hi John, > > Please remember to keep Cc'ing the mailing list so that everyone can > participate in the discussion. > > If I understand your question correctly, yes, you can iterate through > the blast results in a report called $blast_report using next_result. 
> > If you haven't already, you may want to look at the SearchIO HOWTO: > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > (although the BioPerl website appears to be temporarily offline, so > check back a little later.) > > > Dave > > > > On May 8, 2010, at 11:16 PM, Ma, Man Chun John wrote: > >> Hi, >> >> I have did some more investigation and found that the issue is >> probably that of SearchIO rather than StandAloneBlast--in case I made >> a mistake, so if I parsed a standard @array of Bio::Seq objects into >> StandAloneBlast (blastn with SearchIO output), the result for each of >> the seqs in the array can be assessed by $blast_report->next_result, >> right? >> >> >> John MC Ma >> Graduate Assistant >> Kwitek Lab >> Department of Internal Medicine >> 3125E MERF >> 375 Newton Road >> Iowa City IA 52242 >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Friday, May 07, 2010 5:34 PM >> To: Ma, Man Chun John >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Array Handling Differences between >> RemoteBlast and StandAloneBlast >> >> Hi John, >> >> You're right that passing parameters should work similarly for both >> RemoteBlast and StandAloneBlast, but without seeing exactly the >> parameter array you're passing, it's not possible to identify the >> problem. >> >> Could you perhaps post a small, but complete test program that >> demonstrates the problem? >> >> >> Dave >> >> >> PS - is this the actual code you ran? 
>> >>> My >>> $Stand_alone_blast_machine=Tools::Run::StandAloneBlast(@standslone_b >>> l >>> a >>> st_params); My >>> $blast_report=$Stand_alone_blast_machine(@primer_info{primer}); >>> While (my $result=$blast_report->next_result()){ >>> [etc, etc for iteration] >>> } >> >> I'm guessing you were paraphrasing, but I ask because My, with a >> capital "M", will generate an error, you're calling >> Tools::Run::StandAloneBlast instead of >> Bio::Tools::Run::StandAloneBlast, and there's no method call to >> new(), > i.e. it should be: >> >> my $Stand_alone_blast_machine = >> Bio::Tools::Run::StandAloneBlast->new(@standalone_blast_params); >> >> >> >> __________ Information from ESET NOD32 Antivirus, version of virus >> signature database 5095 (20100507) __________ >> >> The message was checked by ESET NOD32 Antivirus. >> >> http://www.eset.com >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: blasttable Type: application/octet-stream Size: 842 bytes Desc: blasttable URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blast.xml Type: text/xml Size: 7598 bytes Desc: blast.xml URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastout Type: application/octet-stream Size: 3576 bytes Desc: blastout URL: From florent.angly at gmail.com Sun May 9 05:12:03 2010 From: florent.angly at gmail.com (Florent Angly) Date: Sun, 09 May 2010 15:12:03 +1000 Subject: [Bioperl-l] New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE58D6A.9080601@bioperl.org> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE58D6A.9080601@bioperl.org> Message-ID: <4BE64423.1040104@gmail.com> Within one assembly file, contig IDs typically tend to follow one formatting convention. The two most popular ones are a numerical ID, or an alphanumeric ID, such as 'contig13'. The later case already requires natural sorting. 
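For illustration, a natural sort of such IDs can be sketched in core Perl without any CPAN module. This is only a minimal sketch, not the algorithm Sort::Naturally uses: the helper names chunks and natural_cmp are made up for this example, and placing purely numeric IDs before alphanumeric ones is an assumption that happens to match the order asked for in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split an ID into alternating digit and non-digit chunks,
# e.g. 'contig10' -> ('contig', '10') and '101' -> ('101').
sub chunks {
    my ($id) = @_;
    return $id =~ /(\d+|\D+)/g;
}

# Compare two IDs chunk by chunk: numerically when both chunks are
# digits, as plain strings otherwise (hypothetical helper).
sub natural_cmp {
    my ($x, $y) = @_;
    my @left  = chunks($x);
    my @right = chunks($y);
    while (@left && @right) {
        my ($p, $q) = (shift @left, shift @right);
        my $c = ($p =~ /^\d+$/ && $q =~ /^\d+$/) ? ($p <=> $q) : ($p cmp $q);
        return $c if $c;
    }
    # One ID is a prefix of the other: the shorter one sorts first.
    return scalar(@left) <=> scalar(@right);
}

my @ids    = ('singlet1', 'contig10', 'contig2', '101', '3');
my @sorted = sort { natural_cmp($a, $b) } @ids;
print join("\n", @sorted), "\n";
# prints 3, 101, contig2, contig10, singlet1 (one per line)
```

Because the comparison falls back to a plain string compare for mixed chunks, IDs in an unexpected format still get a deterministic order, which is close in spirit to the lexical fallback mentioned in this message.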
There is no way to know in advance what format to expect; in fact, since the format is specified by the user, it could be arbitrarily complicated, although the IDs would probably still be sorted naturally.

I will follow Chris's recommendation of using Sort::Naturally as a recommended package. Users who don't have this dependency will have their IDs sorted in a safe way, lexically.

Florent

On 09/05/10 02:12, Jason Stajich wrote:
> Unless necessary I don't know if adding yet another dependency is
> warranted here.
>
> I don't know how complicated the words will be but can't you just
> strip out the numbers and do this in a schwartzian transformation?
>
> #!/usr/bin/perl -w
> use strict;
> my @arr = qw(single1 contig10 101 contig2 3);
> my @sorted = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [ /(\d+)/, $_] } @arr;
> print join("\n", @sorted),"\n";
>
> But I'm not sure how do you want to sort
> 10 vs contig10 vs singlet10 reliably?
>
> -jason
>
> Florent Angly wrote, On 5/7/10 9:42 PM:
>> Hi all,
>>
>> I am working on updating some of the Bio::Assembly::* modules right now.
>> I need to sort a list of IDs. These IDs could be numbers, "words" or
>> a mix of the two, for example: @arr = ('singlet1',
>> 'contig10', 'contig2', '101', '3');
>>
>> I cannot sort them with the numerical sort: sort { $a <=> $b } @array
>> This would generate warnings because some of the IDs (e.g. 'singlet1')
>> are not numbers.
>>
>> I cannot sort them lexically: sort @array
>> Lexical sorting would not take into account numbers properly and
>> result in:
>> singlet1 contig10 contig2 3 101
>>
>> So, what I really need is natural sorting, which is not in any core
>> function of Perl. I'd like to use the CPAN module Sort::Naturally for
>> this purpose: nsort @arr
>> The results would be what we expect, i.e.:
>> 3 101 contig2 contig10 singlet1
>>
>> Can I add this module as an additional dependency of BioPerl? I
>> imagine that some other modules might want to use this.
On the
>> assembly side, it would be used by the writing methods of
>> Bio::Assembly::IO::tigr and ace. Or maybe there is an easy way around
>> my problem that doesn't require any external module?
>>
>> Florent
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From florent.angly at gmail.com Sun May 9 07:26:19 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Sun, 09 May 2010 17:26:19 +1000
Subject: [Bioperl-l] Read/write round-tripping Was: Re: New Bioperl dependency? Sort::Naturally
In-Reply-To: <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu>
References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu>
Message-ID: <4BE6639B.6060004@gmail.com>

Chris,

I've thought some more about the problem and I now agree with you that round-tripping at the object level is more powerful. It has the problem that some objects are given IDs dynamically on every run, which means that identical input files won't produce identical objects.

> is_deeply( $obj_out , $obj_in , 'deep compare' );
> not ok 1 - deep compare
> #   Failed test 'deep compare'
> #   at ./test_roundtrip.pl line 33.
> #   Structures begin differing at:
> #   ${     $got->{_contigs}{Contig35}{_sfc}{_btree}} = '56438592'
> #   ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '54980512'
> 1..1
> # Looks like you failed 1 test of 1.

And when I re-run this:

> not ok 1 - deep compare
> #   Failed test 'deep compare'
> #   at ./test_roundtrip.pl line 33.
> #   Structures begin differing at:
> #   ${     $got->{_contigs}{Contig35}{_sfc}{_btree}} = '47763264'
> #   ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '46305184'
> 1..1
> # Looks like you failed 1 test of 1.

Note how the value of _btree changes every time.
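One possible way around such volatile fields, sketched with core modules only, is to copy both structures while dropping the keys known to change between runs, and then compare the copies with is_deeply. The strip_keys helper and the sample data below are hypothetical; only the _btree key name and the Contig35 path come from the test output above, and whether dropping _btree is safe for real Bio::Assembly objects is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(reftype);
use Test::More tests => 1;

# Return an unblessed deep copy of a nested hash/array structure with
# the named hash keys removed at every level (hypothetical helper).
sub strip_keys {
    my ($data, %skip) = @_;
    my $type = reftype($data) || '';
    if ($type eq 'HASH') {
        my %copy;
        for my $key (keys %$data) {
            next if $skip{$key};
            $copy{$key} = strip_keys($data->{$key}, %skip);
        }
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        return [ map { strip_keys($_, %skip) } @$data ];
    }
    return $data;    # plain scalars and other reference types pass through
}

# Made-up stand-ins for two assembly objects that differ only in the
# run-dependent _btree value; the _features entry is invented.
my $obj_in = { _contigs => { Contig35 => { _sfc => {
    _btree => \'54980512', _features => [ 'f1', 'f2' ] } } } };
my $obj_out = { _contigs => { Contig35 => { _sfc => {
    _btree => \'56438592', _features => [ 'f1', 'f2' ] } } } };

is_deeply(
    strip_keys($obj_out, _btree => 1),
    strip_keys($obj_in,  _btree => 1),
    'deep compare ignoring _btree'
);
```

The copies are unblessed, which should not matter here because is_deeply compares structure rather than class, but it does mean the stripped copies are only suitable for comparison, not for further method calls.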
Maybe using Test::Deep would be a good approach (http://search.cpan.org/~fdaly/Test-Deep-0.106/lib/Test/Deep.pod): > Where it becomes more interesting is in allowing you to do something > besides simple exact comparisons. With strings, the |eq| operator > checks that 2 strings are exactly equal but sometimes that's not what > you want. When you don't know exactly what the string should be but > you do know some things about how it should look, |eq| is no good and > you must use pattern matching instead. Test::Deep provides pattern > matching for complex data structures Florent On 09/05/10 10:02, Chris Fields wrote: > Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. > > chris > > On May 8, 2010, at 6:47 PM, Chris Fields wrote: > > >> To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. >> >> chris >> >> On May 8, 2010, at 6:34 AM, Florent Angly wrote: >> >> >>> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. >>> >>> It looks like the Bio::SeqIO modules tests could use it as well. 
>>> >>> Cheers, >>> >>> Florent >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From ibi2008006 at iiita.ac.in Sun May 9 14:46:28 2010 From: ibi2008006 at iiita.ac.in (roserp) Date: Sun, 9 May 2010 07:46:28 -0700 (PDT) Subject: [Bioperl-l] where to find standard substitution matrices Message-ID: <28503204.post@talk.nabble.com> hi , I want blosum62, blosum80 , pam30, and pam70 matrices. I am getting different values in different sites for these matrices. can anyone suggest some authenticated site for getting these ?? thanks in advance -- View this message in context: http://old.nabble.com/where-to-find-standard-substitution-matrices-tp28503204p28503204.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From razi.khaja at gmail.com Sun May 9 19:23:47 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Sun, 9 May 2010 15:23:47 -0400 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: Attached (blast.pm.diff) is a patch that fixes Heikki's problem. Can someone advise an appropriate way to have this patch applied, given that it is an amendment to a previous patch? Thanks Razi ---------- Forwarded message ---------- From: Heikki Lehvaslaiho Date: Wed, May 5, 2010 at 2:11 AM Subject: Re: [Bioperl-l] BLAST parsing broken To: Razi Khaja Hi Raja, Thanks for trying to fix this. I am attaching an example output file to this message. I just tested again that master from git repository fails to get query ID, but the previous version works. bala ~/src/bioperl-live> git checkout master Previous HEAD position was 5e278f5... 
Robson's patch for buggy blastpgp output Switched to branch 'master' When I started using the latest mpiBLAST code a few months ago I did compare the 0 output from it to standard NCBI blast and they were identical. Also, I've noticed a discrepancy between within bioperl blast parsing that I have not had time to work on. Would you be interested in having a look? I am creating output from mpiBLAST in 0 format and then converting it into tab-delimited 8 format. I am unable to get 100% similarity for all cases when I compare the conversion to the output straight from mpiBLAST in format 8. Sometimes the mismatch and gap values are off by one. I am attaching a script that does the conversion. It is the same one I was using when I noticed the problem above. I was going to put the code into bioperl but that got delayed when I noticed the discrepancies. Cheers, -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 4 May 2010 20:55, Razi Khaja wrote: > That is odd. Heikki, do you have a blast output file that produces this > error? > Could you attach the file and either send to the list or myself (if the > list > does not accept attachments). > Thanks, > Razi > > > On Mon, May 3, 2010 at 8:08 AM, Chris Fields > wrote: > > > Odd, I ran tests on that prior to commit. I'll work on fixing that (in > svn > > of course, until the migration is complete). > > > > chris > > > > On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > > > > > Chris, > > > > > > latest additions to Bio::SearchIO::blast.pm broke the parsing of > normal > > > blast output. $result->query_name returns now undef. > > > > > > (Using the anonymous git now). 
This change still works: > > > > > > commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > > Author: cjfields > > > Date: Sun Dec 20 04:39:58 2009 +0000 > > > > > > Robson's patch for buggy blastpgp output > > > > > > But this does not: > > > > > > commit 9a89c3434597104dd50553e3562983d78d14a544 > > > Author: cjfields > > > Date: Thu Apr 15 04:21:17 2010 +0000 > > > > > > [bug 3031] > > > > > > patches for catching algorithm ref, courtesy Razi Khaja. > > > > > > That makes it easy to find the diffs: > > > > > > $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > > > 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > > > diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > > > index 378023a..6f7eeeb 100644 > > > --- a/Bio/SearchIO/blast.pm > > > +++ b/Bio/SearchIO/blast.pm > > > @@ -209,6 +209,7 @@ BEGIN { > > > > > > 'BlastOutput_program' => 'RESULT-algorithm_name', > > > 'BlastOutput_version' => > 'RESULT-algorithm_version', > > > + 'BlastOutput_algorithm-reference' => > > 'RESULT-algorithm_reference', > > > 'BlastOutput_query-def' => 'RESULT-query_name', > > > 'BlastOutput_query-len' => 'RESULT-query_length', > > > 'BlastOutput_query-acc' => 'RESULT-query_accession', > > > @@ -504,6 +505,26 @@ sub next_result { > > > } > > > ); > > > } > > > + # parse the BLAST algorithm reference > > > + elsif(/^Reference:\s+(.*)$/) { > > > + # want to preserve newlines for the BLAST algorithm > > reference > > > + my $algorithm_reference = "$1\n"; > > > + $_ = $self->_readline; > > > + # while the current line, does not match an empty line, a > > RID:, > > > or a Database:, we are still looking at the > > > + # algorithm_reference, append it to what we parsed so far > > > + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { > > > + $algorithm_reference .= "$_"; > > > + $_ = $self->_readline; > > > + } > > > + # if we exited the while loop, we saw an empty line, a > RID:, > > or > > > a Database:, so push it back > > > + $self->_pushback($_); > > > + 
$self->element( > > > + { > > > + 'Name' => 'BlastOutput_algorithm-reference', > > > + 'Data' => $algorithm_reference > > > + } > > > + ); > > > + } > > > # added Windows workaround for bug 1985 > > > elsif (/^(Searching|Results from round)/) { > > > next unless $1 =~ /Results from round/; > > > > > > > > > I am not sure why reference parsing messes things up. Maybe it eats too > > many > > > lines from the result file. > > > > > > Yours, > > > > > > -Heikki > > > > > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > > > cell: +966 545 595 849 office: +966 2 808 2429 > > > > > > Computational Bioscience Research Centre (CBRC), Building #2, Office > > #4216 > > > 4700 King Abdullah University of Science and Technology (KAUST) > > > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: mpiblast.out Type: application/octet-stream Size: 34844 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastparser028.pl Type: application/x-perl Size: 2024 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: blast.pm.diff Type: text/x-patch Size: 994 bytes Desc: not available URL: From cjfields at illinois.edu Sun May 9 20:43:29 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 May 2010 15:43:29 -0500 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> Message-ID: <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> If the patch is against main trunk it isn't a problem, otherwise the diff should be vs. that code. chris On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > Can someone advise an appropriate way to have this patch applied, given that > it is an amendment to a previous patch? > Thanks > Razi > > > ---------- Forwarded message ---------- > From: Heikki Lehvaslaiho > Date: Wed, May 5, 2010 at 2:11 AM > Subject: Re: [Bioperl-l] BLAST parsing broken > To: Razi Khaja > > > Hi Raja, > > Thanks for trying to fix this. > > I am attaching an example output file to this message. I just tested again > that master from git repository fails to get query ID, but the previous > version works. > > bala ~/src/bioperl-live> git checkout master > Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp > output > Switched to branch 'master' > > When I started using the latest mpiBLAST code a few months ago I did compare > the 0 output from it to standard NCBI blast and they were identical. > > > > > Also, I've noticed a discrepancy between within bioperl blast parsing that > I have not had time to work on. Would you be interested in having a look? > > I am creating output from mpiBLAST in 0 format and then converting it into > tab-delimited 8 format. I am unable to get 100% similarity for all cases > when I compare the conversion to the output straight from mpiBLAST in format > 8. Sometimes the mismatch and gap values are off by one. > > I am attaching a script that does the conversion. 
It is the same one I was > using when I noticed the problem above. I was going to put the code into > bioperl but that got delayed when I noticed the discrepancies. > > > Cheers, > > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > On 4 May 2010 20:55, Razi Khaja wrote: > >> That is odd. Heikki, do you have a blast output file that produces this >> error? >> Could you attach the file and either send to the list or myself (if the >> list >> does not accept attachments). >> Thanks, >> Razi >> >> >> On Mon, May 3, 2010 at 8:08 AM, Chris Fields >> wrote: >> >>> Odd, I ran tests on that prior to commit. I'll work on fixing that (in >> svn >>> of course, until the migration is complete). >>> >>> chris >>> >>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: >>> >>>> Chris, >>>> >>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of >> normal >>>> blast output. $result->query_name returns now undef. >>>> >>>> (Using the anonymous git now). This change still works: >>>> >>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>> Author: cjfields >>>> Date: Sun Dec 20 04:39:58 2009 +0000 >>>> >>>> Robson's patch for buggy blastpgp output >>>> >>>> But this does not: >>>> >>>> commit 9a89c3434597104dd50553e3562983d78d14a544 >>>> Author: cjfields >>>> Date: Thu Apr 15 04:21:17 2010 +0000 >>>> >>>> [bug 3031] >>>> >>>> patches for catching algorithm ref, courtesy Razi Khaja. 
>>>> >>>> That makes it easy to find the diffs: >>>> >>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 >>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm >>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm >>>> index 378023a..6f7eeeb 100644 >>>> --- a/Bio/SearchIO/blast.pm >>>> +++ b/Bio/SearchIO/blast.pm >>>> @@ -209,6 +209,7 @@ BEGIN { >>>> >>>> 'BlastOutput_program' => 'RESULT-algorithm_name', >>>> 'BlastOutput_version' => >> 'RESULT-algorithm_version', >>>> + 'BlastOutput_algorithm-reference' => >>> 'RESULT-algorithm_reference', >>>> 'BlastOutput_query-def' => 'RESULT-query_name', >>>> 'BlastOutput_query-len' => 'RESULT-query_length', >>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', >>>> @@ -504,6 +505,26 @@ sub next_result { >>>> } >>>> ); >>>> } >>>> + # parse the BLAST algorithm reference >>>> + elsif(/^Reference:\s+(.*)$/) { >>>> + # want to preserve newlines for the BLAST algorithm >>> reference >>>> + my $algorithm_reference = "$1\n"; >>>> + $_ = $self->_readline; >>>> + # while the current line, does not match an empty line, a >>> RID:, >>>> or a Database:, we are still looking at the >>>> + # algorithm_reference, append it to what we parsed so far >>>> + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) { >>>> + $algorithm_reference .= "$_"; >>>> + $_ = $self->_readline; >>>> + } >>>> + # if we exited the while loop, we saw an empty line, a >> RID:, >>> or >>>> a Database:, so push it back >>>> + $self->_pushback($_); >>>> + $self->element( >>>> + { >>>> + 'Name' => 'BlastOutput_algorithm-reference', >>>> + 'Data' => $algorithm_reference >>>> + } >>>> + ); >>>> + } >>>> # added Windows workaround for bug 1985 >>>> elsif (/^(Searching|Results from round)/) { >>>> next unless $1 =~ /Results from round/; >>>> >>>> >>>> I am not sure why reference parsing messes things up. Maybe it eats too >>> many >>>> lines from the result file. 
>>>> >>>> Yours, >>>> >>>> -Heikki >>>> >>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>>> cell: +966 545 595 849 office: +966 2 808 2429 >>>> >>>> Computational Bioscience Research Centre (CBRC), Building #2, Office >>> #4216 >>>> 4700 King Abdullah University of Science and Technology (KAUST) >>>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From razi.khaja at gmail.com Sun May 9 21:15:38 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Sun, 9 May 2010 17:15:38 -0400 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> Message-ID: Hi Chris, The patch is against the main trunk. I checked out version 11326 of the repository today. Razi On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote: > If the patch is against main trunk it isn't a problem, otherwise the diff > should be vs. that code. > > chris > > On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > > > Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > > Can someone advise an appropriate way to have this patch applied, given > that > > it is an amendment to a previous patch? 
> > Thanks > > Razi > > > > > > ---------- Forwarded message ---------- > > From: Heikki Lehvaslaiho > > Date: Wed, May 5, 2010 at 2:11 AM > > Subject: Re: [Bioperl-l] BLAST parsing broken > > To: Razi Khaja > > > > > > Hi Raja, > > > > Thanks for trying to fix this. > > > > I am attaching an example output file to this message. I just tested > again > > that master from git repository fails to get query ID, but the previous > > version works. > > > > bala ~/src/bioperl-live> git checkout master > > Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp > > output > > Switched to branch 'master' > > > > When I started using the latest mpiBLAST code a few months ago I did > compare > > the 0 output from it to standard NCBI blast and they were identical. > > > > > > > > > > Also, I've noticed a discrepancy between within bioperl blast parsing > that > > I have not had time to work on. Would you be interested in having a look? > > > > I am creating output from mpiBLAST in 0 format and then converting it > into > > tab-delimited 8 format. I am unable to get 100% similarity for all cases > > when I compare the conversion to the output straight from mpiBLAST in > format > > 8. Sometimes the mismatch and gap values are off by one. > > > > I am attaching a script that does the conversion. It is the same one I > was > > using when I noticed the problem above. I was going to put the code into > > bioperl but that got delayed when I noticed the discrepancies. > > > > > > Cheers, > > > > > > -Heikki > > > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > > cell: +966 545 595 849 office: +966 2 808 2429 > > > > Computational Bioscience Research Centre (CBRC), Building #2, Office > #4216 > > 4700 King Abdullah University of Science and Technology (KAUST) > > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > > > > > On 4 May 2010 20:55, Razi Khaja wrote: > > > >> That is odd. Heikki, do you have a blast output file that produces this > >> error? 
> >> Could you attach the file and either send to the list or myself (if the > >> list > >> does not accept attachments). > >> Thanks, > >> Razi > >> > >> > >> On Mon, May 3, 2010 at 8:08 AM, Chris Fields > >> wrote: > >> > >>> Odd, I ran tests on that prior to commit. I'll work on fixing that (in > >> svn > >>> of course, until the migration is complete). > >>> > >>> chris > >>> > >>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > >>> > >>>> Chris, > >>>> > >>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of > >> normal > >>>> blast output. $result->query_name returns now undef. > >>>> > >>>> (Using the anonymous git now). This change still works: > >>>> > >>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>> Author: cjfields > >>>> Date: Sun Dec 20 04:39:58 2009 +0000 > >>>> > >>>> Robson's patch for buggy blastpgp output > >>>> > >>>> But this does not: > >>>> > >>>> commit 9a89c3434597104dd50553e3562983d78d14a544 > >>>> Author: cjfields > >>>> Date: Thu Apr 15 04:21:17 2010 +0000 > >>>> > >>>> [bug 3031] > >>>> > >>>> patches for catching algorithm ref, courtesy Razi Khaja. 
> >>>> > >>>> That makes it easy to find the diffs: > >>>> > >>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > >>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > >>>> index 378023a..6f7eeeb 100644 > >>>> --- a/Bio/SearchIO/blast.pm > >>>> +++ b/Bio/SearchIO/blast.pm > >>>> @@ -209,6 +209,7 @@ BEGIN { > >>>> > >>>> 'BlastOutput_program' => 'RESULT-algorithm_name', > >>>> 'BlastOutput_version' => > >> 'RESULT-algorithm_version', > >>>> + 'BlastOutput_algorithm-reference' => > >>> 'RESULT-algorithm_reference', > >>>> 'BlastOutput_query-def' => 'RESULT-query_name', > >>>> 'BlastOutput_query-len' => 'RESULT-query_length', > >>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', > >>>> @@ -504,6 +505,26 @@ sub next_result { > >>>> } > >>>> ); > >>>> } > >>>> + # parse the BLAST algorithm reference > >>>> + elsif(/^Reference:\s+(.*)$/) { > >>>> + # want to preserve newlines for the BLAST algorithm > >>> reference > >>>> + my $algorithm_reference = "$1\n"; > >>>> + $_ = $self->_readline; > >>>> + # while the current line, does not match an empty line, a > >>> RID:, > >>>> or a Database:, we are still looking at the > >>>> + # algorithm_reference, append it to what we parsed so far > >>>> + while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) > { > >>>> + $algorithm_reference .= "$_"; > >>>> + $_ = $self->_readline; > >>>> + } > >>>> + # if we exited the while loop, we saw an empty line, a > >> RID:, > >>> or > >>>> a Database:, so push it back > >>>> + $self->_pushback($_); > >>>> + $self->element( > >>>> + { > >>>> + 'Name' => 'BlastOutput_algorithm-reference', > >>>> + 'Data' => $algorithm_reference > >>>> + } > >>>> + ); > >>>> + } > >>>> # added Windows workaround for bug 1985 > >>>> elsif (/^(Searching|Results from round)/) { > >>>> next unless $1 =~ /Results from round/; > >>>> > >>>> > >>>> I am not sure why reference parsing messes things up. 
Maybe it eats too many lines from the result file.
> >>>>
> >>>> Yours,
> >>>>
> >>>> -Heikki
> >>>>
> >>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> >>>> cell: +966 545 595 849 office: +966 2 808 2429
> >>>>
> >>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> >>>> 4700 King Abdullah University of Science and Technology (KAUST)
> >>>> Thuwal 23955-6900, Kingdom of Saudi Arabia

From cjfields at illinois.edu Sun May 9 21:30:52 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 9 May 2010 16:30:52 -0500
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To:
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID:

Then something is wrong, as current trunk is at r16969. Where are you pulling your code from? Our only working anon. server is the sync'ed github one.

chris

On May 9, 2010, at 4:15 PM, Razi Khaja wrote:

> Hi Chris,
> The patch is against the main trunk. I checked out version 11326 of the
> repository today.
> Razi
>
> On Sun, May 9, 2010 at 4:43 PM, Chris Fields wrote:
>
>> If the patch is against main trunk it isn't a problem, otherwise the diff
>> should be vs. that code.
>>
>> chris

From razi.khaja at gmail.com Sun May 9 23:48:28 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Sun, 9 May 2010 19:48:28 -0400
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To:
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID:

I checked out bioperl-live from github:

svn checkout http://svn.github.com/bioperl/bioperl-live.git

I just checked it out again, a few seconds ago and by default I got revision 11326.

Razi

On Sun, May 9, 2010 at 5:30 PM, Chris Fields wrote:

> Then something is wrong, as current trunk is at r16969.
Where are you
> pulling your code from? Our only working anon. server is the sync'ed github
> one.
>
> chris

From cjfields at illinois.edu Mon May 10 00:39:33 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 9 May 2010 19:39:33 -0500
Subject: [Bioperl-l] Fwd: BLAST parsing broken
In-Reply-To:
References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu>
Message-ID: <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu>

Ok, that's fine. It may be something off with revision numbers when using svn with github (git doesn't have incremental revisions, but a SHA). Committed the patch to dev svn, in r16970.

chris

On May 9, 2010, at 6:48 PM, Razi Khaja wrote:

> I checked out bioperl-live from github:
> svn checkout http://svn.github.com/bioperl/bioperl-live.git
>
> I just checked it out again, a few seconds ago and by default I got revision
> 11326.
> Razi
>
> On Sun, May 9, 2010 at 5:30 PM, Chris Fields wrote:
>
>> Then something is wrong, as current trunk is at r16969. Where are you
>> pulling your code from? Our only working anon. server is the sync'ed github
>> one.
>>
>> chris

From cmb433 at nyu.edu Mon May 10 02:22:52 2010
From: cmb433 at nyu.edu (bergeycm)
Date: Sun, 9 May 2010 19:22:52 -0700 (PDT)
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
Message-ID: <28506482.post@talk.nabble.com>

Hi all,

I'm attempting to query GenBank for all sequences'
lengths for a given taxon. I'm using get_Stream_by_query(), but only to grab the species, length, and accession. The genus of interest has almost 500,000 GenBank entries, though, and my code hangs at odd points in the info-gathering loop (often after only 300 or 400 iterations). The problem is that $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back undefined.

I've tried wrapping the next_seq portion of the code in an eval block, but to no avail. Is there a way to split a query into a bunch of small streams that aren't too much to ask? Or is there a way to pick up a dropped SeqIO stream? I think the connection is timing out and the stream is being lost. Any advice is greatly appreciated, as I'm fairly new to BioPerl.

- bergeycm

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Get general things ready to go for querying GenBank
my %options;
$options{'-maxids'} = '500000';        # There are presently 460,184 sequences
$options{'-db'}     = 'nucleotide';
$options{'-query'}  = "Pongo [ORGN]";  # Orangutans

my $query_obj = Bio::DB::Query::GenBank->new(%options);
my $total     = $query_obj->count;

my $gb_obj     = Bio::DB::GenBank->new();
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# Restrict info to just what I'll be using. No sequence necessary.
my $builder = $stream_obj->sequence_builder();
$builder->want_none();
$builder->add_wanted_slot('species','length','accession');

my $c = 0;

for (1 .. $total) {
    eval {
        my $seq_obj = $stream_obj->next_seq;
        my $flavor  = $seq_obj->species;
        print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t",
              $seq_obj->length, "\t", $seq_obj->accession, "\n";
    };

    if ($@) {
        # $@ holds the error raised inside the eval ($! is only the OS errno)
        print $@, "\n";
    }

    # Pause for a little over a third of a second
    select(undef, undef, undef, 0.35);

    $c++;
}

--
View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From robert.bradbury at gmail.com Mon May 10 05:38:09 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 10 May 2010 01:38:09 -0400
Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely
In-Reply-To: <28506482.post@talk.nabble.com>
References: <28506482.post@talk.nabble.com>
Message-ID:

I don't know whether this is related or not. But the last time I tried to fetch a moderately large genome (NS_000198 for *Podospora anserina*) it failed [1]. It takes a *very* long time and eventually throws an "Out of Memory" error. This is on a Pentium IV Prescott, which only has a 4GB address space (configured for 3GB for user programs). After running a long strace on the perl process, it seemed that memory from the sequence chunks being fetched was never properly returned and merged. The final program address was brk(0xafd8c000), or 2,950,217,728, which is probably the maximum amount of data space a user program can have, considering that one needs room for the stack. After that the mmap2() calls started failing with ENOMEM.

If Bio::DB::Query::GenBank is intelligent enough to fetch only the requested fields, you should be OK. But if it fetches the entire GenBank record and simply throws away the sequence information, and you are running into large sequences (say a big chunk of a chromosome), you could hit the memory/swap space limits on your machine, and that could be a problem.

If the program is running for a long time, I'd be inclined to check the system logs to see whether it is running out of memory/swap. You can also watch the process using ps to determine whether the VSZ grows continuously.

I think I mentioned this before on the BioPerl list, but I never had a clear understanding of what was going on and may not have filed a bug report. I think I eventually worked around it, perhaps by fetching the offending (large) sequence using wget or a browser.

Robert

1.
Given that NS_000198 is only ~7MB (4.6 million actual bases) the BioPerl memory management has to be really poor in merging/reusing if the fetch uses ~3GB. From bhakti.dwivedi at gmail.com Mon May 10 15:22:41 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Mon, 10 May 2010 11:22:41 -0400 Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface Message-ID: Does anyone know why the blast results vary for a query sequence when search is conducted using a web-based interface versus a Command line interface? For example, my web-based blast top hits do not match the top hits of the command line blast (blastcl3). I am using the default settings in both. not sure why the results are different Even if the hit is there, the e-value, bit score etc are different for the same hsp regions identified within the hit. is there a difference in the blast algorithm? or is it the database? Thanks! From cjfields at illinois.edu Mon May 10 16:28:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 11:28:15 -0500 Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface In-Reply-To: References: Message-ID: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu> The default web-based parameters differ than those via blastcl3, so if you are using the defaults for both they may differ somewhat. chris On May 10, 2010, at 10:22 AM, Bhakti Dwivedi wrote: > Does anyone know why the blast results vary for a query sequence when search > is conducted using a web-based interface versus a Command line interface? > > For example, my web-based blast top hits do not match the top hits of the > command line blast (blastcl3). I am using the default settings in both. > not sure why the results are different Even if the hit is there, the > e-value, bit score etc are different for the same hsp regions identified > within the hit. is there a difference in the blast algorithm? 
or is it the > database? > > Thanks! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 10 16:31:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 11:31:15 -0500 Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely In-Reply-To: References: <28506482.post@talk.nabble.com> Message-ID: On May 10, 2010, at 12:38 AM, Robert Bradbury wrote: > I don't know whether this is related or not. But the last time I tried to > fetch a moderately large genome (NS_000198 for *Podospera anserina*) it > failed [1]. It takes a *very* long time and eventually springs an "Out of > Memory" error. This is on a Pentium IV Prescott which only has a 4GB > address space (configured for 3GB for user programs) and after running a > long strace on the perl process it seemed that what was happening was that > it was never properly returning and merging memory from the sequence chunks > which were being fetched. The final program address was brk(0xafd8c000) or > 2,950,217,728 which is probably the maximum amount of data space a user > program can have considering that one needs room for the stack. After that > the mmap2() calls started failing with ENOMEM. That's odd. What OS? > If Bio::DB::GenBank::Query is intelligent enough to only fetch just the > requested fields you should be ok. But if it fetches the entire GenBank > record and simply throws away the sequence information and you are running > into large sequences (say a big chunk of a chromosome) and this ends up > hitting the memory/swap space limits on your machine that could be a > problem. Yes, that may happen, as (at the moment) we push everything into memory; there are no lazy or DB-linked Seq instances, at least not yet. Very large sequences take a lot of time (object instantiation) and a lot of memory. 
To tell the truth, that seems to be the default of most toolkits. We have recently talked about possible ways to deal with it; we just need the tuits for it (as with anything).

The other alternative is to pull the sequences down locally as a raw text file. This can still be done within BioPerl, just using Bio::DB::EUtilities:

use Bio::DB::EUtilities;

my $id = 'NS_000198';
my $in = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                  -db      => 'nuccore',
                                  -email   => 'cjfields at bioperl.org',
                                  -rettype => 'gbwithparts',
                                  -id      => $id);
# write the raw response straight to a local file
$in->get_Response(-file => "$id.gb");

> If the program is running for a long time I'd be inclined to check my system
> logs to see if one is running out of memory/swap. You can also watch the
> process using ps to determine if the VSZ grows continuously.
>
> I think I mentioned this before on the BioPerl list but never had a clear
> understanding of what was going on and may not have filed a bug report. I
> think I eventually worked around it, perhaps by fetching the offending
> (large) sequence using wget or a browser.

You can still file a bug on it; it does help with keeping track (just reporting it here doesn't help much, it gets lost in the shuffle).

> Robert
>
> 1. Given that NS_000198 is only ~7MB (4.6 million actual bases) the BioPerl
> memory management has to be really poor in merging/reusing if the fetch uses
> ~3GB.

BioPerl stores everything in memory, but I've worked with 4.6Mbp genomes quite a bit on my MB Pro. However, the default mode for Bio::DB::GenBank is to pull down everything using 'gbwithparts'. The file is much larger when fetched that way (sequence is ~34Mbp, file is ~51 MB). Maybe that's the problem?

If you can, please file a bug report along with the relevant information. That helps us determine the best course of action.

chris From cjfields at illinois.edu Mon May 10 16:32:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 11:32:43 -0500 Subject: [Bioperl-l] Read/write round-tripping Was: Re: New Bioperl dependency? Sort::Naturally In-Reply-To: <4BE6639B.6060004@gmail.com> References: <28491725.post@talk.nabble.com> <4BE4EBAA.5010709@gmail.com> <4BE54C37.7020304@gmail.com> <8AAD0315-4592-42C1-94A2-791249BBD2A7@illinois.edu> <4BE6639B.6060004@gmail.com> Message-ID: <4B47AB3F-3190-4ACC-8235-8F5D6DBE7DC6@illinois.edu> If there is dynamic ID assignment I would assume you can't compare them between runs, so using is_deeply() won't work as advertised since we already know the ID will change between runs anyway, it's a self-fulfilling prophecy. Also, is_deeply() here is inspecting the SF::Collection blessed hash directly (the _btree is a tied DB_File hash), not sure that's what you want either. So at this point I would have to ask myself: 1) Is the dynamic ID assignment a bug (e.g. should we be using a fixed ID of some sort)? If not, we can't expect these to match across runs, so is_deeply won't work. 2) Would it make more sense to explicitly inspect the handled objects (SF::Collection) directly via method calls? For instance, if I want to see whether a set of features falls within a region, is that reproducible between runs? Either way, I'm not sure what using Test::Deeply would gain you, as it's still meant to inspect complex data structures, just with a bit more sugar than Test::More and is_deeply(). Per #2 above, I would be more explicit in inspecting the SF::Collection: my $collection = $contig->get_features_collection; # check that IDs in SF::Collection conform to a regex using like() # inspect other things about the collection... chris On May 9, 2010, at 2:26 AM, Florent Angly wrote: > Chris, > > I've thought some more on the problem and I now agree with you that round-tripping at the object-level is more powerful. 
> > It has the problem that some objects are given IDs dynamically every time, which means that identical input files won't have an identical object. > >> is_deeply( $obj_out , $obj_in , 'deep compare' ); > >> not ok 1 - deep compare >> # Failed test 'deep compare' >> # at ./test_roundtrip.pl line 33. >> # Structures begin differing at: >> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '56438592' >> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '54980512' >> 1..1 >> # Looks like you failed 1 test of 1. > > > And when I re-run this again: > >> not ok 1 - deep compare >> # Failed test 'deep compare' >> # at ./test_roundtrip.pl line 33. >> # Structures begin differing at: >> # ${ $got->{_contigs}{Contig35}{_sfc}{_btree}} = '47763264' >> # ${$expected->{_contigs}{Contig35}{_sfc}{_btree}} = '46305184' >> 1..1 >> # Looks like you failed 1 test of 1. > > Note how the value of _btree changes everytime. > > Maybe using Test::Deep would be a good approach (http://search.cpan.org/~fdaly/Test-Deep-0.106/lib/Test/Deep.pod): >> Where it becomes more interesting is in allowing you to do something besides simple exact comparisons. With strings, the |eq| operator checks that 2 strings are exactly equal but sometimes that's not what you want. When you don't know exactly what the string should be but you do know some things about how it should look, |eq| is no good and you must use pattern matching instead. Test::Deep provides pattern matching for complex data structures > > Florent > > > > > On 09/05/10 10:02, Chris Fields wrote: >> Should clarify that: round-tripping to generate the same data structure/object is good and what we want. Round-tripping to generate the exact same output is not our highest priority. >> >> chris >> >> On May 8, 2010, at 6:47 PM, Chris Fields wrote: >> >> >>> To tell the truth, I'm more worried about getting data from various formats into Bio::* objects than getting the output 100% correct and identical to the original input. 
None of the SeqIO module make that specific promise, simply b/c it's a nearly impossible thing to maintain, with very little payback. Round-tripping is fine and all, just not our first priority. >>> >>> chris >>> >>> On May 8, 2010, at 6:34 AM, Florent Angly wrote: >>> >>> >>>> Same question about the CPAN module Test::Files (http://search.cpan.org/~philcrow/Test-Files-0.14/Files.pm). I could see myself using it in the BioPerl unit tests to make sure that the assembly files written match the input assembly files. >>>> >>>> It looks like the Bio::SeqIO modules tests could use it as well. >>>> >>>> Cheers, >>>> >>>> Florent >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 10 16:58:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 11:58:07 -0500 Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely In-Reply-To: <28506482.post@talk.nabble.com> References: <28506482.post@talk.nabble.com> Message-ID: <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu> 500000 sequences is way too many to request, even in a loop. Under most circumstances this is breaking NCBI's eutils policies: http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements so don't be too surprised this is failing (this would be around 1000 queried of 500 sequences per query). You could try pulling down the raw sequence via batch entrez or using Bio::DB::EUtilities (which should die if an error occurs). 
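
A minimal sketch of the "many small requests" idea, assuming the ID list has already been retrieved (for instance via the ids method of Bio::DB::Query::GenBank). The helper names batch_ids and process_in_batches are hypothetical, the batch size is only for illustration, and the fetch callback is a stand-in so the sketch runs without network access; it is not how BioPerl itself splits requests.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size elements.
# Returns a list of array references.
sub batch_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    while (@ids) {
        push @batches, [ splice(@ids, 0, $size) ];
    }
    return @batches;
}

# Process each batch with a caller-supplied fetch callback. The callback
# receives an array ref of IDs and returns an iterator: a code ref that
# yields one record per call and undef once the batch is exhausted.
# Errors are trapped per batch, so one failed request does not end the run.
sub process_in_batches {
    my ($size, $fetch, $handle, @ids) = @_;
    my @failed;
    for my $batch (batch_ids($size, @ids)) {
        my $ok = eval {
            my $next = $fetch->($batch);
            # Stop on undef rather than looping a fixed number of times
            while (defined(my $rec = $next->())) {
                $handle->($rec);
            }
            1;
        };
        unless ($ok) {
            # $@ holds the eval error; $! would only hold the OS errno
            warn "batch starting at $batch->[0] failed: $@";
            push @failed, $batch;
        }
    }
    return @failed;    # batches the caller may want to retry later
}

# Stand-in fetch (assumption). With BioPerl it could wrap something like:
#   my $stream = Bio::DB::GenBank->new->get_Stream_by_id($batch);
#   return sub { $stream->next_seq };
my $fake_fetch = sub {
    my ($batch) = @_;
    my @queue = @$batch;
    return sub { shift @queue };
};

my @seen;
my @failed = process_in_batches(3, $fake_fetch, sub { push @seen, $_[0] }, 1 .. 8);
print scalar(@seen), " records, ", scalar(@failed), " failed batches\n";
```

The inner loop ends when the iterator returns undef, so a dropped stream finishes that batch instead of calling methods on an undefined object, and the returned list of failed batches can be retried at a quieter time.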
chris On May 9, 2010, at 9:22 PM, bergeycm wrote: > > Hi all, > > I'm attempting to query GenBank for all sequences' lengths for a given > taxon. I'm using get_Stream_by_query(), but only to grab the species, > length, and accession. The genus of interest has almost 500,000 GB entries, > though, and my code hangs up at odd points in the info-gathering loop. > (Often after only 300 or 400 iterations.) The problem is that > $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back > undefined. > > I've tried wrapping the next_seq portion of the code in an eval block, but > to no avail. Is there a way to split a query into a bunch of small streams > that aren't too much to ask? Or is there a way to pick up a dropped SeqIO > stream? I think the connection is timing out and the stream is being lost. > Any advice is greatly appreciated, as I'm fairly new to BioPerl. > > - bergeycm > > > > use Bio::DB::GenBank; > use Bio::DB::Query::GenBank; > > > # Get general things ready to go for querying GenBank > my %options; > $options{'-maxids'} = '500000'; # There are presently 460,184 sequences > $options{'-db'} = 'nucleotide'; > $options{'-query'} = "Pongo [ORGN]"; # Orangutans > > > my $query_obj = Bio::DB::Query::GenBank->new(%options); > my $total = $query_obj->count; > > my $gb_obj = Bio::DB::GenBank->new(); > my $stream_obj = $gb_obj->get_Stream_by_query($query_obj); > > # Restrict info to just what I'll be using. No sequence necessary. > my $builder = $stream_obj->sequence_builder(); > $builder->want_none(); > $builder->add_wanted_slot('species','length','accession'); > > my $c = 0; > > for (1 .. 
$total) { > eval { > my $seq_obj = $stream_obj->next_seq; > my $flavor = $seq_obj->species; > print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t", > $seq_obj->length, "\t", $seq_obj->accession, "\n"; > }; > > if ($@) { > print $!, '\n'; > } > > # Pause for a little over a third of a second > select(undef, undef, undef, 0.35); > > $c++; > } > > > > -- > View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon May 10 17:07:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 12:07:00 -0500 Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely In-Reply-To: <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu> References: <28506482.post@talk.nabble.com> <3218EBA2-48BE-4757-B00A-85CBCE47AFB3@illinois.edu> Message-ID: <58E399D4-A884-4DC1-A5C6-8B0CBDDB173A@illinois.edu> (addendum added, sent too early) On May 10, 2010, at 11:58 AM, Chris Fields wrote: > 500000 sequences is way too many to request, even in a loop. Under most circumstances this is breaking NCBI's eutils policies: > > http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements > > so don't be too surprised this is failing (this would be around 1000 queried of 500 sequences per query). > > You could try pulling down the raw sequence via batch entrez or using Bio::DB::EUtilities (which should die if an error occurs). But you may still run into issues with eutils at some point, particularly if running this at peak times. > > chris > > On May 9, 2010, at 9:22 PM, bergeycm wrote: > >> >> Hi all, >> >> I'm attempting to query GenBank for all sequences' lengths for a given >> taxon. 
I'm using get_Stream_by_query(), but only to grab the species, >> length, and accession. The genus of interest has almost 500,000 GB entries, >> though, and my code hangs up at odd points in the info-gathering loop. >> (Often after only 300 or 400 iterations.) The problem is that >> $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back >> undefined. >> >> I've tried wrapping the next_seq portion of the code in an eval block, but >> to no avail. Is there a way to split a query into a bunch of small streams >> that aren't too much to ask? Or is there a way to pick up a dropped SeqIO >> stream? I think the connection is timing out and the stream is being lost. >> Any advice is greatly appreciated, as I'm fairly new to BioPerl. >> >> - bergeycm >> >> >> >> use Bio::DB::GenBank; >> use Bio::DB::Query::GenBank; >> >> >> # Get general things ready to go for querying GenBank >> my %options; >> $options{'-maxids'} = '500000'; # There are presently 460,184 sequences >> $options{'-db'} = 'nucleotide'; >> $options{'-query'} = "Pongo [ORGN]"; # Orangutans >> >> >> my $query_obj = Bio::DB::Query::GenBank->new(%options); >> my $total = $query_obj->count; >> >> my $gb_obj = Bio::DB::GenBank->new(); >> my $stream_obj = $gb_obj->get_Stream_by_query($query_obj); >> >> # Restrict info to just what I'll be using. No sequence necessary. >> my $builder = $stream_obj->sequence_builder(); >> $builder->want_none(); >> $builder->add_wanted_slot('species','length','accession'); >> >> my $c = 0; >> >> for (1 .. 
$total) {
>>     eval {
>>         my $seq_obj = $stream_obj->next_seq;
>>         my $flavor = $seq_obj->species;
>>         print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t",
>>             $seq_obj->length, "\t", $seq_obj->accession, "\n";
>>     };
>>
>>     if ($@) {
>>         print $!, '\n';
>>     }
>>
>>     # Pause for a little over a third of a second
>>     select(undef, undef, undef, 0.35);
>>
>>     $c++;
>> }
>>
>> --
>> View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jun.yin at ucd.ie Mon May 10 17:14:36 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Mon, 10 May 2010 18:14:36 +0100
Subject: [Bioperl-l] Bio::Align - alignment by position?
In-Reply-To:
References:
Message-ID: <003701caf064$441c4660$cc54d320$%yin@ucd.ie>

Hi,

When you use $aln->slice(), there is an optional third parameter that keeps gap-only columns in the newly created slice, e.g.

$aln2 = $aln->slice(20, 30, 1);

With the third parameter set, gap-only subsequences are kept.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From bhakti.dwivedi at gmail.com Mon May 10 18:35:37 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Mon, 10 May 2010 14:35:37 -0400
Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface
In-Reply-To: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>
References: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>
Message-ID:

Thanks Chris! I changed a few parameter values in blastcl3 and now the results are the same. Is there any particular reason the defaults are set differently for the web-based and command-line BLAST searches?

Bhakti

On Mon, May 10, 2010 at 12:28 PM, Chris Fields wrote:

> The default web-based parameters differ than those via blastcl3, so if you
> are using the defaults for both they may differ somewhat.
>
> chris
>
> On May 10, 2010, at 10:22 AM, Bhakti Dwivedi wrote:
>
> > Does anyone know why the blast results vary for a query sequence when search
> > is conducted using a web-based interface versus a Command line interface?
> >
> > For example, my web-based blast top hits do not match the top hits of the
> > command line blast (blastcl3). I am using the default settings in both.
> > not sure why the results are different Even if the hit is there, the
> > e-value, bit score etc are different for the same hsp regions identified
> > within the hit. is there a difference in the blast algorithm? or is it the
> > database?
> >
> > Thanks!

From cjfields at illinois.edu Mon May 10 19:47:56 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 10 May 2010 14:47:56 -0500
Subject: [Bioperl-l] Different Blast results from web-based interface and command line interface
In-Reply-To:
References: <74AF6F1E-A8F8-4453-B7A4-A2D0F96BE1B2@illinois.edu>
Message-ID:

You would need to ask NCBI that.

chris On May 10, 2010, at 1:35 PM, Bhakti Dwivedi wrote: > Thanks Chris! I changed few parameter values in blastcl3 and now the > results are same. Any particular reason to set the default differently in > web-based and command-line blast search? > > Bhakti > > > > On Mon, May 10, 2010 at 12:28 PM, Chris Fields wrote: > >> The default web-based parameters differ than those via blastcl3, so if you >> are using the defaults for both they may differ somewhat. >> >> chris >> >> On May 10, 2010, at 10:22 AM, Bhakti Dwivedi wrote: >> >>> Does anyone know why the blast results vary for a query sequence when >> search >>> is conducted using a web-based interface versus a Command line interface? >>> >>> For example, my web-based blast top hits do not match the top hits of >> the >>> command line blast (blastcl3). I am using the default settings in both. >>> not sure why the results are different Even if the hit is there, the >>> e-value, bit score etc are different for the same hsp regions identified >>> within the hit. is there a difference in the blast algorithm? or is it >> the >>> database? >>> >>> Thanks! 

From dimitark at bii.a-star.edu.sg Tue May 11 02:03:51 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Tue, 11 May 2010 10:03:51 +0800
Subject: [Bioperl-l] StandAloneFasta and Too many open files
Message-ID: <4BE8BB07.3040407@bii.a-star.edu.sg>

Hi guys,

Yesterday I got the following error:

'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380'

from the following code:
------------
my $ssout="my_seq_out.txt";
print "SS:$tquery:\n:$tseq:\n";
my @sargs=(
    'q' => '',
    'E' => '1',
    'w' => '100',
    'O' => "$ssout",
    'program' => "ssearch36",
);
my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs);
$fac_ss->library($tmpseq);
my @sreport=$fac_ss->run($tqtmp);

foreach my $sr (@sreport){
    while(my $result=$sr->next_result){
        while(my $hit=$result->next_hit){
            while(my $hsp=$hit->next_hsp){
                my $iden=$hsp->frac_identical;
                $rv3=$iden;
                # print "IDEN:$iden:$rv1\n";
            }
        }
    }
}
--------------------
I run that code over several thousand HSPs: for each one I get the sequence and then run 'ssearch36' with it against another sequence. I dug around in the StandAloneFasta module but couldn't work out where the problem is. There must be many open filehandles somewhere, but I do not know where. I checked the module and couldn't find any such filehandles. Maybe the problem is in the base modules.
I also checked my script for filehandles left open and found none. The only thing I found is that I can actually close SeqIO streams with '$stream->close', which I didn't see in the web documentation.
So something positive came out of this :) I closed all my SeqIO streams and still had the same problem.
Next I commented out the above code and rewrote my script as follows:
--------------
my $ssout="my_seq_out.txt";
my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq > $ssout");
# system() reports the child's wait status in $?
system(@sargs) == 0 or die "system @sargs failed: $?";

my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta');
while(my $result=$sreport->next_result){
    # print Dumper($result);
    while(my $hit=$result->next_hit){
        while(my $hsp=$hit->next_hsp){
            my $iden=$hsp->frac_identical;
            $rv3=$iden;
            # print "IDEN:$iden:$rv1\n";
        }
    }
}
---------------
Fortunately this code got rid of the "Too many open files" error. So the problem was indeed coming from the module or the modules behind it.

I have also read that one can raise the number of files that can be open on the system, but I didn't want to mess with that for now because I do not know what the implications would be.

OK, that is it. I just wanted to share my experience and report the problem.

Cheers Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From cjfields at illinois.edu Tue May 11 03:04:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 22:04:12 -0500 Subject: [Bioperl-l] StandAloneFasta and Too many open files In-Reply-To: <4BE8BB07.3040407@bii.a-star.edu.sg> References: <4BE8BB07.3040407@bii.a-star.edu.sg> Message-ID: <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> On May 10, 2010, at 9:03 PM, Dimitar Kenanov wrote: > Hi guys, > yesterday i got the following error: > > 'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380' > > from the following code: > ------------ > my $ssout="my_seq_out.txt"; > print "SS:$tquery:\n:$tseq:\n"; > my @sargs=( > 'q' => '', > 'E' => '1', > 'w' => '100', > 'O' => "$ssout", > 'program' => "ssearch36", > ); > my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs); > $fac_ss->library($tmpseq); > my @sreport=$fac_ss->run($tqtmp); > > foreach my $sr (@sreport){ > while(my $result=$sr->next_result){ > while(my $hit=$result->next_hit){ > while(my $hsp=$hit->next_hsp){ > my $iden=$hsp->frac_identical; > $rv3=$iden; > # print "IDEN:$iden:$rv1\n"; > } > } > } > } > -------------------- > I am using that code over several thousands of HSPs for which i get the sequence and then 'ssearch36' with it against another sequence. I was digging around the module StandAloneFasta but couldnt get where the problem is. There should be somewhere many opened filehandles but do not know where. I checked the module but couldnt find such filehandles. May be the problem is in the base modules. > I also checked and my script for left open filehandles and i have not. I found only that i can actually close SeqIO streams with '$stream->close' which i didnt see on the web documentation. 
So something positive out of this :) So i closed all my SeqIO streams and i still had the same problem. > Next i commented out the above code and rewrote my script into the following: > -------------- > my $ssout="my_seq_out.txt"; > my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq > $ssout"); > system(@sargs) == 0 or die "system @sargs failed: $!"; > > my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta'); > while(my $result=$sreport->next_result){ > # print Dumper($result); > while(my $hit=$result->next_hit){ > while(my $hsp=$hit->next_hsp){ > > my $iden=$hsp->frac_identical; > $rv3=$iden; > # print "IDEN:$iden:$rv1\n"; > } > } > } > --------------- > Fortunately this code overcame the error message with too many filehandles. So the problem was indeed coming from the module or the modules behind it. > > I have also read that one can change the number of how many files can be opened on the system but i didnt want to mess with that for now because i do not know what could be the implications of that. > > Ok that is it. I just wanted to inform about my experience and to report the problem. > > Cheers > Dimitar Seems this is hitting the system ulimit somehow, but it's not immediately apparent how that's happening unless you are caching the IO objects somehow. Can you file this as a bug, maybe with a fuller test script? Might give us something to check against. chris From cjfields at illinois.edu Tue May 11 03:57:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 10 May 2010 22:57:18 -0500 Subject: [Bioperl-l] StandAloneFasta and Too many open files In-Reply-To: <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> References: <4BE8BB07.3040407@bii.a-star.edu.sg> <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> Message-ID: <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu> Addendum to that last post. 
On May 10, 2010, at 10:04 PM, Chris Fields wrote: > On May 10, 2010, at 9:03 PM, Dimitar Kenanov wrote: > >> Hi guys, >> yesterday i got the following error: >> >> 'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380' >> >> from the following code: >> ------------ >> my $ssout="my_seq_out.txt"; >> print "SS:$tquery:\n:$tseq:\n"; >> my @sargs=( >> 'q' => '', >> 'E' => '1', >> 'w' => '100', >> 'O' => "$ssout", >> 'program' => "ssearch36", >> ); >> my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs); >> $fac_ss->library($tmpseq); >> my @sreport=$fac_ss->run($tqtmp); >> >> foreach my $sr (@sreport){ >> while(my $result=$sr->next_result){ >> while(my $hit=$result->next_hit){ >> while(my $hsp=$hit->next_hsp){ >> my $iden=$hsp->frac_identical; >> $rv3=$iden; >> # print "IDEN:$iden:$rv1\n"; >> } >> } >> } >> } >> -------------------- >> I am using that code over several thousands of HSPs for which i get the sequence and then 'ssearch36' with it against another sequence. I was digging around the module StandAloneFasta but couldnt get where the problem is. There should be somewhere many opened filehandles but do not know where. I checked the module but couldnt find such filehandles. May be the problem is in the base modules. >> I also checked and my script for left open filehandles and i have not. I found only that i can actually close SeqIO streams with '$stream->close' which i didnt see on the web documentation. So something positive out of this :) So i closed all my SeqIO streams and i still had the same problem. 
>> Next i commented out the above code and rewrote my script into the following: >> -------------- >> my $ssout="my_seq_out.txt"; >> my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq > $ssout"); >> system(@sargs) == 0 or die "system @sargs failed: $!"; >> >> my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta'); >> while(my $result=$sreport->next_result){ >> # print Dumper($result); >> while(my $hit=$result->next_hit){ >> while(my $hsp=$hit->next_hsp){ >> >> my $iden=$hsp->frac_identical; >> $rv3=$iden; >> # print "IDEN:$iden:$rv1\n"; >> } >> } >> } >> --------------- >> Fortunately this code overcame the error message with too many filehandles. So the problem was indeed coming from the module or the modules behind it. >> >> I have also read that one can change the number of how many files can be opened on the system but i didnt want to mess with that for now because i do not know what could be the implications of that. >> >> Ok that is it. I just wanted to inform about my experience and to report the problem. >> >> Cheers >> Dimitar > > > Seems this is hitting the system ulimit somehow, but it's not immediately apparent how that's happening unless you are caching the IO objects somehow. Can you file this as a bug, maybe with a fuller test script? Might give us something to check against. > > chris Dimitar, I think Peter had answered this before, might indicate the problem is actually using the 'O' option in output. We can look at possibly just capturing STDOUT instead, but we may not support the use of 'O' if it's as buggy as indicated. 
http://groups.google.com/group/bioperl-l/msg/25c17748d1ac6ef4

chris

From dimitark at bii.a-star.edu.sg Tue May 11 04:24:13 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Tue, 11 May 2010 12:24:13 +0800
Subject: [Bioperl-l] StandAloneFasta and Too many open files
In-Reply-To: <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu>
References: <4BE8BB07.3040407@bii.a-star.edu.sg> <80844561-B799-49C2-B4A0-ABD4F68AD51C@illinois.edu> <97A00967-6C2D-481E-955E-65E6EB87E87B@illinois.edu>
Message-ID: <4BE8DBED.2000209@bii.a-star.edu.sg>

Hi Chris,
thank you for the information. I checked it out. I wrote to you and to the list about that as well: to you on 16.04.2010 and to the list on 23.04.2010. There I explained that I modified the module. Now I pass it the 'O' option, but this option is not passed on to the actual program executed by system. I just add my desired output with "> $output" to the parameter line passed to system. I attached the modified version of the module to the email mentioned above.

I was digging into the module a bit more. I found this at line 359:
-----------
unless( $outfile ) {
    open(FASTARUN, "$para |") || $self->throw($@);#original
    $object=Bio::SearchIO->new(-fh=>\*FASTARUN, #original
                               -format=>"fasta");#original
} else {
------------
And here is another one, used when 'O' is set, at line 371:
---------
$object = Bio::SearchIO->new(-file=>$self->O, -format=>"fasta");
----------
Maybe the problem is here, because I didn't see a 'close' anywhere for these filehandles. I can test and tell you whether I was right.

Cheers
Dimitar

On 05/11/2010 11:57 AM, Chris Fields wrote:
> Addendum to that last post.
> > On May 10, 2010, at 10:04 PM, Chris Fields wrote: > > >> On May 10, 2010, at 9:03 PM, Dimitar Kenanov wrote: >> >> >>> Hi guys, >>> yesterday i got the following error: >>> >>> 'Too many open files at /usr/lib64/perl5/site_perl/5.10.0/Bio/Tools/Run/Alignment/StandAloneFasta.pm line 380' >>> >>> from the following code: >>> ------------ >>> my $ssout="my_seq_out.txt"; >>> print "SS:$tquery:\n:$tseq:\n"; >>> my @sargs=( >>> 'q' => '', >>> 'E' => '1', >>> 'w' => '100', >>> 'O' => "$ssout", >>> 'program' => "ssearch36", >>> ); >>> my $fac_ss=Bio::Tools::Run::Alignment::StandAloneFasta->new(@sargs); >>> $fac_ss->library($tmpseq); >>> my @sreport=$fac_ss->run($tqtmp); >>> >>> foreach my $sr (@sreport){ >>> while(my $result=$sr->next_result){ >>> while(my $hit=$result->next_hit){ >>> while(my $hsp=$hit->next_hsp){ >>> my $iden=$hsp->frac_identical; >>> $rv3=$iden; >>> # print "IDEN:$iden:$rv1\n"; >>> } >>> } >>> } >>> } >>> -------------------- >>> I am using that code over several thousands of HSPs for which i get the sequence and then 'ssearch36' with it against another sequence. I was digging around the module StandAloneFasta but couldnt get where the problem is. There should be somewhere many opened filehandles but do not know where. I checked the module but couldnt find such filehandles. May be the problem is in the base modules. >>> I also checked and my script for left open filehandles and i have not. I found only that i can actually close SeqIO streams with '$stream->close' which i didnt see on the web documentation. So something positive out of this :) So i closed all my SeqIO streams and i still had the same problem. 
>>> Next i commented out the above code and rewrote my script into the following: >>> -------------- >>> my $ssout="my_seq_out.txt"; >>> my @sargs=("ssearch36 -q -E 1 -d 1 $tqtmp $tmpseq> $ssout"); >>> system(@sargs) == 0 or die "system @sargs failed: $!"; >>> >>> my $sreport=Bio::SearchIO->new(-file => $ssout, -format => 'fasta'); >>> while(my $result=$sreport->next_result){ >>> # print Dumper($result); >>> while(my $hit=$result->next_hit){ >>> while(my $hsp=$hit->next_hsp){ >>> >>> my $iden=$hsp->frac_identical; >>> $rv3=$iden; >>> # print "IDEN:$iden:$rv1\n"; >>> } >>> } >>> } >>> --------------- >>> Fortunately this code overcame the error message with too many filehandles. So the problem was indeed coming from the module or the modules behind it. >>> >>> I have also read that one can change the number of how many files can be opened on the system but i didnt want to mess with that for now because i do not know what could be the implications of that. >>> >>> Ok that is it. I just wanted to inform about my experience and to report the problem. >>> >>> Cheers >>> Dimitar >>> >> >> Seems this is hitting the system ulimit somehow, but it's not immediately apparent how that's happening unless you are caching the IO objects somehow. Can you file this as a bug, maybe with a fuller test script? Might give us something to check against. >> >> chris >> > Dimitar, > > I think Peter had answered this before, might indicate the problem is actually using the 'O' option in output. We can look at possibly just capturing STDOUT instead, but we may not support the use of 'O' if it's as buggy as indicated. 
> > http://groups.google.com/group/bioperl-l/msg/25c17748d1ac6ef4 > > chris > > -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From heikki.lehvaslaiho at gmail.com Tue May 11 05:40:14 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Tue, 11 May 2010 08:40:14 +0300 Subject: [Bioperl-l] Github possibilities Message-ID: FYI http://chem-bla-ics.blogspot.com/2010/05/github-simplifies-code-review-and.html -Heikki From heikki.lehvaslaiho at gmail.com Tue May 11 05:43:42 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Tue, 11 May 2010 08:43:42 +0300 Subject: [Bioperl-l] Fwd: BLAST parsing broken In-Reply-To: <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu> References: <30F45792-AD11-48DC-A105-C89F89493FCB@illinois.edu> <8F1E69C5-7EBB-4DA6-9F00-05C9C75B6AB4@illinois.edu> <3D8D8DF2-6996-483B-A5BF-B16D0BE8153F@illinois.edu> Message-ID: Thanks Razi and Chris, Blast parsing works again beautifully. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 10 May 2010 03:39, Chris Fields wrote: > Ok, that's fine. It may be something off with revision numbers when using > svn with github (git doesn't have incremental revisions, but a SHA). > Committed the patch to dev svn, in r16970. > > chris > > On May 9, 2010, at 6:48 PM, Razi Khaja wrote: > > > I checked out bioperl-live from github: > > svn checkout http://svn.github.com/bioperl/bioperl-live.git > > > > I just checked it out again, a few seconds ago and by default I got > revision > > 11326. > > Razi > > > > > > On Sun, May 9, 2010 at 5:30 PM, Chris Fields > wrote: > > > >> Then something is wrong, as current trunk is at r16969. 
Where are you > >> pulling your code from? Our only working anon. server is the sync'ed > github > >> one. > >> > >> chris > >> > >> On May 9, 2010, at 4:15 PM, Razi Khaja wrote: > >> > >>> Hi Chris, > >>> The patch is against the main trunk. I checked out version 11326 of > the > >>> repository today. > >>> Razi > >>> > >>> > >>> On Sun, May 9, 2010 at 4:43 PM, Chris Fields > >> wrote: > >>> > >>>> If the patch is against main trunk it isn't a problem, otherwise the > >> diff > >>>> should be vs. that code. > >>>> > >>>> chris > >>>> > >>>> On May 9, 2010, at 2:23 PM, Razi Khaja wrote: > >>>> > >>>>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem. > >>>>> Can someone advise an appropriate way to have this patch applied, > given > >>>> that > >>>>> it is an amendment to a previous patch? > >>>>> Thanks > >>>>> Razi > >>>>> > >>>>> > >>>>> ---------- Forwarded message ---------- > >>>>> From: Heikki Lehvaslaiho > >>>>> Date: Wed, May 5, 2010 at 2:11 AM > >>>>> Subject: Re: [Bioperl-l] BLAST parsing broken > >>>>> To: Razi Khaja > >>>>> > >>>>> > >>>>> Hi Raja, > >>>>> > >>>>> Thanks for trying to fix this. > >>>>> > >>>>> I am attaching an example output file to this message. I just tested > >>>> again > >>>>> that master from git repository fails to get query ID, but the > previous > >>>>> version works. > >>>>> > >>>>> bala ~/src/bioperl-live> git checkout master > >>>>> Previous HEAD position was 5e278f5... Robson's patch for buggy > blastpgp > >>>>> output > >>>>> Switched to branch 'master' > >>>>> > >>>>> When I started using the latest mpiBLAST code a few months ago I did > >>>> compare > >>>>> the 0 output from it to standard NCBI blast and they were identical. > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> Also, I've noticed a discrepancy between within bioperl blast > parsing > >>>> that > >>>>> I have not had time to work on. Would you be interested in having a > >> look? 
> >>>>> > >>>>> I am creating output from mpiBLAST in 0 format and then converting it > >>>> into > >>>>> tab-delimited 8 format. I am unable to get 100% similarity for all > >> cases > >>>>> when I compare the conversion to the output straight from mpiBLAST in > >>>> format > >>>>> 8. Sometimes the mismatch and gap values are off by one. > >>>>> > >>>>> I am attaching a script that does the conversion. It is the same one > I > >>>> was > >>>>> using when I noticed the problem above. I was going to put the code > >> into > >>>>> bioperl but that got delayed when I noticed the discrepancies. > >>>>> > >>>>> > >>>>> Cheers, > >>>>> > >>>>> > >>>>> -Heikki > >>>>> > >>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>>> > >>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office > >>>> #4216 > >>>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>>> > >>>>> > >>>>> > >>>>> On 4 May 2010 20:55, Razi Khaja wrote: > >>>>> > >>>>>> That is odd. Heikki, do you have a blast output file that produces > >> this > >>>>>> error? > >>>>>> Could you attach the file and either send to the list or myself (if > >> the > >>>>>> list > >>>>>> does not accept attachments). > >>>>>> Thanks, > >>>>>> Razi > >>>>>> > >>>>>> > >>>>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields > > >>>>>> wrote: > >>>>>> > >>>>>>> Odd, I ran tests on that prior to commit. I'll work on fixing that > >> (in > >>>>>> svn > >>>>>>> of course, until the migration is complete). > >>>>>>> > >>>>>>> chris > >>>>>>> > >>>>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote: > >>>>>>> > >>>>>>>> Chris, > >>>>>>>> > >>>>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of > >>>>>> normal > >>>>>>>> blast output. $result->query_name returns now undef. > >>>>>>>> > >>>>>>>> (Using the anonymous git now). 
This change still works: > >>>>>>>> > >>>>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>>>> Author: cjfields > >>>>>>>> Date: Sun Dec 20 04:39:58 2009 +0000 > >>>>>>>> > >>>>>>>> Robson's patch for buggy blastpgp output > >>>>>>>> > >>>>>>>> But this does not: > >>>>>>>> > >>>>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544 > >>>>>>>> Author: cjfields > >>>>>>>> Date: Thu Apr 15 04:21:17 2010 +0000 > >>>>>>>> > >>>>>>>> [bug 3031] > >>>>>>>> > >>>>>>>> patches for catching algorithm ref, courtesy Razi Khaja. > >>>>>>>> > >>>>>>>> That makes it easy to find the diffs: > >>>>>>>> > >>>>>>>> $git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 > >>>>>>>> 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm > >>>>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm > >>>>>>>> index 378023a..6f7eeeb 100644 > >>>>>>>> --- a/Bio/SearchIO/blast.pm > >>>>>>>> +++ b/Bio/SearchIO/blast.pm > >>>>>>>> @@ -209,6 +209,7 @@ BEGIN { > >>>>>>>> > >>>>>>>> 'BlastOutput_program' => 'RESULT-algorithm_name', > >>>>>>>> 'BlastOutput_version' => > >>>>>> 'RESULT-algorithm_version', > >>>>>>>> + 'BlastOutput_algorithm-reference' => > >>>>>>> 'RESULT-algorithm_reference', > >>>>>>>> 'BlastOutput_query-def' => 'RESULT-query_name', > >>>>>>>> 'BlastOutput_query-len' => 'RESULT-query_length', > >>>>>>>> 'BlastOutput_query-acc' => 'RESULT-query_accession', > >>>>>>>> @@ -504,6 +505,26 @@ sub next_result { > >>>>>>>> } > >>>>>>>> ); > >>>>>>>> } > >>>>>>>> + # parse the BLAST algorithm reference > >>>>>>>> + elsif(/^Reference:\s+(.*)$/) { > >>>>>>>> + # want to preserve newlines for the BLAST algorithm > >>>>>>> reference > >>>>>>>> + my $algorithm_reference = "$1\n"; > >>>>>>>> + $_ = $self->_readline; > >>>>>>>> + # while the current line, does not match an empty > line, > >> a > >>>>>>> RID:, > >>>>>>>> or a Database:, we are still looking at the > >>>>>>>> + # algorithm_reference, append it to what we parsed so > >> far > >>>>>>>> + while($_ !~ /^$/ && 
$_ !~ /^RID:/ && $_ !~ > >> /^Database:/) > >>>> { > >>>>>>>> + $algorithm_reference .= "$_"; > >>>>>>>> + $_ = $self->_readline; > >>>>>>>> + } > >>>>>>>> + # if we exited the while loop, we saw an empty line, > a > >>>>>> RID:, > >>>>>>> or > >>>>>>>> a Database:, so push it back > >>>>>>>> + $self->_pushback($_); > >>>>>>>> + $self->element( > >>>>>>>> + { > >>>>>>>> + 'Name' => 'BlastOutput_algorithm-reference', > >>>>>>>> + 'Data' => $algorithm_reference > >>>>>>>> + } > >>>>>>>> + ); > >>>>>>>> + } > >>>>>>>> # added Windows workaround for bug 1985 > >>>>>>>> elsif (/^(Searching|Results from round)/) { > >>>>>>>> next unless $1 =~ /Results from round/; > >>>>>>>> > >>>>>>>> > >>>>>>>> I am not sure why reference parsing messes things up. Maybe it > eats > >>>> too > >>>>>>> many > >>>>>>>> lines from the result file. > >>>>>>>> > >>>>>>>> Yours, > >>>>>>>> > >>>>>>>> -Heikki > >>>>>>>> > >>>>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > >>>>>>>> cell: +966 545 595 849 office: +966 2 808 2429 > >>>>>>>> > >>>>>>>> Computational Bioscience Research Centre (CBRC), Building #2, > Office > >>>>>>> #4216 > >>>>>>>> 4700 King Abdullah University of Science and Technology (KAUST) > >>>>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>>>>>>> _______________________________________________ > >>>>>>>> Bioperl-l mailing list > >>>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>> >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at 
lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cmb433 at nyu.edu Sun May 9 23:40:48 2010 From: cmb433 at nyu.edu (bergeycm) Date: Sun, 9 May 2010 16:40:48 -0700 (PDT) Subject: [Bioperl-l] get_Stream_by_query Terminates Prematurely Message-ID: <28506482.post@talk.nabble.com> Hi all, I'm attempting to query GenBank for all sequences' lengths for a given taxon. I'm using get_Stream_by_query(), but only to grab the species, length, and accession. The genus of interest has almost 500,000 GB entries, though, and my code hangs up at odd points in the info-gathering loop. (Often after only 300 or 400 iterations.) The problem is that $stream_obj->next_seq (of Bio::SeqIO::genbank) eventually comes back undefined. I've tried wrapping the next_seq portion of the code in an eval block, but to no avail. Is there a way to split a query into a bunch of small streams that aren't too much to ask? Or is there a way to pick up a dropped SeqIO stream? I think the connection is timing out and the stream is being lost. Any advice is greatly appreciated, as I'm fairly new to BioPerl. 
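On the question of splitting one large query into smaller streams, a minimal sketch of the batching step is shown here. It assumes the ID list has already been retrieved (in BioPerl this should be possible with $query_obj->ids) and that each batch is then fetched through its own short-lived stream, for example with $gb_obj->get_Stream_by_id($batch). The helper name chunk_ids and the batch size are made up for illustration, and the IDs below are placeholders rather than real GenBank identifiers.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size elements each.
# chunk_ids is a hypothetical helper, not part of BioPerl.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    # splice removes up to $size IDs from the front on every pass
    push @batches, [ splice(@ids, 0, $size) ] while @ids;
    return @batches;
}

# Placeholder IDs; in a real script these might come from $query_obj->ids.
my @ids = (1 .. 10);

for my $batch (chunk_ids(4, @ids)) {
    # Assumption: each batch would be fetched with its own stream here,
    # e.g. $gb_obj->get_Stream_by_id($batch), and retried if it fails.
    print scalar(@$batch), " ids: @$batch\n";
}
```

With small batches, a dropped connection costs only one batch, which can be requested again without restarting the whole run.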
- bergeycm

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Get general things ready to go for querying GenBank
my %options;
$options{'-maxids'} = '500000'; # There are presently 460,184 sequences
$options{'-db'} = 'nucleotide';
$options{'-query'} = "Pongo [ORGN]"; # Orangutans

my $query_obj = Bio::DB::Query::GenBank->new(%options);
my $total = $query_obj->count;

my $gb_obj = Bio::DB::GenBank->new();
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# Restrict info to just what I'll be using. No sequence necessary.
my $builder = $stream_obj->sequence_builder();
$builder->want_none();
$builder->add_wanted_slot('species','length','accession');

my $c = 0;
for (1 .. $total) {
    eval {
        my $seq_obj = $stream_obj->next_seq;
        my $flavor = $seq_obj->species;
        print $c, "\t", $flavor->scientific_name, " (", $flavor->id, ")\t", $seq_obj->length, "\t", $seq_obj->accession, "\n";
    };
    # $@ holds the error raised inside the eval block
    if ($@) { print $@, "\n"; }
    # Pause for a little over a third of a second
    select(undef, undef, undef, 0.35);
    $c++;
}

--
View this message in context: http://old.nabble.com/get_Stream_by_query-Terminates-Prematurely-tp28506482p28506482.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From sudeep.mehrotra at mail.mcgill.ca Tue May 11 13:40:07 2010
From: sudeep.mehrotra at mail.mcgill.ca (Sudeep Mehrotra)
Date: Tue, 11 May 2010 09:40:07 -0400
Subject: [Bioperl-l] [Fwd: Re: Modules in Bio:Tree]
Message-ID: <4BE95E37.3060702@mail.mcgill.ca>

Hello Jason,

Your suggestion worked. Thanks.

I have the same tree in two formats (NEXUS and NEWICK). I want to obtain a "clade list"; in other words, is there a way to obtain the leaves that are members of a clade? For example, part of the NEXUS file has the following entries:

other entries .......
655 Deinococcus_geothermalis,
656 Deinococcus_radiodurans,
657 Thermus_thermophilus,
658 Thermus_sp.
;
other entries........
(((((655,656)[])[])[],(((657,658)[])[])[])[])[])[])[]);

From the tree I can see that 657 and 658 are members of one subclade, 655 and 656 are members of another subclade, and both subclades belong to one clade. I want to get this membership information. I tried looking for a module in Bio::Tree but was not able to find one. In the Bio::NEXUS package there is a module "walk" which I thought would work for me, but it does not. Also, the Bio::NEXUS package is just not working for me: judging from the documentation, the input file they are using is different from what I have. Is there any way I can get the membership information as shown earlier?

Cheers
--
Sudeep Mehrotra (Ph.D. Candidate)
McGill University and Genome Quebec Innovation Center
-------------- next part --------------
An embedded message was scrubbed...
From: Jason Stajich
Subject: Re: Modules in Bio:Tree
Date: Wed, 5 May 2010 18:45:41 -0400
Size: 5420
URL:

From amackey at virginia.edu Tue May 11 21:26:50 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Tue, 11 May 2010 17:26:50 -0400
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
Message-ID:

Hi Zerui (and others),

I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, specifically in this code:

lines:
1170: (-start => int ($loc->start / 3 ) +1,
1171: -end => int ($loc->end / 3 ) +1,

Both of those lines should follow the pattern int (($loc->start - 1) / 3) + 1 (using $loc->end on line 1171).

Otherwise, for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2).

There is also a problem when mapping exon coordinates that are outside/after the CDS region (instead of getting undefined locations, you continue to get peptide coordinates, but they are invalid, larger than the protein length).

Dennis and fringy -- this may affect the SNPtab.pl script I wrote for you, as it uses this module to calculate codons for SNPs.

-Aaron

P.S.
a script that demonstrates the problem:

use Bio::Coordinate::GeneMapper;
use Bio::Location::Simple;

# Gene model: two exons on the chromosome, with the CDS ending inside exon 2
my $mapper = Bio::Coordinate::GeneMapper->new(
    -in    => "chr",
    -out   => "propeptide",
    -exons => [
        Bio::Location::Simple->new( -start => 101, -end => 109 ),
        Bio::Location::Simple->new( -start => 201, -end => 221 ),
    ],
    -cds   => Bio::Location::Simple->new( -start => 101, -end => 209 ),
);

# Map single chromosome positions around both exons to peptide coordinates
print join("\t", "chr", "aa"), "\n";
for my $pos (99..111,199..211) {
    my $res = $mapper->map(
        Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => 1));
    my $start = $res->start; $start = "NA" unless defined $start;
    my $end = $res->end; $end = "NA" unless defined $end;
    print join("\t", $pos, $start), "\n";
}

From cjfields at illinois.edu Tue May 11 22:31:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 11 May 2010 17:31:17 -0500
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
In-Reply-To:
References:
Message-ID: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>

Aaron,

Do we want to write this up as a set of tests to add to the bioperl test suite? We can probably add it after the github migration tomorrow.

chris

On May 11, 2010, at 4:26 PM, Aaron Mackey wrote:

> Hi Zerui (and others),
>
> I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper,
> specifically in this code:
>
> lines:
> 1170: (-start => int ($loc->start / 3 ) +1,
> 1171: -end => int ($loc->end / 3 ) +1,
>
> both of those lines should look like: int (($loc->start - 1) / 3) + 1
>
> otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide
> positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2)
>
> There is also a problem when mapping exon coordinates that are outside/after
> the CDS region (instead of getting undefined locations, you continue to get
> peptide coordinates, but they are invalid, larger than the protein length).
>
> Dennis and fringy -- this may affect the SNPtab.pl script I wrote for you,
> as it uses this module to calculate codons for SNPs.
> > -Aaron > > P.S. a script the demonstrates the problem: > > use Bio::Coordinate::GeneMapper; > > my $mapper = > Bio::Coordinate::GeneMapper > ->new( -in => "chr", > -out => "propeptide", > -exons => [ Bio::Location::Simple > ->new( -start => 101, > -end => 109 ), > Bio::Location::Simple > ->new( -start => 201, > -end => 221 ), > ], > -cds => Bio::Location::Simple > ->new(-start => 101, -end => 209), > ); > > > print join("\t", "chr", "aa"), "\n"; > for my $pos (99..111,199..211) { > my $res = $mapper->map( > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => 1)); > my $start = $res->start; $start = "NA" unless defined $start; > my $end = $res->end; $end = "NA" unless defined $end; > print join("\t", $pos, $start), "\n"; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From amackey at virginia.edu Tue May 11 22:40:11 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Tue, 11 May 2010 18:40:11 -0400 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: Hi Chris, I was hoping Heikki might take up the cause and investigate further -- let's give him a chance to respond. But it's a frightening bug if it's really been that way for all this time ... -Aaron On Tue, May 11, 2010 at 6:31 PM, Chris Fields wrote: > Aaron, > > Do we want to write this up as a set of tests to add to the bioperl test > suite? We can probably add it after the github migration tomorrow. 
> > chris > > On May 11, 2010, at 4:26 PM, Aaron Mackey wrote: > > > Hi Zerui (and others), > > > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, > > specifically in this code: > > > > lines: > > 1170: (-start => int ($loc->start / 3 ) +1, > > 1171: -end => int ($loc->end / 3 ) +1, > > > > both of those lines should look like: int (($loc->start - 1) / 3) + 1 > > > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect peptide > > positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2) > > > > There is also a problem when mapping exon coordinates that are > outside/after > > the CDS region (instead of getting undefined locations, you continue to > get > > peptide coordinates, but they are invalid, larger than the protein > length). > > > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for > you, > > as it uses this module to calculate codons for SNPs. > > > > -Aaron > > > > P.S. a script the demonstrates the problem: > > > > use Bio::Coordinate::GeneMapper; > > > > my $mapper = > > Bio::Coordinate::GeneMapper > > ->new( -in => "chr", > > -out => "propeptide", > > -exons => [ Bio::Location::Simple > > ->new( -start => 101, > > -end => 109 ), > > Bio::Location::Simple > > ->new( -start => 201, > > -end => 221 ), > > ], > > -cds => Bio::Location::Simple > > ->new(-start => 101, -end => 209), > > ); > > > > > > print join("\t", "chr", "aa"), "\n"; > > for my $pos (99..111,199..211) { > > my $res = $mapper->map( > > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => > 1)); > > my $start = $res->start; $start = "NA" unless defined $start; > > my $end = $res->end; $end = "NA" unless defined $end; > > print join("\t", $pos, $start), "\n"; > > } > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed May 12 04:15:54 2010 From: cjfields at 
illinois.edu (Chris Fields) Date: Tue, 11 May 2010 23:15:54 -0500 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow Message-ID: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> Just a friendly reminder that we'll freeze the dev subversion repository tomorrow prior to migration to github. The migration will take about an hour, during which all bioperl github repos will be replaced with the full repos, and devs added. The test repos will be removed around that time (Heikki, will that be a problem?). chris From heikki.lehvaslaiho at gmail.com Wed May 12 04:23:07 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Wed, 12 May 2010 07:23:07 +0300 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> Message-ID: No problem at all. Go ahead. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 12 May 2010 07:15, Chris Fields wrote: > Just a friendly reminder that we'll freeze the dev subversion repository > tomorrow prior to migration to github. The migration will take about an > hour, during which all bioperl github repos will be replaced with the full > repos, and devs added. The test repos will be removed around that time > (Heikki, will that be a problem?). > > chris From heikki.lehvaslaiho at gmail.com Wed May 12 10:23:03 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Wed, 12 May 2010 13:23:03 +0300 Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug In-Reply-To: References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu> Message-ID: Outch. I'll definitely have a look. Strange that none of the tests have picked this up... 
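The off-by-one reported in this thread can be checked in isolation with plain arithmetic, without loading GeneMapper. The sketch below is not the module's code: the two subroutine names are hypothetical, and they only reproduce the two formulas quoted in Aaron's report (the current int(pos / 3) + 1 and the proposed int((pos - 1) / 3) + 1) for 1-based CDS positions.

```perl
use strict;
use warnings;

# Formula as quoted from lines 1170-1171: codon boundaries land one base early.
sub buggy_cds_to_pep {
    my ($pos) = @_;
    return int($pos / 3) + 1;
}

# Proposed fix: shift to 0-based before dividing, then back to 1-based.
sub fixed_cds_to_pep {
    my ($pos) = @_;
    return int(($pos - 1) / 3) + 1;
}

print join("\t", "cds", "buggy", "fixed"), "\n";
for my $pos (1 .. 6) {
    print join("\t", $pos, buggy_cds_to_pep($pos), fixed_cds_to_pep($pos)), "\n";
}
```

For CDS positions 1 to 6 the first formula gives 1, 1, 2, 2, 2, 3 and the second gives 1, 1, 1, 2, 2, 2, matching the numbers in the report. A check of this kind could serve as a starting point for the regression test Chris suggests.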
-Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 12 May 2010 01:40, Aaron Mackey wrote: > Hi Chris, > > I was hoping Heikki might take up the cause and investigate further -- > let's > give him a chance to respond. But it's a frightening bug if it's really > been that way for all this time ... > > -Aaron > > On Tue, May 11, 2010 at 6:31 PM, Chris Fields > wrote: > > > Aaron, > > > > Do we want to write this up as a set of tests to add to the bioperl test > > suite? We can probably add it after the github migration tomorrow. > > > > chris > > > > On May 11, 2010, at 4:26 PM, Aaron Mackey wrote: > > > > > Hi Zerui (and others), > > > > > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, > > > specifically in this code: > > > > > > lines: > > > 1170: (-start => int ($loc->start / 3 ) +1, > > > 1171: -end => int ($loc->end / 3 ) +1, > > > > > > both of those lines should look like: int (($loc->start - 1) / 3) + 1 > > > > > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect > peptide > > > positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2) > > > > > > There is also a problem when mapping exon coordinates that are > > outside/after > > > the CDS region (instead of getting undefined locations, you continue to > > get > > > peptide coordinates, but they are invalid, larger than the protein > > length). > > > > > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for > > you, > > > as it uses this module to calculate codons for SNPs. > > > > > > -Aaron > > > > > > P.S. 
a script the demonstrates the problem: > > > > > > use Bio::Coordinate::GeneMapper; > > > > > > my $mapper = > > > Bio::Coordinate::GeneMapper > > > ->new( -in => "chr", > > > -out => "propeptide", > > > -exons => [ Bio::Location::Simple > > > ->new( -start => 101, > > > -end => 109 ), > > > Bio::Location::Simple > > > ->new( -start => 201, > > > -end => 221 ), > > > ], > > > -cds => Bio::Location::Simple > > > ->new(-start => 101, -end => 209), > > > ); > > > > > > > > > print join("\t", "chr", "aa"), "\n"; > > > for my $pos (99..111,199..211) { > > > my $res = $mapper->map( > > > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => > > 1)); > > > my $start = $res->start; $start = "NA" unless defined $start; > > > my $end = $res->end; $end = "NA" unless defined $end; > > > print join("\t", $pos, $start), "\n"; > > > } > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Wed May 12 16:24:49 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 11:24:49 -0500 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: <4BEAD562.1010702@cornell.edu> References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu> Message-ID: <97B3DF77-C657-4E7C-8298-529F474E1FA5@illinois.edu> Yup, haven't started the migration yet (I'm taking down some crontab scripts used for prior github updates, nightly builds). Then I'll announce before freezing the repo. chris On May 12, 2010, at 11:20 AM, Robert Buels wrote: > The SVN repository is not frozen yet, driveby_bot just say 16984 go into svn from Thomas Sharpton. > > R > > Heikki Lehvaslaiho wrote: >> No problem at all. Go ahead. 
>> -Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +966 545 595 849 office: +966 2 808 2429 >> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 >> 4700 King Abdullah University of Science and Technology (KAUST) >> Thuwal 23955-6900, Kingdom of Saudi Arabia >> On 12 May 2010 07:15, Chris Fields wrote: >>> Just a friendly reminder that we'll freeze the dev subversion repository >>> tomorrow prior to migration to github. The migration will take about an >>> hour, during which all bioperl github repos will be replaced with the full >>> repos, and devs added. The test repos will be removed around that time >>> (Heikki, will that be a problem?). >>> >>> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Wed May 12 16:20:50 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 12 May 2010 09:20:50 -0700 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> Message-ID: <4BEAD562.1010702@cornell.edu> The SVN repository is not frozen yet, driveby_bot just say 16984 go into svn from Thomas Sharpton. R Heikki Lehvaslaiho wrote: > No problem at all. Go ahead. > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > On 12 May 2010 07:15, Chris Fields wrote: > >> Just a friendly reminder that we'll freeze the dev subversion repository >> tomorrow prior to migration to github. The migration will take about an >> hour, during which all bioperl github repos will be replaced with the full >> repos, and devs added. 
The test repos will be removed around that time
>> (Heikki, will that be a problem?).
>>
>> chris
>

From cjfields at illinois.edu Wed May 12 16:43:42 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 12 May 2010 11:43:42 -0500
Subject: [Bioperl-l] dev.open-bio.org SVN is now read-only
Message-ID:

Just like the subject says, switched the repo to read-only status. I'm starting the github migration now.

chris

From thomas.sharpton at gmail.com Wed May 12 16:45:22 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Wed, 12 May 2010 09:45:22 -0700
Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow
In-Reply-To: <4BEAD562.1010702@cornell.edu>
References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu>
Message-ID:

Sorry if I screwed things up - updated before checking this email thread.

-T

On May 12, 2010, at 9:20 AM, Robert Buels wrote:

> The SVN repository is not frozen yet, driveby_bot just say 16984 go
> into svn from Thomas Sharpton.
>
> R
>
> Heikki Lehvaslaiho wrote:
>> No problem at all. Go ahead.
>> -Heikki
>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>> cell: +966 545 595 849 office: +966 2 808 2429
>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
>> 4700 King Abdullah University of Science and Technology (KAUST)
>> Thuwal 23955-6900, Kingdom of Saudi Arabia
>> On 12 May 2010 07:15, Chris Fields wrote:
>>> Just a friendly reminder that we'll freeze the dev subversion repository
>>> tomorrow prior to migration to github. The migration will take about an
>>> hour, during which all bioperl github repos will be replaced with the full
>>> repos, and devs added. The test repos will be removed around that time
>>> (Heikki, will that be a problem?).
>>> >>> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed May 12 16:47:36 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 11:47:36 -0500 Subject: [Bioperl-l] [REMINDER] GitHub migration tomorrow In-Reply-To: References: <44B6F0F8-F41D-4EE2-966D-F7A6D7081A6D@illinois.edu> <4BEAD562.1010702@cornell.edu> Message-ID: <08E7C628-D914-43C0-AB3D-E8FC41A144DC@illinois.edu> No problem, just froze the repo and rsynced to my local machine, so your commit made it just under the wire. chris On May 12, 2010, at 11:45 AM, Thomas Sharpton wrote: > Sorry if I screwed things up - updated before checking this email tread. > > -T > > On May 12, 2010, at 9:20 AM, Robert Buels wrote: > >> The SVN repository is not frozen yet, driveby_bot just say 16984 go into svn from Thomas Sharpton. >> >> R >> >> Heikki Lehvaslaiho wrote: >>> No problem at all. Go ahead. >>> -Heikki >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>> cell: +966 545 595 849 office: +966 2 808 2429 >>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 >>> 4700 King Abdullah University of Science and Technology (KAUST) >>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>> On 12 May 2010 07:15, Chris Fields wrote: >>>> Just a friendly reminder that we'll freeze the dev subversion repository >>>> tomorrow prior to migration to github. The migration will take about an >>>> hour, during which all bioperl github repos will be replaced with the full >>>> repos, and devs added. The test repos will be removed around that time >>>> (Heikki, will that be a problem?). 
>>>> >>>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maizemu at gmail.com Wed May 12 17:12:28 2010 From: maizemu at gmail.com (Christopher Bottoms) Date: Wed, 12 May 2010 12:12:28 -0500 Subject: [Bioperl-l] Citing CPAN modules in scientific publications Message-ID: Dear BioPerlers, I am working on a publication which would be impossible without the use of several CPAN modules. I appreciate the work authors and maintainers have put into these modules and would like to acknowledge them by citing their work. I was thinking of a format such as Author(s), Maintainer(s) *Module::Name* [ http://search.cpan.org/dist/Module-Name] A reference for File::Slurp would appear thus: Uri Guttman, Dave Rolsky *File::Slurp* [ http://search.cpan.org/dist/File-Slurp] I guess that I could instead mention authors in an acknowledgment section. I noticed a large acknowledgment section in the BioPerl paper ( http://genome.cshlp.org/content/12/10/1611.full). Thanks for your time, Christopher Bottoms (molecules) From greg at ebi.ac.uk Wed May 12 18:16:53 2010 From: greg at ebi.ac.uk (Gregory Jordan) Date: Wed, 12 May 2010 19:16:53 +0100 Subject: [Bioperl-l] BioPerl for indexing quality score files Message-ID: Hi all, I'm wondering if anyone has tried using BioPerl to index sequence quality score files? The files I'm looking at tend to look like Fasta files, but with numbers (between 0 and 99) and spaces instead of sequence strings. 
Something like:

---
>chr1
0 20 20 20 50 99 99 99 99 30 30 20 20 10 10 0 0 0 0
---

(An example for Chimpanzee can be found here, as the file 'panTro2.quals.fa.gz': http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ )

I'm currently using a home-brewed file indexing system to access subsets of these quality scores, but it's kind of slow and (probably) buggy. I'd much rather use something like Bio::DB::Fasta, but (without having actually tried it) I expect it wouldn't be too happy with these not-quite-fasta format quality files.

Has anyone run into a similar situation and found a solution using Bioperl (or something else)?

I'd be happy to hack around a bit to get this to work, if necessary; if anyone could provide pointers on where to start, I'd be much obliged.

Cheers,
Greg

PS - it's great to see the GitHub migration moving along so swiftly! I'll be *much* more likely to start bug-hunting and patch-submitting with the code living there now. :)

From greg at ebi.ac.uk Wed May 12 18:26:26 2010
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Wed, 12 May 2010 19:26:26 +0100
Subject: [Bioperl-l] BioPerl for indexing quality score files
In-Reply-To:
References:
Message-ID:

Ok, I need to shame myself with a huge "RTFM" for this one --

http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/DB/Qual.pm

Sorry for the spam. Still happy about the GitHub, though!

greg

On 12 May 2010 19:16, Gregory Jordan wrote:

> Hi all,
>
> I'm wondering if anyone has tried using BioPerl to index sequence quality
> score files? The files I'm looking at tend to look like Fasta files, but
> with numbers (between 0 and 99) and spaces instead of sequence strings.
> Something like: > --- > >chr1 > 0 20 20 20 50 99 99 99 99 30 30 20 20 10 10 0 0 0 0 > --- > (An example for Chimpanzee can be found here, as the file > 'panTro2.quals.fa.gz': > http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ ) > > I'm currently using a home-brewed file indexing system to access subsets of > these quality scores, but it's kind of slow and (probably) buggy. I'd much > rather use something like Bio::DB::Fasta, but (without having actually tried > it) I expect it wouldn't be too happy with these not-quite-fasta format > quality files. > > Has anyone run into a similar situation and found a solution using Bioperl > (or something else)? > > I'd be happy to hack around a bit to get this to work, if necessary; if > anyone could provide pointers on where to start, I'd be much obliged. > > Cheers, > Greg > > PS - it's great to see the GitHub migration moving along so swiftly! I'll > be *much* more likely to start bug-hunting and patch-submitting with the > code living there now. :) > From cjfields at illinois.edu Wed May 12 18:48:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 13:48:53 -0500 Subject: [Bioperl-l] GitHub migration complete Message-ID: All, The migration to github is now essentially complete, minus a few small house-keeping details. Please let me know if there are problems with checkouts. I've added collaborators to almost all repositories; unfortunately, GitHub decided to remove 'copy permissions' for adding collaborators just recently, so we'll have to manually add each in to each repo until that is resolved (from what I hear, should be soon). In the meantime, if you are a bioperl developer and aren't listed as a github collaborator please sign up for a github account, add SSH keys, and let me know your github user name. I'll add you to bioperl-live and any other repos you want (please let me know which ones!). 
I'll be doing a few last-minute house-cleaning bits (adding post-receive hooks, set up a mirror, etc), but it shouldn't interfere with checkouts. Let me know how it goes! chris From David.Messina at sbc.su.se Wed May 12 19:59:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 12 May 2010 21:59:14 +0200 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: References: Message-ID: Thanks, Chris! Clone and commit are working here. Dave From Kevin.M.Brown at asu.edu Wed May 12 20:06:38 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 12 May 2010 13:06:38 -0700 Subject: [Bioperl-l] Citing CPAN modules in scientific publications In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu> Wouldn't the format of the citation actually be dictated by the publication the paper was going to be in? E.g. the APA guide sets the format to be: Jones, D. F. (2002). The Mental Measurement Tester (Version 3.2) [Computer software]. Fort Lauderdale, FL: Nova Southeastern University. Retrieved July 22, 2007. Available from http://www.buros.com/ Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Christopher Bottoms > Sent: Wednesday, May 12, 2010 10:12 AM > To: bioperl-l List > Subject: [Bioperl-l] Citing CPAN modules in scientific publications > > Dear BioPerlers, > > I am working on a publication which would be impossible > without the use of > several CPAN modules. I appreciate the work authors and > maintainers have put > into these modules and would like to acknowledge them by > citing their work. 
> > I was thinking of a format such as > Author(s), Maintainer(s) *Module::Name* [ > http://search.cpan.org/dist/Module-Name] > > > A reference for File::Slurp would appear thus: > > Uri Guttman, Dave Rolsky *File::Slurp* [ > http://search.cpan.org/dist/File-Slurp] > > > I guess that I could instead mention authors in an > acknowledgment section. I > noticed a large acknowledgment section in the BioPerl paper ( > http://genome.cshlp.org/content/12/10/1611.full). > > Thanks for your time, > Christopher Bottoms (molecules) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jay at jays.net Wed May 12 20:35:27 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 12 May 2010 15:35:27 -0500 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: References: Message-ID: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net> On May 12, 2010, at 1:48 PM, Chris Fields wrote: > The migration to github is now essentially complete, minus a few small house-keeping details. Please let me know if there are problems with checkouts. You mean clones? ;) Thanks Chris!! This is *awesome*. I'm really glad we're in git now and very much appreciate all your work on this. Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Wed May 12 21:34:39 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 May 2010 16:34:39 -0500 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net> References: <0AC87BE0-AD5B-4FDC-B0D9-A9FBCBFC66CB@jays.net> Message-ID: On May 12, 2010, at 3:35 PM, Jay Hannah wrote: > On May 12, 2010, at 1:48 PM, Chris Fields wrote: >> The migration to github is now essentially complete, minus a few small house-keeping details. Please let me know if there are problems with checkouts. > > You mean clones? ;) > > Thanks Chris!! This is *awesome*. 
I'm really glad we're in git now and very much appreciate all your work on this. > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah Yes, that was svn slipping in there... chris From maj at fortinbras.us Thu May 13 01:44:09 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 12 May 2010 21:44:09 -0400 Subject: [Bioperl-l] GitHub migration complete In-Reply-To: References: Message-ID: <77C82E975CC24860AA16EE537E270FBD@NewLife> awesome job, Chris- MAJ (what's git again? Oh never mind...) ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Wednesday, May 12, 2010 2:48 PM Subject: [Bioperl-l] GitHub migration complete > All, > > The migration to github is now essentially complete, minus a few small > house-keeping details. Please let me know if there are problems with > checkouts. > > I've added collaborators to almost all repositories; unfortunately, GitHub > decided to remove 'copy permissions' for adding collaborators just recently, > so we'll have to manually add each in to each repo until that is resolved > (from what I hear, should be soon). In the meantime, if you are a bioperl > developer and aren't listed as a github collaborator please sign up for a > github account, add SSH keys, and let me know your github user name. I'll add > you to bioperl-live and any other repos you want (please let me know which > ones!). > > I'll be doing a few last-minute house-cleaning bits (adding post-receive > hooks, set up a mirror, etc), but it shouldn't interfere with checkouts. Let > me know how it goes! 
> > chris > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maizemu at gmail.com Thu May 13 03:27:47 2010 From: maizemu at gmail.com (Christopher Bottoms) Date: Wed, 12 May 2010 22:27:47 -0500 Subject: [Bioperl-l] Citing CPAN modules in scientific publications In-Reply-To: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B406BB3F19@EX02.asurite.ad.asu.edu> Message-ID: Thanks. I was also wondering about listing the maintainer. I'm guessing not, since the maintainer can add herself (or himself) to the list of authors if she felt that she had contributed enough to warrant it. On Wed, May 12, 2010 at 3:06 PM, Kevin Brown wrote: > Wouldn't the format of the citation actually be dictated by the > publication the paper was going to be in? E.g. the APA guide sets the > format to be: > > Jones, D. F. (2002). The Mental Measurement Tester (Version 3.2) > [Computer software]. > Fort Lauderdale, FL: Nova Southeastern University. Retrieved > July 22, 2007. > Available from http://www.buros.com/ > > > Kevin Brown > Center for Innovations in Medicine > Biodesign Institute > Arizona State University > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > Christopher Bottoms > > Sent: Wednesday, May 12, 2010 10:12 AM > > To: bioperl-l List > > Subject: [Bioperl-l] Citing CPAN modules in scientific publications > > > > Dear BioPerlers, > > > > I am working on a publication which would be impossible > > without the use of > > several CPAN modules. I appreciate the work authors and > > maintainers have put > > into these modules and would like to acknowledge them by > > citing their work. 
> >
> > I was thinking of a format such as
> > Author(s), Maintainer(s) *Module::Name* [
> > http://search.cpan.org/dist/Module-Name]
> >
> >
> > A reference for File::Slurp would appear thus:
> >
> > Uri Guttman, Dave Rolsky *File::Slurp* [
> > http://search.cpan.org/dist/File-Slurp]
> >
> >
> > I guess that I could instead mention authors in an
> > acknowledgment section. I
> > noticed a large acknowledgment section in the BioPerl paper (
> > http://genome.cshlp.org/content/12/10/1611.full).
> >
> > Thanks for your time,
> > Christopher Bottoms (molecules)
> >
>

From heikki.lehvaslaiho at gmail.com Thu May 13 06:11:40 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Thu, 13 May 2010 09:11:40 +0300
Subject: [Bioperl-l] GitHub migration complete
In-Reply-To: <77C82E975CC24860AA16EE537E270FBD@NewLife>
References: <77C82E975CC24860AA16EE537E270FBD@NewLife>
Message-ID:

It works. Bliss.

Worth mentioning now on the list that the latest instructions are in
http://www.bioperl.org/wiki/Using_Git

I've recommitted the two changes I did on the experimental repo.

I had a small problem when editing the README text file: git was not showing differences between the original file and my edits. It kept saying that

bala ~/src/bioperl-live> git diff README
diff --git a/README b/README
index 03685a8..8e20592 100644
Binary files a/README and b/README differ

The reason, of course, was that a hard-to-detect binary character had slipped into my edit. Just so that you know when this happens to you...
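
[Editorial sketch] For anyone hitting the same "Binary files ... differ" message: as far as I know, git decides that a file is binary by looking for a NUL byte near the start of its content, so a single stray control character can be enough. Below is a minimal sketch for locating such bytes. It is not a git feature; the helper name find_control_bytes and the sample text are made up for illustration, and the set of bytes it flags (control characters other than tab, LF and CR) is an assumption about what counts as suspicious.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of git or BioPerl): report every control
# byte other than tab, LF and CR, with its 1-based line and column.
sub find_control_bytes {
    my ($data) = @_;
    my @hits;
    my $line_no = 0;
    for my $line (split /\n/, $data, -1) {
        $line_no++;
        # Each match is one byte, so pos() after the match is its column.
        while ($line =~ /([\x00-\x08\x0B\x0C\x0E-\x1F\x7F])/g) {
            push @hits, [ $line_no, pos($line), sprintf("0x%02X", ord $1) ];
        }
    }
    return @hits;
}

# Demo on an in-memory string. To scan a real file, read it in raw mode
# first (open with the '<:raw' layer and slurp the contents).
my $text = "BioPerl README\nline with a stray\x00byte\nclean line\n";
my @hits = find_control_bytes($text);
printf "line %d, column %d: byte %s\n", @$_ for @hits;
```

On the sample text this reports the NUL byte on line 2, column 18. Once the offending byte is removed from the file, git diff should go back to showing a normal text diff.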
-Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 13 May 2010 04:44, Mark A. Jensen wrote: > awesome job, Chris- MAJ > (what's git again? Oh never mind...) > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Wednesday, May 12, 2010 2:48 PM > Subject: [Bioperl-l] GitHub migration complete > > > > All, >> >> The migration to github is now essentially complete, minus a few small >> house-keeping details. Please let me know if there are problems with >> checkouts. >> >> I've added collaborators to almost all repositories; unfortunately, GitHub >> decided to remove 'copy permissions' for adding collaborators just recently, >> so we'll have to manually add each in to each repo until that is resolved >> (from what I hear, should be soon). In the meantime, if you are a bioperl >> developer and aren't listed as a github collaborator please sign up for a >> github account, add SSH keys, and let me know your github user name. I'll >> add you to bioperl-live and any other repos you want (please let me know >> which ones!). >> >> I'll be doing a few last-minute house-cleaning bits (adding post-receive >> hooks, set up a mirror, etc), but it shouldn't interfere with checkouts. >> Let me know how it goes! 
>>
>> chris
>>

From heikki.lehvaslaiho at gmail.com Thu May 13 06:20:51 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Thu, 13 May 2010 09:20:51 +0300
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
In-Reply-To:
References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>
Message-ID:

Just a thumbs up. Aaron's fix works. The problem seems to be limited to where he spotted it. I am working on refreshing my memory of how the code works - it has been quite a few years since I wrote it - and will commit better tests.

As for getting values outside the defined region, that is a feature rather than a bug. The idea was to be able to ask what the new coordinate would be if the feature extended beyond the known limits. That is the capability of Bio::Coordinate::ExtrapolatingPair that is used here. That class also has a method strict that can be used to prevent extrapolating, but the code to access that has not been written into GeneMapper. I'll see if I can get it to work.

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 12 May 2010 13:23, Heikki Lehvaslaiho wrote:

> Outch. I'll definitely have a look.
>
> Strange that none of the tests have picked this up...
> > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > > > On 12 May 2010 01:40, Aaron Mackey wrote: > >> Hi Chris, >> >> I was hoping Heikki might take up the cause and investigate further -- >> let's >> give him a chance to respond. But it's a frightening bug if it's really >> been that way for all this time ... >> >> -Aaron >> >> On Tue, May 11, 2010 at 6:31 PM, Chris Fields >> wrote: >> >> > Aaron, >> > >> > Do we want to write this up as a set of tests to add to the bioperl test >> > suite? We can probably add it after the github migration tomorrow. >> > >> > chris >> > >> > On May 11, 2010, at 4:26 PM, Aaron Mackey wrote: >> > >> > > Hi Zerui (and others), >> > > >> > > I've confirmed there seems to be a bug in Bio::Coordinate::GeneMapper, >> > > specifically in this code: >> > > >> > > lines: >> > > 1170: (-start => int ($loc->start / 3 ) +1, >> > > 1171: -end => int ($loc->end / 3 ) +1, >> > > >> > > both of those lines should look like: int (($loc->start - 1) / 3) + 1 >> > > >> > > otherwise for CDS coordinates 1, 2, 3, 4, 5, 6, you get incorrect >> peptide >> > > positions 1, 1, 2, 2, 2, 3 (when you instead want 1, 1, 1, 2, 2, 2) >> > > >> > > There is also a problem when mapping exon coordinates that are >> > outside/after >> > > the CDS region (instead of getting undefined locations, you continue >> to >> > get >> > > peptide coordinates, but they are invalid, larger than the protein >> > length). >> > > >> > > Dennis and fringy -- this may affect the SNPtab.pl script I wrote for >> > you, >> > > as it uses this module to calculate codons for SNPs. >> > > >> > > -Aaron >> > > >> > > P.S. 
a script the demonstrates the problem: >> > > >> > > use Bio::Coordinate::GeneMapper; >> > > >> > > my $mapper = >> > > Bio::Coordinate::GeneMapper >> > > ->new( -in => "chr", >> > > -out => "propeptide", >> > > -exons => [ Bio::Location::Simple >> > > ->new( -start => 101, >> > > -end => 109 ), >> > > Bio::Location::Simple >> > > ->new( -start => 201, >> > > -end => 221 ), >> > > ], >> > > -cds => Bio::Location::Simple >> > > ->new(-start => 101, -end => 209), >> > > ); >> > > >> > > >> > > print join("\t", "chr", "aa"), "\n"; >> > > for my $pos (99..111,199..211) { >> > > my $res = $mapper->map( >> > > Bio::Location::Simple->new(-start => $pos, -end => $pos, -seq_id => >> > 1)); >> > > my $start = $res->start; $start = "NA" unless defined $start; >> > > my $end = $res->end; $end = "NA" unless defined $end; >> > > print join("\t", $pos, $start), "\n"; >> > > } >> > > _______________________________________________ >> > > Bioperl-l mailing list >> > > Bioperl-l at lists.open-bio.org >> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From remi.planel at free.fr Thu May 13 09:08:58 2010 From: remi.planel at free.fr (Remi) Date: Thu, 13 May 2010 11:08:58 +0200 Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast Message-ID: <4BEBC1AA.2020908@free.fr> Hi all, I'm using Bio::Tools::Run::StandAloneBlastPlus and trying to run a remote blast using this code : /my $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'nr', -remote => '1', ); my $result = $fac->blastp( -query => 'P12996.fasta', -outfile => 'out.bls', ); /but I got an error message : "BLAST Database error: Protein BLAST database './nr' does not exist in the NCBI servers". But if I'm modifying directly the value of $fac->{'_db_path'} like : /$fac->{'_db_path'} = 'nr';/ it's working. 
Is that a bug or am I missing something?

Thanks,
Rémi

From maj at fortinbras.us Thu May 13 11:17:55 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 13 May 2010 07:17:55 -0400
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast
In-Reply-To: <4BEBC1AA.2020908@free.fr>
References: <4BEBC1AA.2020908@free.fr>
Message-ID: <1A1631149DEF4B9080E5D4D5851F4587@NewLife>

Hi Rémi

Looks like a bug-- can you report it via http://bugzilla.bioperl.org? Just enter what you've written here-- I appreciate it-
Mark

----- Original Message -----
From: "Remi"
To: "BioPerl List"
Sent: Thursday, May 13, 2010 5:08 AM
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - remote blast

Hi all,

I'm using Bio::Tools::Run::StandAloneBlastPlus and trying to run a remote blast using this code:

    my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'nr',
        -remote  => '1',
    );
    my $result = $fac->blastp(
        -query   => 'P12996.fasta',
        -outfile => 'out.bls',
    );

but I got an error message: "BLAST Database error: Protein BLAST database './nr' does not exist in the NCBI servers".

But if I modify the value of $fac->{'_db_path'} directly, like:

    $fac->{'_db_path'} = 'nr';

it works.

Is that a bug or am I missing something?

Thanks,
Rémi

From David.Messina at sbc.su.se Wed May 12 20:10:36 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 12 May 2010 22:10:36 +0200
Subject: [Bioperl-l] Ohloh update
Message-ID: <32ED5B44-061D-4634-9E5C-72E313E1A58C@sbc.su.se>

Hi everyone,

Ohloh account probably needs to be changed to point to our Github repo. I'd be happy to do it if someone adds me on there. Otherwise, could one of the admins check into that when they get a chance?

Also, I notice it hasn't registered any commits since March 15th;
hopefully the repo change will wake it up or we may need to contact one of their admins again. Can anyone think of other external sites pointing to BioPerl which need updating, too? Dave From jay at jays.net Thu May 13 12:42:41 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 07:42:41 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <201005130328.o4D3S8Fs011865@portal.open-bio.org> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> Message-ID: ------- Comment #3 from cjfields at bioperl.org 2010-05-12 23:28 EST ------- > Ouch, that's a bit nasty. Taking advantage of git move and doing this on a > topic branch (topic/bug_3077) on github. I plan on cleaning up the 'jhannah' branch (renaming it 'topic/bug_2515', asking people for their input, merging to master). I plan on cleaning up the 'yapc10hackathon' branch. I can't remember what Robert and I left in there after YAPC last year. Should most of the other branches be deleted? If a branch hasn't been changed in more than a year and no one intends to jump into it in the coming year what purpose does it serve? Old tags can hang out forever, but shouldn't our branch list be tidy? (Specifically I would argue that old release number tags should hang out forever, but I don't see the point in any other ancient tags continuing to exist if their purpose isn't documented anywhere.) Are we serious about emulating this branching model? http://nvie.com/git-model If so then we need to create a 'develop' branch and only the release manager should touch 'master' and yahoos like me should be branching off of 'develop' instead, right? Counter argument: Since 'master' is the default branch and we want to encourage doc patches and typo corrections from the world making trivial contributions as easy as possible for everyone, I would think that using 'master' as the daily headstream would be better. 
So 'topic/bug_####' for each non-trivial Bugzilla ticket, and release managers can work their magic in 'release-#-#' branches. (Release branches old enough that there's no way we're going to patch them any more are deleted, and only the tag remains). Thoughts? Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah P.S. My "up to date" OS X 10.6.3 machines both had git 1.5.3.1 on them. Upgrading to git 1.7.1 makes branch checkouts simpler. jhannah at minijaysnet~/src/bioperl-live$ git branch -a * master remotes/origin/HEAD -> origin/master remotes/origin/TRY_featureio_refactor remotes/origin/TRY_gff_refactor remotes/origin/TRY_locatableseq_refactor remotes/origin/anydbm-branch remotes/origin/bioperl remotes/origin/bioperl-branch-1-5-1 remotes/origin/bioperl-live remotes/origin/branch-06 remotes/origin/branch-07 remotes/origin/branch-07-ensembl-120 remotes/origin/branch-1-0-0 remotes/origin/branch-1-2 remotes/origin/branch-1-2-collection remotes/origin/branch-1-4 remotes/origin/branch-1-5-2 remotes/origin/branch-1-6 remotes/origin/branch-ensembl-m1 remotes/origin/branch-experimental remotes/origin/featann_rollback remotes/origin/internal-branch-pre-delete-06-tag remotes/origin/jhannah remotes/origin/lightweight_feature_branch remotes/origin/master remotes/origin/ontology-cache remotes/origin/release-0-04-bug remotes/origin/restriction-refactor remotes/origin/stable-0-05 remotes/origin/stable-0-05-new remotes/origin/steve_chervitz remotes/origin/topic/bug_3077 remotes/origin/yapc10hackathon jhannah at minijaysnet~/src/bioperl-live$ git tag after-05-06-merge after-05-06-merge-2 after004 before-05-to-06-merge before-05-to-06-trunk bioperl-06-1 bioperl-061-pre1 bioperl-1-0-0 bioperl-1-0-alpha bioperl-1-0-alpha2-rc bioperl-1-2-1-rc1 bioperl-1-6-0_001 bioperl-1-6-0_002 bioperl-1-6-0_003 bioperl-1-6-0_004 bioperl-1-6-0_005 bioperl-1-6-0_006 bioperl-1-6-RC1 bioperl-1-6-RC2 bioperl-1-6-RC2_15306 bioperl-1-6-RC3 bioperl-1-6-RC3_15392 bioperl-1-6-RC4 bioperl-devel-1-1-1 
bioperl-devel-1-3-01 bioperl-devel-1-3-02 bioperl-devel-1-3-03 bioperl-devel-1-3-04 bioperl-release-1-0-0 bioperl-release-1-0-1 bioperl-release-1-0-2 bioperl-release-1-1-0 bioperl-release-1-2-0 bioperl-release-1-2-1 bioperl-release-1-2-2 bioperl-release-1-2-3 bioperl-release-1-4-0 bioperl-release-1-5-0 bioperl-release-1-5-0-rc1 bioperl-release-1-5-0-rc2 bioperl-release-1-5-1 bioperl-release-1-5-1-rc4 bioperl-release-1-5-2 bioperl-release-1-5-2-patch1 bioperl-release-1-5-2-patch2 bioperl-release-1-6 bioperl-release-1-6-1 bioperl-run-release-1-2-0 for_gmod_0_003 gbrowse_1_65 join-0-04-to-0-05 lightweight_feature ontology-fix1 ontology-overhaul-end ontology-overhaul-start prerelease-06 release-0-04-1 release-0-04-2 release-0-04-3 release-0-04-4 release-0-05 release-0-05-1 release-0-7-0 release-0-7-1 release-0-7-2 release-0-9-0 release-0-9-2 release-0-9-3 release-06 release-06-2 release-1_01 release-ensembl-06 snapshot-at-head-of-07-branch start tag-ensembl-stable-061 From cjfields at illinois.edu Thu May 13 13:49:19 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 08:49:19 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> Message-ID: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> On May 13, 2010, at 7:42 AM, Jay Hannah wrote: > ------- Comment #3 from cjfields at bioperl.org 2010-05-12 23:28 EST ------- >> Ouch, that's a bit nasty. Taking advantage of git move and doing this on a >> topic branch (topic/bug_3077) on github. > > I plan on cleaning up the 'jhannah' branch (renaming it 'topic/bug_2515', asking people for their input, merging to master). > > I plan on cleaning up the 'yapc10hackathon' branch. I can't remember what Robert and I left in there after YAPC last year. > > Should most of the other branches be deleted? 
If a branch hasn't been changed in more than a year and no one intends to jump into it in the coming year what purpose does it serve? Old tags can hang out forever, but shouldn't our branch list be tidy? (Specifically I would argue that old release number tags should hang out forever, but I don't see the point in any other ancient tags continuing to exist if their purpose isn't documented anywhere.) I would say err on the safe side and keep the ones we're unsure of, but a cleanup would be nice. We could adopt what Moose has done and move branches we're unsure of to something like 'attic'. > Are we serious about emulating this branching model? > > http://nvie.com/git-model > > If so then we need to create a 'develop' branch and only the release manager should touch 'master' and yahoos like me should be branching off of 'develop' instead, right? > > Counter argument: Since 'master' is the default branch and we want to encourage doc patches and typo corrections from the world making trivial contributions as easy as possible for everyone, I would think that using 'master' as the daily headstream would be better. So 'topic/bug_####' for each non-trivial Bugzilla ticket, and release managers can work their magic in 'release-#-#' branches. (Release branches old enough that there's no way we're going to patch them any more are deleted, and only the tag remains). ... > Thoughts? > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > P.S. My "up to date" OS X 10.6.3 machines both had git 1.5.3.1 on them. Upgrading to git 1.7.1 makes branch checkouts simpler. Moose has a 'stable' branch that release managers (the cabal) pull into from 'master' for releases. It's just a matter of semantics, what name we use for active development branches and what to use for stable releases; for us, the 'develop' and 'master' from that link could be (respectively) 'master' and 'stable'. 
'hotfixes' would be bug fixes, and 'feature branches' would be just that, new features to be added.

As for bug fixes, it would be much nicer to have most changes beyond very simple ones (including all bug fixes) relegated to branches that can be merged in. This sequesters any changes to the branch, where they can be tested prior to a merge.

Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct?

Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits.

chris

From jay at jays.net Thu May 13 14:38:20 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 13 May 2010 09:38:20 -0500
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu>
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu>
Message-ID:

So, like this?

Flow diagram:
http://biodoc.ist.unomaha.edu/~jhannah/tmp/branches.png

master
   (git and github default) Trivial changes committed directly here.
topic/bug_####
   One branch per non-trivial Bugzilla ticket
topic/jhannah_crazy_idea
   Branches for unstable/unfinished work
stable
   Release manager pulls from master to stable periodically (all tests are passing, etc.)
release-#-#-#
   Pulled from stable, pushed to CPAN
attic/*
   Any branch with no activity for 1 year

I like it.

> Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct?

I'm fine with attic/ and just leaving stuff in there until 2050. Then we should probably delete them. :)

My understanding is that by default commits that have no pointers to them (branches or tags or subsequent commits) are subject to cleanup/prune. I think this means that if someone, 10 years ago, committed 3 times to the branch "jhannah_crazy_idea" and that branch is deleted, then those 3 commits may be removed (gone forever) by git cleanup/prune.

This is a feature or a crime against humanity depending on who you ask. It can be disabled in a normal repo, I don't know about github.

> Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits.

As I collect clues I'll be brain dumping everything I think I know onto the wiki. This is a crazy busy week for me though.
:(

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

From jay at jays.net Thu May 13 15:00:05 2010
From: jay at jays.net (Jay Hannah)
Date: Thu, 13 May 2010 10:00:05 -0500
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu>
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu>
Message-ID:

On May 13, 2010, at 8:49 AM, Chris Fields wrote:
> Saying that, we could adopt a workflow policy that allows deletion of any merged branch.

Right. Except for release-* branches, which are never merged anywhere. A release is a branch while it's being prepared and tweaked. Once perfect, it is tagged and pushed to CPAN. At that point the branch can be deleted since we can never push that release number to CPAN again (even if we wanted to). The tag remains forever.

Or am I mistaken?

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

From shalabh.sharma7 at gmail.com Thu May 13 15:07:26 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 13 May 2010 11:07:26 -0400
Subject: [Bioperl-l] parsing blast report with long description
Message-ID:

Hi All,

I need some help in parsing blast output. I have an in-house database that contains sequences with really long descriptions.

>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV

So my blast report looks like this:

.....
.....
>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821
6887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
Length = 213

Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix adjust.
Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%)
.....
.....

(note that the tag "TI_1000008216887" is split across two lines).

I am using SeqIO to parse this report. What I am doing is parsing the description field again to get all the tags, like:

    ....
    ....
    my $desc = $hit->description;
    my @f = split('/',$desc);
    for(my $i = 0;$i < scalar @f;$i++){
        print OUT "$f[$i]\t";
    }
    .....
    .....

I am getting the perfect parsed report, but the field with TI_1000008216887 has a space: TI_100000821 6887.

I would really appreciate it if anyone can help me out.

Thanks
Shalabh Sharma

From joshpk105 at gmail.com Thu May 13 14:42:28 2010
From: joshpk105 at gmail.com (Katz)
Date: Thu, 13 May 2010 07:42:28 -0700 (PDT)
Subject: [Bioperl-l] RemoteBlast
Message-ID: <54674635-db43-413c-8c96-0d214f1b978d@l31g2000yqm.googlegroups.com>

Is there any way to differentiate between the three different NCBI blastn?

Right now I'm using RemoteBlast as follows:

    Bio::Tools::Run::RemoteBlast->new(-prog => 'blastn', -data => 'nr',
        -expect => '1e-5', -readmethod => 'SearchIO');

then blasting my files. However, this is auto using megablastn and I need to use regular blastn.
Thx, Josh From hlapp at drycafe.net Thu May 13 15:43:47 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 11:43:47 -0400 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> On May 13, 2010, at 9:49 AM, Chris Fields wrote: > Re: deletion of branches, I'm only really in support of deleting > feature branches that have been merged back to 'master' or another > branch (e.g. only removed using 'git branch -d foo'). I agree. > Older subversion release branches don't tend to fall into that > category, in that we had merged or cherry-picked changes from svn > trunk to them, not vice versa; they were never merged back to > trunk. Deletion in this case would be somewhat history-revising, > correct? I wouldn't call it history-revising. I also think it's OK to delete release branches that are no longer supported, iff we have a tag for the release itself. That's different from counting inactivity. A branch may lie dormant for a year or longer until someone has time to pick it back up again - I don't see the harm in keeping those around. > Saying that, we could adopt a workflow policy that allows deletion > of any merged branch. All this suggests coming up with a good > 'Contributing' document. That would be highly useful. I'll also voice a word of caution here though - I find it kind of ironic that the switch to git, which is supposed to make contribution *easier*, very often leads subsequently to complex commit/pull/push/branching workflows being instituted for projects that take pages and pages to document, a lot of time to ingest, and discipline to follow - it seems to be very easy and tempting to go overboard with this. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Thu May 13 16:01:05 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 11:01:05 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: On May 13, 2010, at 10:43 AM, Hilmar Lapp wrote: > On May 13, 2010, at 9:49 AM, Chris Fields wrote: >> Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. > > That would be highly useful. I'll also voice a word of caution here though - I find it kind of ironic that the switch to git, which is supposed to make contribution *easier*, very often leads subsequently to complex commit/pull/push/branching workflows being instituted for projects that take pages and pages to document, a lot of time to ingest, and discipline to follow - it seems to be very easy and tempting to go overboard with this. I'm happy to comply with whatever the policy is. If that policy is "everything trivial in master, non-trivial in topic/FOO, release manager will figure out everything else" that's fine with me. A branch cleanup would be nice. Or I'll just close my eyes. :) I'm embarrassed that I left unfinished business in branches in 2009. I'm fishing for a consensus on a contribution policy. 
Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

From heikki.lehvaslaiho at gmail.com Thu May 13 16:48:14 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Thu, 13 May 2010 19:48:14 +0300
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To: <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net>
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net>
Message-ID:

I second Hilmar. Let's try to keep this simple.

While for most people just beginning to use git this discussion seems confusing and the structures complex, things really are pretty simple.

I expect most of the branches to live only in developers' copies of the repo. They are created when work starts on the new bug or a feature, merged to master when work is done, and removed immediately or soon after that. Most of the work is done in the master and only the release managers touch the stable and release branches. See Jay's flow diagram.

Work flow for this is (while calling 'git status' all the time):

    git branch $new
    git checkout $new
    # work
    git commit
    git commit
    ...
    git checkout master
    git merge $new
    git push
    ...
    git branch -d $new

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 13 May 2010 18:43, Hilmar Lapp wrote:

>
> On May 13, 2010, at 9:49 AM, Chris Fields wrote:
>
> Re: deletion of branches, I'm only really in support of deleting feature
>> branches that have been merged back to 'master' or another branch (e.g. only
>> removed using 'git branch -d foo').
>>
>
> I agree.
> > > Older subversion release branches don't tend to fall into that category, >> in that we had merged or cherry-picked changes from svn trunk to them, not >> vice versa; they were never merged back to trunk. Deletion in this case >> would be somewhat history-revising, correct? >> > > I wouldn't call it history-revising. I also think it's OK to delete release > branches that are no longer supported, iff we have a tag for the release > itself. > > That's different from counting inactivity. A branch may lie dormant for a > year or longer until someone has time to pick it back up again - I don't see > the harm in keeping those around. > > > Saying that, we could adopt a workflow policy that allows deletion of any >> merged branch. All this suggests coming up with a good 'Contributing' >> document. >> > > That would be highly useful. I'll also voice a word of caution here though > - I find it kind of ironic that the switch to git, which is supposed to make > contribution *easier*, very often leads subsequently to complex > commit/pull/push/branching workflows being instituted for projects that take > pages and pages to document, a lot of time to ingest, and discipline to > follow - it seems to be very easy and tempting to go overboard with this. 
> > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu May 13 21:41:35 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 16:41:35 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: On May 13, 2010, at 11:48 AM, Heikki Lehvaslaiho wrote: > I second Hilmar. Let's try to keep this simple. > > While for most people just beginning to use git this discussion seems > confusing and the structures complex, things really are pretty simple. > > I expect most of the branches to live only in developers copies of the repo. > They are created when work starts on the new bug or a feature, merged to > master when work is done, and removed immediately or soon after that. Most > of the work is done in the master and only the release managers touch the > stable and release branches. See Jay's flow diagram. Right, many branches will occur locally. And I'm not suggesting that we strictly follow a particular pattern; I would rather not enforce that upon devs who already have a productive pattern set. I think this would act more as a suggested method of development, something that has been demonstrated to work well for other large projects (and something I'll be following). What I would really like to promote is using branches for making code changes, even ones that are only a few commits or so (and even if they are only local ones not pushed to github). Branches are cheap. 
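To make "branches are cheap" concrete, here is a minimal sketch of one short-lived local topic branch, run in a throwaway repository so nothing is pushed anywhere. The branch name topic/bug_0000, the README file, and the identity settings are placeholders invented for illustration, not anything from bioperl-live.

```shell
#!/bin/sh
# Sketch only: every name below is a hypothetical placeholder.
set -e

repo=$(mktemp -d)              # throwaway repository
cd "$repo"
git init -q
git config user.email "dev@example.org"
git config user.name "Example Dev"

echo "first" > README
git add README
git commit -q -m "initial commit"
trunk=$(git rev-parse --abbrev-ref HEAD)   # usually 'master' (or 'main' on newer git)

# Start a local topic branch for a small change.
git checkout -q -b topic/bug_0000
echo "fix" >> README
git commit -q -a -m "fix for hypothetical bug 0000"

# Merge it back and remove it; 'git branch -d' refuses to delete
# a branch that has not been merged, so nothing is lost by accident.
git checkout -q "$trunk"
git merge -q topic/bug_0000
git branch -d topic/bug_0000

git branch                     # only the trunk branch is left
```

Because the branch never leaves the local clone, the shared repository's branch list is unaffected until the merged result is pushed.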
> Work flow for this is (while calling 'git status' all the time): > > git branch $new > git checkout $new > # work > git commit > git commit > ... > git checkout master > git merge $new > git push > ... > git branch -d $new > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia Yes, that's essentially the basic workflow, maybe with a preliminary 'git pull' to sync to the latest. chris > On 13 May 2010 18:43, Hilmar Lapp wrote: > >> >> On May 13, 2010, at 9:49 AM, Chris Fields wrote: >> >> Re: deletion of branches, I'm only really in support of deleting feature >>> branches that have been merged back to 'master' or another branch (e.g. only >>> removed using 'git branch -d foo'). >>> >> >> I agree. >> >> >> Older subversion release branches don't tend to fall into that category, >>> in that we had merged or cherry-picked changes from svn trunk to them, not >>> vice versa; they were never merged back to trunk. Deletion in this case >>> would be somewhat history-revising, correct? >>> >> >> I wouldn't call it history-revising. I also think it's OK to delete release >> branches that are no longer supported, iff we have a tag for the release >> itself. >> >> That's different from counting inactivity. A branch may lie dormant for a >> year or longer until someone has time to pick it back up again - I don't see >> the harm in keeping those around. >> >> >> Saying that, we could adopt a workflow policy that allows deletion of any >>> merged branch. All this suggests coming up with a good 'Contributing' >>> document. >>> >> >> That would be highly useful. 
I'll also voice a word of caution here though >> - I find it kind of ironic that the switch to git, which is supposed to make >> contribution *easier*, very often leads subsequently to complex >> commit/pull/push/branching workflows being instituted for projects that take >> pages and pages to document, a lot of time to ingest, and discipline to >> follow - it seems to be very easy and tempting to go overboard with this. >> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu May 13 21:56:11 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 14:56:11 -0700 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> Message-ID: <4BEC757B.5030407@cornell.edu> OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes. I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that. 
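A rough sketch of what "releases constructed in branches off of master" might look like, assuming the release branch is tagged and then dropped as suggested earlier in the thread. It runs in a throwaway repository; the version number, the branch name release-0-0-1, and the tag name example-release-0-0-1 are made up for illustration.

```shell
#!/bin/sh
# Sketch only: version, branch and tag names are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.org"
git config user.name "Example Dev"

echo "code" > lib.txt
git add lib.txt
git commit -q -m "development on trunk"
trunk=$(git rev-parse --abbrev-ref HEAD)

# Prepare the release on its own branch.
git checkout -q -b release-0-0-1
echo "0.0.1 release notes" > Changes
git add Changes
git commit -q -m "prepare release 0.0.1"

# Tag the finished release, then drop the branch.
git tag -a -m "release 0.0.1" example-release-0-0-1
git checkout -q "$trunk"
git branch -D release-0-0-1    # -D because it was never merged into trunk

# The tag still points at the release commit, so that commit stays
# reachable and is not a candidate for pruning.
git tag -l
git log -1 --format='%h %s' example-release-0-0-1
```

The design point is that the tag, not the branch, is the permanent record of what was released.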
Rob From jay at jays.net Thu May 13 22:00:21 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 17:00:21 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: <4BEC757B.5030407@cornell.edu> References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu> Message-ID: <7BA7535D-AE97-4827-8B86-91C24842BAED@jays.net> On May 13, 2010, at 4:56 PM, Robert Buels wrote: > OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes. > > I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that. master++ Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From rmb32 at cornell.edu Thu May 13 22:13:52 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 15:13:52 -0700 Subject: [Bioperl-l] move ancient branches to attic Message-ID: <4BEC79A0.5000505@cornell.edu> To clean up branches, I propose to deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. Note that there are still tags for all the old releases, so those won't be lost. Thoughts? Rob From jay at jays.net Thu May 13 22:22:30 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 13 May 2010 17:22:30 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC79A0.5000505@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> Message-ID: On May 13, 2010, at 5:13 PM, Robert Buels wrote: > To clean up branches, I propose to deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. > > Note that there are still tags for all the old releases, so those won't be lost. Sounds generous to me. 
proceed++ Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From hlapp at drycafe.net Thu May 13 22:46:00 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 18:46:00 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC79A0.5000505@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> Message-ID: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Why? What is the gain from deleting branches that you don't know whether they are dead or not? -hilmar On May 13, 2010, at 6:13 PM, Robert Buels wrote: > To clean up branches, I propose to deleting branches (merged or not) > whose head is older than Jan 1, 2006, and moving branches to attic/ > whose head is older than Jan 1, 2009. > > Note that there are still tags for all the old releases, so those > won't be lost. > > Thoughts? > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From rmb32 at cornell.edu Thu May 13 23:05:06 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 May 2010 16:05:06 -0700 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <4BEC85A2.50401@cornell.edu> The gain is to avoid having useless things hanging around. Every time somebody has to read through a list of 50 branches to find the maybe 5 that are useful, it's time lost. In other word, it's the same gain that you get from cleaning off your desk, so that you can see where you put things. Rob Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether > they are dead or not? 
>
> -hilmar
>
> On May 13, 2010, at 6:13 PM, Robert Buels wrote:
>
>> To clean up branches, I propose to deleting branches (merged or not)
>> whose head is older than Jan 1, 2006, and moving branches to attic/
>> whose head is older than Jan 1, 2009.
>>
>> Note that there are still tags for all the old releases, so those
>> won't be lost.
>>
>> Thoughts?
>>
>> Rob
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Thu May 13 23:07:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 13 May 2010 18:07:31 -0500
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To: <4BEC757B.5030407@cornell.edu>
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu>
Message-ID:

'master'. That's more in line with other repos.

chris

On May 13, 2010, at 4:56 PM, Robert Buels wrote:

> OK then, decision time, which is the main devel branch, 'master' or 'develop'? I need to merge in a few small bugfixes.
>
> I vote for 'master', since it's slightly simpler for new devs, with releases being constructed in branches off of that.
> > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri May 14 00:27:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:27:22 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC85A2.50401@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> Message-ID: <77C06787-B381-43AA-8F5A-74331866C495@illinois.edu> Let's go through and check which branches are specifically merged back to trunk and delete those first, then list the ones that aren't or we're unsure of. If needed we can move those to an 'attic', like Moose. chris On May 13, 2010, at 6:05 PM, Robert Buels wrote: > The gain is to avoid having useless things hanging around. Every time somebody has to read through a list of 50 branches to find the maybe 5 that are useful, it's time lost. > > In other word, it's the same gain that you get from cleaning off your desk, so that you can see where you put things. > > Rob > > > Hilmar Lapp wrote: >> Why? What is the gain from deleting branches that you don't know whether they are dead or not? >> -hilmar >> On May 13, 2010, at 6:13 PM, Robert Buels wrote: >>> To clean up branches, I propose to deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. >>> >>> Note that there are still tags for all the old releases, so those won't be lost. >>> >>> Thoughts? 
>>> >>> Rob >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri May 14 00:28:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:28:30 -0500 Subject: [Bioperl-l] git branches, tags, 'topic/bug_####' In-Reply-To: References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> Message-ID: <6757E1DD-5712-4894-8EAF-52F5F902D348@illinois.edu> On May 13, 2010, at 9:38 AM, Jay Hannah wrote: > So, like this? > > Flow diagram: > http://biodoc.ist.unomaha.edu/~jhannah/tmp/branches.png > > master > (git and github default) Trivial changes committed directly here. > topic/bug_#### > One branch per non-trivial Bugzilla ticket > topic/jhannah_crazy_idea > Branches for unstable/unfinished work > stable > Release manager pulls from master to stable periodically (all tests are passing, etc.) > release-#-#-# > Pulled from stable, pushed to CPAN > attic/* > Any branch with no activity for 1 year > > I like it. Yes, something along those lines. >> Re: deletion of branches, I'm only really in support of deleting feature branches that have been merged back to 'master' or another branch (e.g. only removed using 'git branch -d foo'). Older subversion release branches don't tend to fall into that category, in that we had merged or cherry-picked changes from svn trunk to them, not vice versa; they were never merged back to trunk. Deletion in this case would be somewhat history-revising, correct? > > I'm fine with attic/ and just leaving stuff in there until 2050. Then we should probably delete them. 
:) > > My understanding is that by default commits that have no pointers to them (branches or tags or subsequent commits) are subject to cleanup/prune. I think this means that if someone, 10 years ago, committed 3 times to the branch "jhannah_crazy_idea" and that branch is deleted, then those 3 commits may be removed (gone forever) by git cleanup/prune. > > This is a feature or a crime against humanity depending on who you ask. It can be disabled in a normal repo, I don't know about github. I don't think this is disabled in github (e.g. one can still delete branches). Duke Leto suggested the only real way to prevent history revising commits would be to do a pre-commit hook, which is not supported right now in github. >> Saying that, we could adopt a workflow policy that allows deletion of any merged branch. All this suggests coming up with a good 'Contributing' document. Our 'Using Git' is a start towards this, but it's more a general use page and could point to other (possibly better) resources. I would like something a bit more focused to demonstrate example work-flows and standard practices for those new to git and to BioPerl. It should also mention how we handle pull requests and other github-related bits. > > As I collect clues I'll be brain dumping everything I think I know onto the wiki. This is a crazy busy week for me though. :( > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah No problem. chris From cjfields at illinois.edu Fri May 14 00:41:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 May 2010 19:41:57 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> Message-ID: <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> It would be nice to at least designate them as outdated in some respect, and organize them along those lines. 
chris On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: > Why? What is the gain from deleting branches that you don't know whether they are dead or not? > > -hilmar > > On May 13, 2010, at 6:13 PM, Robert Buels wrote: > >> To clean up branches, I propose to deleting branches (merged or not) whose head is older than Jan 1, 2006, and moving branches to attic/ whose head is older than Jan 1, 2009. >> >> Note that there are still tags for all the old releases, so those won't be lost. >> >> Thoughts? >> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Fri May 14 00:55:01 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 20:55:01 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <0111814C-81BB-4F79-A4C9-723725B2B671@illinois.edu> Message-ID: On May 13, 2010, at 8:41 PM, Chris Fields wrote: > It would be nice to at least designate them as outdated in some > respect, and organize them along those lines. I agree. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From hlapp at drycafe.net Fri May 14 01:04:02 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 13 May 2010 21:04:02 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BEC85A2.50401@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> Message-ID: <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net> On May 13, 2010, at 7:05 PM, Robert Buels wrote: > The gain is to avoid having useless things hanging around. Every > time somebody has to read through a list of 50 branches to find the > maybe 5 that are useful, it's time lost. > > In other word, it's the same gain that you get from cleaning off > your desk, so that you can see where you put things. Hold on - that's not a good comparison is it? First off, this being git, the "main" repo is not your desk. You can have your desk and wipe it clean of all branches and tags that have ever existed, without affecting, or imposing this on, anyone else. Second, why would you *want* to look through all those branches? This being git, you create branches all the time and merge them back, on your own repo, right? Where in this workflow are you browsing through the 50 branches of the "main" repo all the time? Third, and maybe I'm just too old, but moving to git because branching and having your own clone exactly the way you want it is so easy, only to subsequently delete most of the branches on the "main" repo for primarily aesthetic reasons just doesn't make much sense to me, honestly. 
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From heikki.lehvaslaiho at gmail.com Fri May 14 10:41:22 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Fri, 14 May 2010 13:41:22 +0300
Subject: [Bioperl-l] git branches, tags, 'topic/bug_####'
In-Reply-To:
References: <201005130328.o4D3S8Fs011865@portal.open-bio.org> <73BB1389-3820-4035-A917-D064735E875B@illinois.edu> <157DCE48-B575-4539-8A50-229A6216ED26@drycafe.net> <4BEC757B.5030407@cornell.edu>
Message-ID:

Yep. master.

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429
Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 14 May 2010 02:07, Chris Fields wrote:

> 'master'. That's more in line with other repos.
>
> chris
>
> On May 13, 2010, at 4:56 PM, Robert Buels wrote:
>
> > OK then, decision time, which is the main devel branch, 'master' or
> > 'develop'? I need to merge in a few small bugfixes.
> >
> > I vote for 'master', since it's slightly simpler for new devs, with
> > releases being constructed in branches off of that.
> >
> > Rob

From heikki.lehvaslaiho at gmail.com Fri May 14 10:45:50 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Fri, 14 May 2010 13:45:50 +0300
Subject: [Bioperl-l] move ancient branches to attic
In-Reply-To: <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net>
References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <4BEC85A2.50401@cornell.edu> <7B0E9AE4-BDEC-4D69-BC7A-A03657B6D6E4@drycafe.net>
Message-ID:

Rob,

If you think this is important, do a survey and create a nice wiki page
explaining these branches to everyone. Then we can discuss whether some of
them are best deleted.

-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429
Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 14 May 2010 04:04, Hilmar Lapp wrote:

> On May 13, 2010, at 7:05 PM, Robert Buels wrote:
>
>> The gain is to avoid having useless things hanging around. Every time
>> somebody has to read through a list of 50 branches to find the maybe 5
>> that are useful, it's time lost.
>>
>> In other words, it's the same gain that you get from cleaning off your
>> desk, so that you can see where you put things.
>
> Hold on - that's not a good comparison is it? First off, this being git,
> the "main" repo is not your desk. You can have your desk and wipe it clean
> of all branches and tags that have ever existed, without affecting, or
> imposing this on, anyone else.
>
> Second, why would you *want* to look through all those branches? This being
> git, you create branches all the time and merge them back, on your own repo,
> right? Where in this workflow are you browsing through the 50 branches of
> the "main" repo all the time?
>
> Third, and maybe I'm just too old, but moving to git because branching and
> having your own clone exactly the way you want it is so easy, only to
> subsequently delete most of the branches on the "main" repo for primarily
> aesthetic reasons just doesn't make much sense to me, honestly.
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================

From jay at jays.net Fri May 14 13:32:04 2010
From: jay at jays.net (Jay Hannah)
Date: Fri, 14 May 2010 08:32:04 -0500
Subject: [Bioperl-l] move ancient branches to attic
In-Reply-To: <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net>
References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net>
Message-ID: <3B012988-D239-478D-8080-7721633A4AA5@jays.net>

On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote:
> Why? What is the gain from deleting branches that you don't know whether they are dead or not?

If our branch list was clean they wouldn't dupe up when I go to merge in other people's contributions.

You don't find large lists of probably dead things annoying?

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah

jhannah at cplreynoldslpt:~/src/bioperl-live$ git remote add vinanna git://github.com/vinanna/bioperl-live.git
jhannah at cplreynoldslpt:~/src/bioperl-live$ git fetch vinanna
remote: Counting objects: 18, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 10 (delta 8), reused 0 (delta 0)
Unpacking objects: 100% (10/10), done.
>From git://github.com/vinanna/bioperl-live * [new branch] TRY_featureio_refactor -> vinanna/TRY_featureio_refactor * [new branch] TRY_gff_refactor -> vinanna/TRY_gff_refactor * [new branch] TRY_locatableseq_refactor -> vinanna/TRY_locatableseq_refactor * [new branch] anydbm-branch -> vinanna/anydbm-branch * [new branch] bioperl -> vinanna/bioperl * [new branch] bioperl-branch-1-5-1 -> vinanna/bioperl-branch-1-5-1 * [new branch] bioperl-live -> vinanna/bioperl-live * [new branch] branch-06 -> vinanna/branch-06 * [new branch] branch-07 -> vinanna/branch-07 * [new branch] branch-07-ensembl-120 -> vinanna/branch-07-ensembl-120 * [new branch] branch-1-0-0 -> vinanna/branch-1-0-0 * [new branch] branch-1-2 -> vinanna/branch-1-2 * [new branch] branch-1-2-collection -> vinanna/branch-1-2-collection * [new branch] branch-1-4 -> vinanna/branch-1-4 * [new branch] branch-1-5-2 -> vinanna/branch-1-5-2 * [new branch] branch-1-6 -> vinanna/branch-1-6 * [new branch] branch-ensembl-m1 -> vinanna/branch-ensembl-m1 * [new branch] branch-experimental -> vinanna/branch-experimental * [new branch] featann_rollback -> vinanna/featann_rollback * [new branch] internal-branch-pre-delete-06-tag -> vinanna/internal-branch-pre-delete-06-tag * [new branch] lightweight_feature_branch -> vinanna/lightweight_feature_branch * [new branch] master -> vinanna/master * [new branch] ontology-cache -> vinanna/ontology-cache * [new branch] release-0-04-bug -> vinanna/release-0-04-bug * [new branch] restriction-refactor -> vinanna/restriction-refactor * [new branch] stable-0-05 -> vinanna/stable-0-05 * [new branch] stable-0-05-new -> vinanna/stable-0-05-new * [new branch] steve_chervitz -> vinanna/steve_chervitz * [new branch] topic/bug_2515 -> vinanna/topic/bug_2515 jhannah at cplreynoldslpt:~/src/bioperl-live$ git branch -a * master remotes/origin/HEAD -> origin/master remotes/origin/TRY_featureio_refactor remotes/origin/TRY_gff_refactor remotes/origin/TRY_locatableseq_refactor 
remotes/origin/anydbm-branch remotes/origin/bioperl remotes/origin/bioperl-branch-1-5-1 remotes/origin/bioperl-live remotes/origin/branch-06 remotes/origin/branch-07 remotes/origin/branch-07-ensembl-120 remotes/origin/branch-1-0-0 remotes/origin/branch-1-2 remotes/origin/branch-1-2-collection remotes/origin/branch-1-4 remotes/origin/branch-1-5-2 remotes/origin/branch-1-6 remotes/origin/branch-ensembl-m1 remotes/origin/branch-experimental remotes/origin/featann_rollback remotes/origin/internal-branch-pre-delete-06-tag remotes/origin/jhannah remotes/origin/lightweight_feature_branch remotes/origin/master remotes/origin/ontology-cache remotes/origin/release-0-04-bug remotes/origin/restriction-refactor remotes/origin/stable-0-05 remotes/origin/stable-0-05-new remotes/origin/steve_chervitz remotes/origin/topic/bug_2515 remotes/origin/yapc10hackathon remotes/vinanna/TRY_featureio_refactor remotes/vinanna/TRY_gff_refactor remotes/vinanna/TRY_locatableseq_refactor remotes/vinanna/anydbm-branch remotes/vinanna/bioperl remotes/vinanna/bioperl-branch-1-5-1 remotes/vinanna/bioperl-live remotes/vinanna/branch-06 remotes/vinanna/branch-07 remotes/vinanna/branch-07-ensembl-120 remotes/vinanna/branch-1-0-0 remotes/vinanna/branch-1-2 remotes/vinanna/branch-1-2-collection remotes/vinanna/branch-1-4 remotes/vinanna/branch-1-5-2 remotes/vinanna/branch-1-6 remotes/vinanna/branch-ensembl-m1 remotes/vinanna/branch-experimental remotes/vinanna/featann_rollback remotes/vinanna/internal-branch-pre-delete-06-tag remotes/vinanna/lightweight_feature_branch remotes/vinanna/master remotes/vinanna/ontology-cache remotes/vinanna/release-0-04-bug remotes/vinanna/restriction-refactor remotes/vinanna/stable-0-05 remotes/vinanna/stable-0-05-new remotes/vinanna/steve_chervitz remotes/vinanna/topic/bug_2515 From cjfields at illinois.edu Fri May 14 13:47:05 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 08:47:05 -0500 Subject: [Bioperl-l] move ancient branches to attic 
In-Reply-To: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> Message-ID: <2309AD4D-9FEA-4463-A4FD-519F0FCA2639@illinois.edu> To me, this is more a problem with the way forks currently work in github, via automatically dup-ing all branches vs allowing a single branch ('master', for instance). In fairness, that makes sense if they're implementing this the way I think, in order to conserve space. There are other small issues on github that should be worked out, for instance the automatic addition of all collabs with pull requests, since these go to bioperl-guts now. At least, I got a dup email from the last pull request. Some fixes are supposedly being planned for group-like accounts, just don't know when they'll appear. But I think the overall benefits of github outweigh some of the bumps in the road we're seeing. chris On May 14, 2010, at 8:32 AM, Jay Hannah wrote: > On May 13, 2010, at 5:46 PM, Hilmar Lapp wrote: >> Why? What is the gain from deleting branches that you don't know whether they are dead or not? > > If our branch list was clean they wouldn't dupe up when I go to merge in other people's contributions. > > You don't find large lists of probably dead things annoying? > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > > > > jhannah at cplreynoldslpt:~/src/bioperl-live$ git remote add vinanna git://github.com/vinanna/bioperl-live.gitjhannah at cplreynoldslpt:~/src/bioperl-live$ git fetch vinanna > remote: Counting objects: 18, done. > remote: Compressing objects: 100% (9/9), done. > remote: Total 10 (delta 8), reused 0 (delta 0) > Unpacking objects: 100% (10/10), done. 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Fri May 14 13:56:48 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 14 May 2010 09:56:48 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <3B012988-D239-478D-8080-7721633A4AA5@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> Message-ID: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> On May 14, 2010, at 9:32 AM, Jay Hannah wrote: > You don't find large lists of probably dead things annoying? Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. As an analogy, Google Mail keeps all your dead email (email you delete). Forever. Not because they think most of what you delete you shouldn't have deleted, but because it costs so little, and can be so efficiently managed for the few things that you do decide to recover a year later that it's not worth for you as a user to spend any brain cycles on which emails you should physically delete and which you should only "archive". Likewise, I don't see the gain that outweighs the brain cycles and careful consideration that would have to go into deciding which branches to delete, which ones to move into an "attic", and which ones to keep around. If you don't want to see them, simply clone and wipe them away. 
Life can be so easy :-) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Fri May 14 14:20:22 2010 From: jay at jays.net (Jay Hannah) Date: Fri, 14 May 2010 09:20:22 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> Message-ID: <0C1AE8D4-70F5-427E-9429-B59156587E19@jays.net> On May 14, 2010, at 8:56 AM, Hilmar Lapp wrote: > On May 14, 2010, at 9:32 AM, Jay Hannah wrote: >> You don't find large lists of probably dead things annoying? > > Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. > > As an analogy, Google Mail keeps all your dead email (email you delete). Forever. OK. So our policy is that our branch list is an ever-growing pile of probably-dead things that we all ignore. A couple of them might be alive and useful at any given moment in time, but only if whoever created them is still around and cares and happens to remember what the point was. Understood. 
Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Fri May 14 15:34:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 10:34:41 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> Message-ID: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> On May 14, 2010, at 8:56 AM, Hilmar Lapp wrote: > > On May 14, 2010, at 9:32 AM, Jay Hannah wrote: > >> You don't find large lists of probably dead things annoying? > > > Not if they're not in the way of my executing my workflow effectively. Keeping a room super-tidy that you don't ever live in is a waste of energy. > > As an analogy, Google Mail keeps all your dead email (email you delete). Forever. Not because they think most of what you delete you shouldn't have deleted, but because it costs so little, and can be so efficiently managed for the few things that you do decide to recover a year later that it's not worth for you as a user to spend any brain cycles on which emails you should physically delete and which you should only "archive". > > Likewise, I don't see the gain that outweighs the brain cycles and careful consideration that would have to go into deciding which branches to delete, which ones to move into an "attic", and which ones to keep around. If you don't want to see them, simply clone and wipe them away. Life can be so easy :-) > > -hilmar I tend to fall in the middle here, in that it would be nice to clean out feature branches that have been merged back in and relegate all older branches to an attic. Moving branches is as easy as 'git branch -m foo attic/foo'. 
I'm not in favor of removing branches that haven't been merged back, unless they're deemed unnecessary by the core devs. re: removing feature branches, this is something we have talked about doing in the past on svn, but is a bit trickier at the moment as the git repo doesn't currently indicate if/when specific svn branches were merged to HEAD. We still have read-only access to our svn repo to determine that if needed. So far, though, I haven't seen much in the way of indicating what some regard as 'feature' (removable) vs 'attic' (old but retained). That discussion needs to happen on list. chris From hlapp at drycafe.net Fri May 14 16:56:54 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 14 May 2010 12:56:54 -0400 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> Message-ID: <69D4619C-F21E-4FAE-B56F-C2F3B323EFD6@drycafe.net> On May 14, 2010, at 11:34 AM, Chris Fields wrote: > it would be nice to clean out feature branches that have been merged > back in Agreed, if the case is clear. > and relegate all older branches to an attic. Moving branches is as > easy as 'git branch -m foo attic/foo'. That's easy enough too and doesn't lose anything, hence no need to spend time on making sure it might not be a mistake. > I'm not in favor of removing branches that haven't been merged > back, unless they're deemed unnecessary by the core devs. Agreed, except I would remove the conditional. I'd rather spend that time on coding ... 
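For anyone following along, the 'git branch -m foo attic/foo' rename mentioned above can be tried safely in a throwaway repository. This is only a minimal sketch: the sandbox directory, the placeholder identity, and the branch name 'foo' are illustrative assumptions, not anything from the BioPerl repo.

```shell
set -e

# Throwaway sandbox so nothing real is touched.
work=$(mktemp -d)
cd "$work"
git init -q sandbox
cd sandbox
git config user.email "dev@example.org"   # placeholder identity
git config user.name  "Example Dev"

git commit -q --allow-empty -m "initial commit"
git branch foo                  # hypothetical stale branch

# Rename the local branch into an attic/ namespace; no commits are lost.
git branch -m foo attic/foo

git branch --list               # shows attic/foo next to the default branch

# Note: this renames only the local branch. On a shared remote the rename
# would additionally need something like:
#   git push origin attic/foo :foo
```

The rename is purely a change of ref name, which is why it "doesn't lose anything"; the remote half (push the new name, delete the old one) is a separate step.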
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From subodhs at iastate.edu Fri May 14 16:24:21 2010
From: subodhs at iastate.edu (Srivastava, Subodh K [AGRON])
Date: Fri, 14 May 2010 11:24:21 -0500
Subject: [Bioperl-l] running perl script
Message-ID: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu>

hi,
I am running a perl script and getting error like:

Can't locate Bio/Perl.pm in @INC (@INC contains:
/home/subodhs/SHORE_map/SHOREmap_release_1.1
/usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl
/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at
/home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12.

How to set the path for this?
the other related scripts are working in same directory.

I am running; perl, v5.8.8 built for x86_64-linux-thread-multi

thank you
subodh
*************************************
G-302
Agronomy Hall
Iowa State University
Ames, IA -50010

From rmb32 at cornell.edu Fri May 14 18:38:10 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 14 May 2010 11:38:10 -0700
Subject: [Bioperl-l] move ancient branches to attic
In-Reply-To: <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu>
References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu>
Message-ID: <4BED9892.5070408@cornell.edu>

At the PDX hackathon last night, I was talking about this problem with a
git expert, and he gave me a little tutorial on how git thinks about and
keeps branches and tags.

Each of these things is just a special case of a 'ref', which is just a
reference to the end of some piece of the commit graph. If you run

  git ls-remote http://github.com/bioperl/bioperl-live.git

you can see all the refs we currently have in our bioperl-live repo,
which are all in either /refs/heads (which are our branches), or
/refs/tags (our tags).

Now, it turns out you can have arbitrary things in here in addition to
heads and tags. I copied one of the old branches to
/refs/archives/branch-ensembl-m1 to demonstrate this. Now, it doesn't
show up in normal workflow listings, but it's not deleted. If somebody
wanted to resurrect it, they could move or copy it into /refs/heads
(where it would show up as an active branch again).

To copy a branch into archives/,

  git push origin origin/:refs/archives/

To *move* a branch into archives/

  git push origin origin/:refs/archives/ \
      :refs/heads/

The second part of that push has nothing on the left side of the colon,
which pushes a 'null' to refs/heads/, which deletes it. You can have an
arbitrary number of these kinds of commands in each push invocation.

So, there's a good mechanism for archiving our old branches.
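The "move into archives/" push described above can be rehearsed end to end against a local bare repository before touching the real one. The following is a sketch under stated assumptions: the bare repo standing in for "origin", the placeholder identity, and the branch name 'old-branch' are all hypothetical.

```shell
set -e

# Throwaway sandbox: a bare repo plays the role of the shared "origin".
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git
git clone -q origin.git clone 2>/dev/null   # silence the "empty repository" warning
cd clone
git config user.email "dev@example.org"     # placeholder identity
git config user.name  "Example Dev"

git commit -q --allow-empty -m "initial commit"
git branch old-branch                       # hypothetical stale branch
git push -q origin HEAD old-branch
git fetch -q origin                         # ensure origin/old-branch exists locally

# Move: copy the ref into refs/archives/, then delete it from refs/heads/.
# The second refspec has an empty left-hand side, which deletes the target.
git push -q origin origin/old-branch:refs/archives/old-branch :refs/heads/old-branch

# old-branch should now be listed only under refs/archives/.
git ls-remote origin
```

To resurrect an archived branch later, the same mechanism should work in reverse: fetch the archived ref explicitly (refs outside refs/heads and refs/tags are not fetched by a default clone) and push it back under refs/heads/.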
Rob

From pat.boutet at gmail.com Fri May 14 19:14:36 2010
From: pat.boutet at gmail.com (Patrick Boutet)
Date: Fri, 14 May 2010 13:14:36 -0600
Subject: [Bioperl-l] running perl script
In-Reply-To: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu>
References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu>
Message-ID:

On Fri, May 14, 2010 at 10:24 AM, Srivastava, Subodh K [AGRON] <subodhs at iastate.edu> wrote:

> hi,
> I am running a perl script and getting error like:
>
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /home/subodhs/SHORE_map/SHOREmap_release_1.1
> /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl
> /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl
> /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at
> /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12.
>
> How to set the path for this?
> the other related scripts are working in same directory.
>
> I am running; perl, v5.8.8 built for x86_64-linux-thread-multi
>
> thank you
> subodh
> *************************************
> G-302
> Agronomy Hall
> Iowa State University
> Ames, IA -50010

Now, I'm still new at this, but I'll try to be helpful. First, where is
bioperl installed? System-wide or local to your home directory? Do you have
root access? What type of shell are you using? It seems like you might have
to set your shell's PERL5LIB variable to include the directory where bioperl
is installed.
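As a rough illustration of the PERL5LIB suggestion, here is what that might look like in a Bourne-style shell (sh/bash). The directory below is purely a hypothetical example; it has to be replaced with whatever directory actually contains the Bio/ folder (and thus Bio/Perl.pm) on the system in question.

```shell
# Hypothetical location; substitute the directory that really contains Bio/Perl.pm.
BIOPERL_DIR="$HOME/src/bioperl-live"

# Prepend it to PERL5LIB, preserving any value that is already set.
# (This is sh/bash syntax; csh/tcsh users would use setenv instead.)
export PERL5LIB="$BIOPERL_DIR${PERL5LIB:+:$PERL5LIB}"

echo "PERL5LIB=$PERL5LIB"

# To confirm that Perl can now find the module, one could then run:
#   perl -MBio::Perl -e 'print "Bio::Perl found\n"'
```

Putting the export line in the shell's startup file (for example ~/.bashrc for bash) would make the setting persist across sessions.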
Patrick Boutet From cjfields at illinois.edu Fri May 14 19:23:31 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 14:23:31 -0500 Subject: [Bioperl-l] move ancient branches to attic In-Reply-To: <4BED9892.5070408@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> Message-ID: On May 14, 2010, at 1:38 PM, Robert Buels wrote: > At the PDX hackathon last night, I was talking about this problem with a git expert, and he gave me a little tutorial on how git thinks about and keep branches and tags. > > Each of these things is just a special case of a 'ref', which is just a reference to the end of some piece of the commit graph. If you run > > git ls-remote http://github.com/bioperl/bioperl-live.git > > you can see all the refs we currently have in our bioperl-live repo, which are all in either /refs/heads (which are our branches), or /refs/tags (our tags). > > Now, it turns out you can have arbitrary things in here in addition to heads and tags. I copied one of the old branches to /refs/archives/branch-ensembl-m1 to demonstrate this. Now, it doesn't show up in normal workflow listings, but it's not deleted. If somebody wanted to resurrect it, they could move or copy it into /refs/heads (where it would show up as as an active branch again). > > To copy a branch into archives/, > > git push origin origin/:refs/archives/ > > To *move* a branch into archives/ > > git push origin origin/:refs/archives/ \ > :refs/heads/ > > The first part of that second part of that push has nothing on the left side of the colon, which pushes a 'null' to refs/heads/, which deletes it. You can have an arbitrary number of these kinds of commands in each push invocation. > > So, there's a good mechanism for archiving our old branches. 
> > Rob That's a nice alternative to an attic, and less visible. On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. chris From rmb32 at cornell.edu Fri May 14 22:56:49 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 15:56:49 -0700 Subject: [Bioperl-l] BioPerl for indexing quality score files In-Reply-To: References: Message-ID: <4BEDD531.8050502@cornell.edu> Gregory Jordan wrote: > Ok, I need to shame myself with a huge "RTFM" for this one -- We still like you, Greg. Come hang out in #bioperl, where we can make fun of you properly. ;-) Rob From rmb32 at cornell.edu Fri May 14 23:01:50 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 May 2010 16:01:50 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> Message-ID: <4BEDD65E.9070702@cornell.edu> Chris Fields wrote: > On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. OK, here are all our current branches, I will go through them in order of last-modified date. 

1998-12-11 bioperl
1999-02-19 release-0-04-bug
1999-04-13 bioperl-live
1999-04-13 stable-0-05
2000-01-27 branch-ensembl-m1
2000-02-07 internal-branch-pre-delete-06-tag
2000-03-22 stable-0-05-new
2001-02-19 branch-06
2001-11-14 branch-07-ensembl-120
2001-12-28 steve_chervitz
2002-01-16 branch-07
2002-10-22 branch-1-0-0
2003-07-07 branch-1-2-collection
2003-10-13 branch-1-2
2004-10-20 ontology-cache
2005-04-14 branch-1-4
2006-01-11 bioperl-branch-1-5-1
2006-08-14 branch-experimental
2007-02-14 branch-1-5-2
2007-08-28 featann_rollback
2007-11-07 lightweight_feature_branch

Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020.

2009-06-17 restriction-refactor

Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f

2009-07-16 topic/bug_2515
proposal: keep, jhannah "working" ;-)

2009-08-13 TRY_gff_refactor
proposal: delete, git claims it is merged

2009-08-13 TRY_locatableseq_refactor
proposal: delete, git claims it is merged

2009-09-29 branch-1-6
keep, 1.6 maint branch i think.

2009-10-14 anydbm-branch
keep, MAJ working. MAJ, maybe you should move this to topic/ ?

2010-01-31 TRY_featureio_refactor
keep, but looks dead. cjfields, maybe you want to delete it?

2010-05-12 topic/bug_3077
delete, git claims it is merged.

Please review, and I'll do the work if people agree.
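
For anyone who wants to repeat the "git claims it is merged" check, here is a minimal sketch in a throwaway sandbox repository. The branch names trunk, topic/merged-example and topic/unmerged-example are hypothetical, and this is one plausible way to run the check, not necessarily the commands used for the list above.

```shell
#!/bin/sh
set -e

# Throwaway sandbox repository (hypothetical; nothing here touches bioperl-live).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch explicitly
git config user.email "you@example.org"
git config user.name "Example"
git commit -q --allow-empty -m "initial commit"

# One branch that gets merged back into trunk...
git checkout -q -b topic/merged-example
git commit -q --allow-empty -m "work that will be merged"
git checkout -q trunk
git merge -q --no-ff -m "merge topic/merged-example" topic/merged-example

# ...and one that does not.
git checkout -q -b topic/unmerged-example
git commit -q --allow-empty -m "work still in progress"
git checkout -q trunk

# Branches whose tips are reachable from trunk: candidates for deletion.
git branch --merged trunk
# Branches with commits that trunk does not have: keep or archive these.
git branch --no-merged trunk
```

Against a shared repository the equivalent check would be `git branch -r --merged origin/master`. As Chris notes earlier in the thread, merges made in the svn era were not carried over by the conversion, so a branch can be missing from the --merged list even though its changes did reach trunk.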
Rob From jason at bioperl.org Fri May 14 23:54:30 2010 From: jason at bioperl.org (Jason Stajich) Date: Fri, 14 May 2010 16:54:30 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDD65E.9070702@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> Message-ID: <4BEDE2B6.3010307@bioperl.org> lightweight_feature_branch was my test built with a feature type that is based on arrays instead of hashes got 25+% speedup I believe - have to go back to the archives to see what I claimed was speedup... =) I think that Bio::SeqFeature::Slim might be at least one speedup by Lincoln for Gbrowse that addresses some of the speed problem, though I think it still isn't array-based for data storage. -j Robert Buels wrote, On 5/14/10 4:01 PM: > Chris Fields wrote: >> On a related note, going through, it appears the git conversion >> didn't track merges back to trunk. For instance, I know the >> featann_rollback was merged to trunk but it's not showing up. I know >> svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came >> into play), so it may be hard to actually find true merges w/o that. > > OK, here are all our current branches, I will go through them in order > of last-modified date. 
> > 1998-12-11 bioperl > 1999-02-19 release-0-04-bug > 1999-04-13 bioperl-live > 1999-04-13 stable-0-05 > 2000-01-27 branch-ensembl-m1 > 2000-02-07 internal-branch-pre-delete-06-tag > 2000-03-22 stable-0-05-new > 2001-02-19 branch-06 > 2001-11-14 branch-07-ensembl-120 > 2001-12-28 steve_chervitz > 2002-01-16 branch-07 > 2002-10-22 branch-1-0-0 > 2003-07-07 branch-1-2-collection > 2003-10-13 branch-1-2 > 2004-10-20 ontology-cache > 2005-04-14 branch-1-4 > 2006-01-11 bioperl-branch-1-5-1 > 2006-08-14 branch-experimental > 2007-02-14 branch-1-5-2 > 2007-08-28 featann_rollback > 2007-11-07 lightweight_feature_branch > > Proposal: move the above to refs/archive and not worry any further > about them. Maybe we can throw them out in 2020. > > 2009-06-17 restriction-refactor > > Proposal: delete, looks like it was merged in > a2cb40e6c9c7da4f776dbb72a0266f54320fa37f > > 2009-07-16 topic/bug_2515 > proposal: keep, jhannah "working" ;-) > > 2009-08-13 TRY_gff_refactor > proposal: delete, git claims it is merged > > 2009-08-13 TRY_locatableseq_refactor > proposal: delete, git claims it is merged > > 2009-09-29 branch-1-6 > keep, 1.6 maint branch i think. > > 2009-10-14 anydbm-branch > keep, MAJ working. MAJ, maybe you should move this to topic/ ? > > 2010-01-31 TRY_featureio_refactor > keep, but looks dead. cjfields, maybe you want to delete it? > > 2010-05-12 topic/bug_3077 > delete, git claims it is merged. > > Please review, and I'll do the work if people agree. 
> > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat May 15 03:41:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 22:41:18 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDD65E.9070702@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> Message-ID: <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> On May 14, 2010, at 6:01 PM, Robert Buels wrote: > Chris Fields wrote: >> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. > > OK, here are all our current branches, I will go through them in order of last-modified date. > > 1998-12-11 bioperl > 1999-02-19 release-0-04-bug > 1999-04-13 bioperl-live > 1999-04-13 stable-0-05 > 2000-01-27 branch-ensembl-m1 > 2000-02-07 internal-branch-pre-delete-06-tag > 2000-03-22 stable-0-05-new > 2001-02-19 branch-06 > 2001-11-14 branch-07-ensembl-120 > 2001-12-28 steve_chervitz > 2002-01-16 branch-07 > 2002-10-22 branch-1-0-0 > 2003-07-07 branch-1-2-collection > 2003-10-13 branch-1-2 > 2004-10-20 ontology-cache > 2005-04-14 branch-1-4 > 2006-01-11 bioperl-branch-1-5-1 > 2006-08-14 branch-experimental > 2007-02-14 branch-1-5-2 > 2007-08-28 featann_rollback > 2007-11-07 lightweight_feature_branch > > Proposal: move the above to refs/archive and not worry any further about them. 
Maybe we can throw them out in 2020. Just as long as we know they are there. Rob, can you document the archive set up on the wiki so we don't forget it? I deleted the featann_rollback branch. That was a feature branch (no pun intended) to rollback overloading and a host of other changes introduced to bioperl just before the 1.5 release. It was merged a few years ago in svn. > 2009-06-17 restriction-refactor > > Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f This may have been Mark's refactoring, so yes, delete. > 2009-08-13 TRY_gff_refactor > proposal: delete, git claims it is merged > > 2009-08-13 TRY_locatableseq_refactor > proposal: delete, git claims it is merged I deleted these. The primary goal of TRY_gff_refactor was to work in GFF3 work, but that may rely on FeatureIO so will have to be done in stages. At some point, if we do a larger scale refactoring of GFF for GFF3 compat we can make another branch. TRY_locatableseq_refactor will be obsoleted once GSoC starts. > 2009-09-29 branch-1-6 > keep, 1.6 maint branch i think. Yes. I will probably work on another set of merges from to 1.6 soon to bring it up to speed, maybe for one last 1.6 release. > 2009-10-14 anydbm-branch > keep, MAJ working. MAJ, maybe you should move this to topic/ ? > > 2010-01-31 TRY_featureio_refactor > keep, but looks dead. cjfields, maybe you want to delete it? Yes. I've deleted this, as FeatureIO is on it's own. > 2010-05-12 topic/bug_3077 > delete, git claims it is merged. That's already deleted. Maybe needs to be pruned locally? > Please review, and I'll do the work if people agree. > > Rob Good start! 
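
On the "pruned locally" point: a clone keeps its remote-tracking ref (origin/topic/bug_3077) after the branch is deleted on the server, until the clone prunes it. A minimal sketch in a throwaway sandbox, assuming hypothetical paths and a hypothetical main branch named trunk; only the branch name topic/bug_3077 comes from the thread.

```shell
#!/bin/sh
set -e

# Throwaway sandbox: a bare "remote" and one developer clone (hypothetical).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch explicitly
git config user.email "you@example.org"
git config user.name "Example"
git remote add origin "$work/remote.git"
git commit -q --allow-empty -m "initial commit"
git branch topic/bug_3077
git push -q origin trunk topic/bug_3077

# Simulate someone else deleting the branch on the server.
git --git-dir="$work/remote.git" branch -q -D topic/bug_3077

# The stale remote-tracking ref is still listed in this clone...
git branch -r

# ...until the clone prunes refs that no longer exist on the remote.
git fetch -q --prune origin
git branch -r
```

`git remote prune origin` has the same effect as the `--prune` fetch, without downloading anything new.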
chris From cjfields at illinois.edu Sat May 15 03:45:07 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 May 2010 22:45:07 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BEDE2B6.3010307@bioperl.org> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <4BEDE2B6.3010307@bioperl.org> Message-ID: <34DFCB4E-2048-4A62-AE9C-06CBF900D38A@illinois.edu> This was moved into bioperl-dev at some point: http://github.com/bioperl/bioperl-dev/tree/master/Bio/SeqFeature/ Might be obsolete as well. chris On May 14, 2010, at 6:54 PM, Jason Stajich wrote: > lightweight_feature_branch was my test built with a feature type that is based on arrays instead of hashes got 25+% speedup I believe - have to go back to the archives to see what I claimed was speedup... =) > > I think that Bio::SeqFeature::Slim might be at least one speedup by Lincoln for Gbrowse that addresses some of the speed problem, though I think it still isn't array-based for data storage. > > -j > > Robert Buels wrote, On 5/14/10 4:01 PM: >> Chris Fields wrote: >>> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. >> >> OK, here are all our current branches, I will go through them in order of last-modified date. 
>> >> 1998-12-11 bioperl >> 1999-02-19 release-0-04-bug >> 1999-04-13 bioperl-live >> 1999-04-13 stable-0-05 >> 2000-01-27 branch-ensembl-m1 >> 2000-02-07 internal-branch-pre-delete-06-tag >> 2000-03-22 stable-0-05-new >> 2001-02-19 branch-06 >> 2001-11-14 branch-07-ensembl-120 >> 2001-12-28 steve_chervitz >> 2002-01-16 branch-07 >> 2002-10-22 branch-1-0-0 >> 2003-07-07 branch-1-2-collection >> 2003-10-13 branch-1-2 >> 2004-10-20 ontology-cache >> 2005-04-14 branch-1-4 >> 2006-01-11 bioperl-branch-1-5-1 >> 2006-08-14 branch-experimental >> 2007-02-14 branch-1-5-2 >> 2007-08-28 featann_rollback >> 2007-11-07 lightweight_feature_branch >> >> Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. >> >> 2009-06-17 restriction-refactor >> >> Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f >> >> 2009-07-16 topic/bug_2515 >> proposal: keep, jhannah "working" ;-) >> >> 2009-08-13 TRY_gff_refactor >> proposal: delete, git claims it is merged >> >> 2009-08-13 TRY_locatableseq_refactor >> proposal: delete, git claims it is merged >> >> 2009-09-29 branch-1-6 >> keep, 1.6 maint branch i think. >> >> 2009-10-14 anydbm-branch >> keep, MAJ working. MAJ, maybe you should move this to topic/ ? >> >> 2010-01-31 TRY_featureio_refactor >> keep, but looks dead. cjfields, maybe you want to delete it? >> >> 2010-05-12 topic/bug_3077 >> delete, git claims it is merged. >> >> Please review, and I'll do the work if people agree. 
>> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Sat May 15 14:27:48 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 09:27:48 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) Message-ID: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> I wrote some tests and merged and deleted branch topic/bug_2515. Bio::SeqIO::gbxml is now in master. Thanks to Ryan Golhar for the contribution! Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah bioperl-live$ perl -I. t/SeqIO/gbxml.t 1..14 ok 1 - use Bio::SeqIO::gbxml; ok 2 - The object isa Bio::SeqIO ok 3 - molecule ok 4 - alphabet ok 5 - primary_id ok 6 - display_id ok 7 - version ok 8 - is_circular ok 9 - description ok 10 - sequence ok 11 - classification ok 12 - feat - clone_lib ok 13 - feat - db_xref ok 14 - feat - lab_host From jay at jays.net Sat May 15 14:57:54 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 09:57:54 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> Message-ID: On May 15, 2010, at 9:34 AM, Chris Fields wrote: > Can you add something to the Changes file for this? You can make a new section for bug fixes or new features at the top, and we can worry about versions later. > > I'll add in the recent bug fix I made as well. Pushed. Feel free to discard any of that you don't like. 
HTH, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Sat May 15 15:46:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 May 2010 10:46:16 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> Message-ID: <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> Thanks Jay. I'll add a bit in myself for bug 3077. Not sure if we'll pursue another point release yet, but it would be nice to get changes out prior to any major structural reorganization. chris On May 15, 2010, at 9:57 AM, Jay Hannah wrote: > On May 15, 2010, at 9:34 AM, Chris Fields wrote: >> Can you add something to the Changes file for this? You can make a new section for bug fixes or new features at the top, and we can worry about versions later. >> >> I'll add in the recent bug fix I made as well. > > Pushed. Feel free to discard any of that you don't like. HTH, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > From jay at jays.net Sat May 15 18:08:35 2010 From: jay at jays.net (Jay Hannah) Date: Sat, 15 May 2010 13:08:35 -0500 Subject: [Bioperl-l] Bio::SeqIO::gbxml (Genbank XML parser) In-Reply-To: <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> References: <60F32FC7-C238-4BC0-83AE-09BE94D74AD8@jays.net> <7A6671F0-3447-4AF9-9594-8F513521F7E9@illinois.edu> <2D78C32C-B46C-46E1-A096-3CDF4EC9EAE8@illinois.edu> Message-ID: On May 15, 2010, at 10:46 AM, Chris Fields wrote: > Thanks Jay. I'll add a bit in myself for bug 3077. Not sure if we'll pursue another point release yet, but it would be nice to get changes out prior to any major structural reorganization. Is there a list whose completion will mark the push of 1.6.2 to CPAN? 
The Changes file says this now: Bugs to be addressed: http://bugzilla.open-bio.org specific bugs intended for the next CPAN release series highlighted in BUGS But I don't understand what 'highlighted in BUGS' means. I also don't know what a 'point release' is. :) Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From David.Messina at sbc.su.se Sat May 15 19:34:58 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 May 2010 21:34:58 +0200 Subject: [Bioperl-l] parsing blast report with long description In-Reply-To: References: Message-ID: Shalabh, Could you please file a bug report on this at bugzilla.open-bio.org? Please include a description (pasting this email will do) and most importantly a test script and sample blast output file which reproduces the problem. We will need those in order to be able to diagnose and fix the problem. Thanks! Dave On May 13, 2010, at 5:07 PM, shalabh sharma wrote: > Hi All, > I need some help in parsing blast output. > I have a inhouse database that contain sequences with really long > description. > >> SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open > Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - > 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV > > So my blast report looks like this: > > ..... > ..... >> SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821 > 6887/Open Ocean/Galapagos Islands/134 miles NE of > Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 > m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > Length = 213 > > Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix > adjust. > Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%) > ..... > ..... > > (note that the tag "TI_1000008216887" is splitting in two lines). 
> > I am using SeqIO to parse this report. What i am doing is parsing the > description field again to get all the tags. like > .... > .... > my $desc = $hit->description; > my @f = split('/',$desc); > for(my $i = 0;$i < scalar > @f;$i++){ print OUT "$f[$i]\t";} > ..... > ..... > > > *I am getting the perfect parsed report but the field with TI_1000008216887 > has a space **TI_100000821 6887 *. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun May 16 15:14:25 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 10:14:25 -0500 Subject: [Bioperl-l] GenomeeTools Message-ID: Anyone used GenomeTools? I'm thinking of setting up some C bindings to it. It has a C-based GFF3 parser, among other goodies. http://genometools.org/index.html chris From cjfields at illinois.edu Sun May 16 16:16:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 11:16:11 -0500 Subject: [Bioperl-l] Bio-FeatureIO Message-ID: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> All, Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. 
chris From jay at jays.net Sun May 16 17:32:57 2010 From: jay at jays.net (Jay Hannah) Date: Sun, 16 May 2010 12:32:57 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 11:16 AM, Chris Fields wrote: > Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. I'm curious about how this works in terms of git storage. Does this mean that the separate Bio-FeatureIO repo will have the entire history of BioPerl inside it? (Making git clones of Bio-FeatureIO 189MB?) In the recent past I have attempted pulling certain files across git repos before, and ended up with the full history of repo1 inside repo2. I'm unclear if this is just how life is, or if I did it wrong. You could, of course, always just cp text files in, but then you lose the history of those files. Is there some way to get all the history of a handful of files from massive repo1 into tiny repo2 without making repo1 massive? I don't know if any of these considerations are important for the eventual de-monolithification of BioPerl, I was just generally curious. git does that to me. 
:) Thanks, Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Sun May 16 18:18:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 May 2010 13:18:24 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 12:32 PM, Jay Hannah wrote: > On May 16, 2010, at 11:16 AM, Chris Fields wrote: >> Just a heads-up. I recently (Jan 2010) moved Bio::FeatureIO to it's own repository for refactoring. However, the original Bio::FeatureIO code is still within bioperl-live and the 1.6 release branch, and the branch where these were removed was no longer clean, so I removed it. I'm in the process of syncing the 1.6 branch with master soon (where it will remain unmodified), and then will remove Bio-FeatureIO code from master and pull it into the master branch of the separate Bio-FeatureIO repo, as the current (significantly refactored) code is only partly refactored and needs more work and integration. > > I'm curious about how this works in terms of git storage. > > Does this mean that the separate Bio-FeatureIO repo will have the entire history of BioPerl inside it? (Making git clones of Bio-FeatureIO 189MB?) > > In the recent past I have attempted pulling certain files across git repos before, and ended up with the full history of repo1 inside repo2. I'm unclear if this is just how life is, or if I did it wrong. > > You could, of course, always just cp text files in, but then you lose the history of those files. > > Is there some way to get all the history of a handful of files from massive repo1 into tiny repo2 without making repo1 massive? > > I don't know if any of these considerations are important for the eventual de-monolithification of BioPerl, I was just generally curious. git does that to me. 
:) > > Thanks, > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah I'm just planning on having something to the effect of 'Bio-FeatureIO is a set of modules developed by author X that once was part of bioperl-live, but was removed at point XYZ to significantly refactor the code,' then point back to bioperl-live if anyone is interested in software archaeology. Not sure we would need to go beyond that. chris From jay at jays.net Sun May 16 18:47:42 2010 From: jay at jays.net (Jay Hannah) Date: Sun, 16 May 2010 13:47:42 -0500 Subject: [Bioperl-l] Bio-FeatureIO In-Reply-To: References: <2F7E4275-0306-4B7D-A6FB-90FD3ECE0179@illinois.edu> Message-ID: On May 16, 2010, at 1:18 PM, Chris Fields wrote: > I'm just planning on having something to the effect of 'Bio-FeatureIO is a set of modules developed by author X that once was part of bioperl-live, but was removed at point XYZ to significantly refactor the code,' then point back to bioperl-live if anyone is interested in software archaeology. Not sure we would need to go beyond that. Gotcha. That certainly solves the problem. :) So maybe in 2020 we'll be pushing 30 independent github repos to PAUSE all citing the bioperl-live repo for historical digging prior to their emancipation. To jhannah in the year 2020: You are NOT too old for dirt bikes. Keep riding! :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From fs5 at sanger.ac.uk Mon May 17 08:38:18 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 17 May 2010 09:38:18 +0100 Subject: [Bioperl-l] parsing blast report with long description In-Reply-To: References: Message-ID: <1274085498.5288.30.camel@deskpro15336.dynamic.sanger.ac.uk> I think you should try to avoid those long IDs anyway, especially because you have spaces in there too and this may cause problems further down the line as many programs will use a pattern like />(\S+)/ as the identifier. 
I would build a small database for your files and use unique database identifiers in your FASTA files. That will make it easier in the future to collect, for example, all sequences from a certain region etc. If you want to avoid that you could have two file: one FASTA files using numbers as IDs and a file where you map those numbers to sample descriptions, i.e. a simple flat-file database. Frank On Thu, 2010-05-13 at 11:07 -0400, shalabh sharma wrote: > Hi All, > I need some help in parsing blast output. > I have a inhouse database that contain sequences with really long > description. > > >SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open > Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - > 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV > > So my blast report looks like this: > > ..... > ..... > >SMPL_IDI_1105131728043 > /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821 > 6887/Open Ocean/Galapagos Islands/134 miles NE of > Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 > m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04 > Length = 213 > > Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix > adjust. > Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%) > ..... > ..... > > (note that the tag "TI_1000008216887" is splitting in two lines). > > I am using SeqIO to parse this report. What i am doing is parsing the > description field again to get all the tags. like > .... > .... > my $desc = $hit->description; > my @f = split('/',$desc); > for(my $i = 0;$i < scalar > @f;$i++){ print OUT "$f[$i]\t";} > ..... > ..... > > > *I am getting the perfect parsed report but the field with TI_1000008216887 > has a space **TI_100000821 6887 *. > > I would really appreciate if anyone can help me out. 
> > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From fs5 at sanger.ac.uk Mon May 17 08:41:51 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 17 May 2010 09:41:51 +0100 Subject: [Bioperl-l] running perl script In-Reply-To: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> Message-ID: <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> why are you requiring "Bio::Perl"? You would normally use somethink specific in the BioPerl bundle, like Bio::Seq or whatever. Can you show some of your script? Frank On Fri, 2010-05-14 at 11:24 -0500, Srivastava, Subodh K [AGRON] wrote: > hi, > I am running a perl script and getting error like: > > Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. > > How to set the path for this? > the other related scripts are working in same directory. 
> > I am running; perl, v5.8.8 built for x86_64-linux-thread-multi > > thank you > subodh > ************************************* > G-302 > Agronomy Hall > Iowa State University > Ames, IA -50010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Mon May 17 12:26:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 07:26:20 -0500 Subject: [Bioperl-l] running perl script In-Reply-To: <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> References: <928625966EB3BC439F7BEF6B34EBAB1B12BEAB2F23@EXITS713.its.iastate.edu> <1274085711.5288.33.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <63D0BEDA-27F7-48AB-ABE8-1F39B09B349A@illinois.edu> Frank, Bio::Perl is the generic user module for very simple tasks. See here: http://github.com/bioperl/bioperl-live/blob/master/Bio/Perl.pm Subodh, you need to make sure the modules are in your perl library path. See the following link, under 'INSTALLING BIOPERL IN A PERSONAL MODULE AREA': http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix chris On May 17, 2010, at 3:41 AM, Frank Schwach wrote: > why are you requiring "Bio::Perl"? You would normally use somethink > specific in the BioPerl bundle, like Bio::Seq or whatever. Can you show > some of your script? 
> Frank > > > On Fri, 2010-05-14 at 11:24 -0500, Srivastava, Subodh K [AGRON] wrote: >> hi, >> I am running a perl script and getting error like: >> >> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/subodhs/SHORE_map/SHOREmap_release_1.1 /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /home/subodhs/SHORE_map/SHOREmap_release_1.1/GeneSNPlist.pm line 12. >> >> How to set the path for this? >> the other related scripts are working in same directory. >> >> I am running; perl, v5.8.8 built for x86_64-linux-thread-multi >> >> thank you >> subodh >> ************************************* >> G-302 >> Agronomy Hall >> Iowa State University >> Ames, IA -50010 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ross at cuhk.edu.hk Mon May 17 12:42:35 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 17 May 2010 20:42:35 +0800 Subject: [Bioperl-l] extracting genbank content Message-ID: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Dear all, When there are more than one genbank records in a file, except by splitting the file into separate records, what can I do to transverse the records? 
$obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank");
$seqobj=$obj->next_seq();

Do I just use another $obj->next_seq() so it will point to another record?

Thanks for your advice.

From amackey at virginia.edu Mon May 17 13:51:31 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Mon, 17 May 2010 09:51:31 -0400
Subject: [Bioperl-l] Bio::Coordinate::GeneMapper cds to peptide bug
In-Reply-To:
References: <226EE9C6-C233-43BF-9593-0E39262C3568@illinois.edu>
Message-ID:

On Thu, May 13, 2010 at 2:20 AM, Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com> wrote:

>
> As for getting values outside the defined region, that is a feature rather
> than a bug. The idea was to be able to ask what would the new coordinate be
> if the feature extended beyond the known limits. This is the capability of
> Bio::Coordinate::ExtrapolatingPair that is used here. That class also has a
> method strict that can be used to prevent extrapolating, but the code to
> access that has not been written into GeneMapper. I'll see if I can get it
> to work.
>
>

I had this same thought/expectation, but that in fact is not what's going on. There is no place in the GeneMapper code where the CDS end coordinate is being used, only the begin coordinate. The implicit assumption is that the CDS ends at the last exon.

From the perspective of the translate/revtranslate methods, an extrapolating pair does not make sense (at least to me) -- just as a CDS coordinate is undefined within an intron, so too would I expect a CDS coordinate to be undefined in a UTR or intragenic region. Alternatively, it would be nice (in general) to be able to check whether the provided mapping is an extrapolation or not.
-Aaron

From David.Messina at sbc.su.se Mon May 17 13:56:35 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 17 May 2010 15:56:35 +0200
Subject: [Bioperl-l] extracting genbank content
In-Reply-To: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
Message-ID:

Hi Ross,

> Do I just use another $obj->next_seq() so it will point to another record?

Yes. The common approach is to use a while loop:

    my $obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank");
    while(my $seqobj = $obj->next_seq) {
        # do stuff with $seqobj
    }

For more details, see the SeqIO HOWTO:

http://www.bioperl.org/wiki/HOWTO:SeqIO

Dave

From cjfields at illinois.edu Mon May 17 16:36:37 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 17 May 2010 11:36:37 -0500
Subject: [Bioperl-l] extracting genbank content
In-Reply-To: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
Message-ID: <9952EA98-248E-41B8-9816-A3A01EC6ADFE@illinois.edu>

Depends on what you need to do. If you are just interested in pulling out certain bits of data from each record, using SeqIO is a good option. But if you want to access the records as a flat database (not iteration, but indexed for fast access), use Bio::Index::GenBank or Bio::DB::Flat to make a simple flat file database and access them by ID.

chris

On May 17, 2010, at 7:42 AM, Ross KK Leung wrote:

> Dear all,
>
> When there is more than one GenBank record in a file, how can I traverse
> the records other than by splitting the file into separate records?
>
> $obj=Bio::SeqIO->new(-file=>$gbfile,-format=>"genbank");
>
> $seqobj=$obj->next_seq();
>
> Do I just use another $obj->next_seq() so it will point to another record?
>
> Thanks for your advice.
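
[Illustrative note, not part of the original thread.] The indexed access Chris mentions could look roughly like the sketch below. It is a minimal example under stated assumptions: the index file name, the GenBank file name, and the record ID are all hypothetical placeholders, and it relies only on the Bio::Index::GenBank calls shown in that module's synopsis (new, make_index, fetch).

```perl
#!/usr/bin/perl
# Sketch only: index GenBank flat files once, then fetch records by ID
# instead of iterating over every record with Bio::SeqIO.
use strict;
use warnings;
use File::Spec;
use Bio::Index::GenBank;

# Hypothetical names; replace with your own files and record ID.
my $index_file = 'genbank_records.idx';
my @gb_files   = map { File::Spec->rel2abs($_) } ('records.gb');
my $wanted_id  = 'MY_RECORD_ID';

# Build (or update) the index. 'WRITE' opens the index file for writing.
my $index = Bio::Index::GenBank->new(
    -filename   => $index_file,
    -write_flag => 'WRITE',
);
$index->make_index(@gb_files);

# fetch() is expected to return a Bio::Seq object for a known ID.
my $seq = $index->fetch($wanted_id);
if ($seq) {
    print join("\t", $seq->display_id, $seq->length), "\n";
}
else {
    warn "No record found for $wanted_id\n";
}
```

Building the index is the slow step and only has to be repeated when the flat files change; later runs can open the same index without the write flag and call fetch() directly.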
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Mon May 17 16:50:21 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 May 2010 09:50:21 -0700 Subject: [Bioperl-l] GenomeeTools In-Reply-To: References: Message-ID: <4BF173CD.8020600@cornell.edu> I haven't used GenomeTools but I've used GenomeThreader, one of Gordon's other tools. Rob Chris Fields wrote: > Anyone used GenomeTools? I'm thinking of setting up some C bindings to it. It has a C-based GFF3 parser, among other goodies. > > http://genometools.org/index.html > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Tue May 18 00:15:13 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 May 2010 17:15:13 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> Message-ID: <4BF1DC11.6030402@cornell.edu> OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches Rob Chris Fields wrote: > On May 14, 2010, at 6:01 PM, Robert Buels wrote: > >> Chris Fields wrote: >>> On a related note, going through, it appears the git conversion didn't track merges back to trunk. For instance, I know the featann_rollback was merged to trunk but it's not showing up. 
I know svn had poor merge-tracking prior to 1.5 (where svn:mergeinfo came into play), so it may be hard to actually find true merges w/o that. >> OK, here are all our current branches, I will go through them in order of last-modified date. >> >> 1998-12-11 bioperl >> 1999-02-19 release-0-04-bug >> 1999-04-13 bioperl-live >> 1999-04-13 stable-0-05 >> 2000-01-27 branch-ensembl-m1 >> 2000-02-07 internal-branch-pre-delete-06-tag >> 2000-03-22 stable-0-05-new >> 2001-02-19 branch-06 >> 2001-11-14 branch-07-ensembl-120 >> 2001-12-28 steve_chervitz >> 2002-01-16 branch-07 >> 2002-10-22 branch-1-0-0 >> 2003-07-07 branch-1-2-collection >> 2003-10-13 branch-1-2 >> 2004-10-20 ontology-cache >> 2005-04-14 branch-1-4 >> 2006-01-11 bioperl-branch-1-5-1 >> 2006-08-14 branch-experimental >> 2007-02-14 branch-1-5-2 >> 2007-08-28 featann_rollback >> 2007-11-07 lightweight_feature_branch >> >> Proposal: move the above to refs/archive and not worry any further about them. Maybe we can throw them out in 2020. > > Just as long as we know they are there. Rob, can you document the archive set up on the wiki so we don't forget it? > > I deleted the featann_rollback branch. That was a feature branch (no pun intended) to rollback overloading and a host of other changes introduced to bioperl just before the 1.5 release. It was merged a few years ago in svn. > >> 2009-06-17 restriction-refactor >> >> Proposal: delete, looks like it was merged in a2cb40e6c9c7da4f776dbb72a0266f54320fa37f > > This may have been Mark's refactoring, so yes, delete. > >> 2009-08-13 TRY_gff_refactor >> proposal: delete, git claims it is merged >> >> 2009-08-13 TRY_locatableseq_refactor >> proposal: delete, git claims it is merged > > I deleted these. The primary goal of TRY_gff_refactor was to work in GFF3 work, but that may rely on FeatureIO so will have to be done in stages. At some point, if we do a larger scale refactoring of GFF for GFF3 compat we can make another branch. 
TRY_locatableseq_refactor will be obsoleted once GSoC starts. > >> 2009-09-29 branch-1-6 >> keep, 1.6 maint branch i think. > > Yes. I will probably work on another set of merges from to 1.6 soon to bring it up to speed, maybe for one last 1.6 release. > >> 2009-10-14 anydbm-branch >> keep, MAJ working. MAJ, maybe you should move this to topic/ ? >> >> 2010-01-31 TRY_featureio_refactor >> keep, but looks dead. cjfields, maybe you want to delete it? > > Yes. I've deleted this, as FeatureIO is on it's own. > >> 2010-05-12 topic/bug_3077 >> delete, git claims it is merged. > > That's already deleted. Maybe needs to be pruned locally? > >> Please review, and I'll do the work if people agree. >> >> Rob > > Good start! > > chris > > From jay at jays.net Tue May 18 00:35:33 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 17 May 2010 19:35:33 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <4BF1DC11.6030402@cornell.edu> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> Message-ID: <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> On May 17, 2010, at 7:15 PM, Robert Buels wrote: > OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches Thank you!! git pull --prune and suddenly I feel clean again! :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From amackey at virginia.edu Tue May 18 00:42:17 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Mon, 17 May 2010 20:42:17 -0400 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. 
In-Reply-To: <20100518001029.CD8644229D@smtp1.rs.github.com> References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: I probably missed some prior discussion of this, but any chance that the new commit messages can actually include the (unified, possibly truncated-for-length) diff of the changes? My own 2 cents is that community-wide visual skims of the diffs provide a valuable spot-check for typo's and other think-o's. Plus it gives me an indication of how major the change was. A corollary -- might there be an RSS feed by which I could subscribe to such diffs, rather than get emails about them? Since the emails are sent from "noreply", I already have to step out of the normal email flow to respond to a diff, might as well go whole hog and remove them from my email consciousness entirely, and place them with the other various information streams in my RSS reader. Thanks, -Aaron On Mon, May 17, 2010 at 8:10 PM, wrote: > Branch: refs/archives/heads/branch-1-0-0 > Home: http://github.com/bioperl/bioperl-live > > Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 > > http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 > Author: sac > Date: 2002-10-22 (Tue, 22 Oct 2002) > > Changed paths: > M Bio/SearchIO/Writer/HitTableWriter.pm > > Log Message: > ----------- > Added frame to the column map. > > svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > From jay at jays.net Tue May 18 01:10:56 2010 From: jay at jays.net (Jay Hannah) Date: Mon, 17 May 2010 20:10:56 -0500 Subject: [Bioperl-l] 319a6e: Added frame to the column map. 
In-Reply-To: References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > I probably missed some prior discussion of this, but any chance that the new > commit messages can actually include the (unified, possibly > truncated-for-length) diff of the changes? I'm 5 years behind the cool-kids curve on this stuff. :) I just discovered SVN::Notify for $work[0]. By default it kicks out really pretty color HTML diffs of every change. I assume there's an equivalent for git? You could always click to github. It's color HTML diffs are very pretty. That commit for example: http://github.com/bioperl/bioperl-live/commit/319a6e Plus all the other github shiny -- comment specific lines of the commit, or the commit itself, etc. Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From cjfields at illinois.edu Tue May 18 01:35:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 May 2010 20:35:21 -0500 Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map. In-Reply-To: References: <20100518001029.CD8644229D@smtp1.rs.github.com> Message-ID: <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu> Aaron, We can do either, though setting up diffs will take a bit more work (will have to set up a post-receive URL to a CGI script to process this). RSS is quite a bit easier: http://github.com/bioperl/bioperl-live/commits/master.atom Replace 'bioperl-live' with any of the other repos for repo-specific RSS commits. The links go to the commits where you can also make in-line notes/comments by clicking in the diff code, or simple comments at the bottom. Those comments are then passed on to bioperl-guts-l for everyone to see. 
Example here: http://github.com/bioperl/bioperl-live/commit/c86c048c96786f8517ae1ad1fc5e5823eecf52c3 and the relevant bioperl-guts-l posts: http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031259.html http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031260.html chris On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > I probably missed some prior discussion of this, but any chance that the new > commit messages can actually include the (unified, possibly > truncated-for-length) diff of the changes? > > My own 2 cents is that community-wide visual skims of the diffs provide a > valuable spot-check for typo's and other think-o's. Plus it gives me an > indication of how major the change was. > > A corollary -- might there be an RSS feed by which I could subscribe to such > diffs, rather than get emails about them? Since the emails are sent from > "noreply", I already have to step out of the normal email flow to respond to > a diff, might as well go whole hog and remove them from my email > consciousness entirely, and place them with the other various information > streams in my RSS reader. > > Thanks, > > -Aaron > > On Mon, May 17, 2010 at 8:10 PM, wrote: > >> Branch: refs/archives/heads/branch-1-0-0 >> Home: http://github.com/bioperl/bioperl-live >> >> Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 >> >> http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 >> Author: sac >> Date: 2002-10-22 (Tue, 22 Oct 2002) >> >> Changed paths: >> M Bio/SearchIO/Writer/HitTableWriter.pm >> >> Log Message: >> ----------- >> Added frame to the column map. 
>> >> svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944 >> >> >> _______________________________________________ >> Bioperl-guts-l mailing list >> Bioperl-guts-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Tue May 18 07:16:52 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 00:16:52 -0700 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> References: <4BEC79A0.5000505@cornell.edu> <47CCA579-A128-4040-AFDC-8817F266DD7A@drycafe.net> <3B012988-D239-478D-8080-7721633A4AA5@jays.net> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> Message-ID: <4BF23EE4.6020704@cornell.edu> We may want to do the same for our tags as well. Our github download page is fairly disastrous. See: http://github.com/bioperl/bioperl-live/downloads It's not clear that a similar date-cutoff policy would work for tags. Pretty much all of these things were before my time, I don't know what most of them are. Does someone with more history than me have some thoughts as to what should stay on that download page? The rest of the tags could be archived. Rob Jay Hannah wrote: > On May 17, 2010, at 7:15 PM, Robert Buels wrote: >> OK, implemented as proposed. How-to for resurrecting archived branches is on the wiki at http://www.bioperl.org/wiki/Using_Git#Archived_Branches > > Thank you!! git pull --prune and suddenly I feel clean again! 
:) > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > From bpcwhite at gmail.com Tue May 18 09:49:29 2010 From: bpcwhite at gmail.com (Bryan White) Date: Tue, 18 May 2010 02:49:29 -0700 (PDT) Subject: [Bioperl-l] distance Message-ID: Hello, I am trying to create a simple program to show me the distance between taxa on a given tree. However, I am having trouble getting the bioperl code to work. Here is the code that I am using: -------- #! /usr/bin/perl use strict; use warnings; use Bio::Tree::Draw::Cladogram; use Bio::TreeIO; #use Bio::TreeFunctionsI; my $node1 = 'homo_sapiens'; my $node2 = 'murinae'; my $input = new Bio::TreeIO('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt'); my $tree = $input->next_tree; my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae'); my $distance = $tree->distance(-nodes => \@nodes); #print $distance; -------- And here is the error message I receive: ------------- EXCEPTION ------------- MSG: Must provide 2 nodes STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/ Bio/Tree/TreeFunctionsI.pm:811 STACK toplevel ./phylo.pl:19 ------------------------------------- It seems that the nodes are not being read into the @nodes variable. Any help in figuring this out would be appreciated. 
Thanks,
Bryan

From biopython at maubp.freeserve.co.uk Tue May 18 10:07:15 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 18 May 2010 11:07:15 +0100
Subject: [Bioperl-l] branch pruning v.2
In-Reply-To: <4BF23EE4.6020704@cornell.edu>
References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu>
Message-ID:

On Tue, May 18, 2010 at 8:16 AM, Robert Buels wrote:
> We may want to do the same for our tags as well. Our github download page
> is fairly disastrous. See:
>
> http://github.com/bioperl/bioperl-live/downloads
>
> It's not clear that a similar date-cutoff policy would work for tags. Pretty
> much all of these things were before my time, I don't know what most of them
> are.
>
> Does someone with more history than me have some thoughts as to what should
> stay on that download page? The rest of the tags could be archived.
>
> Rob

Or just turn off the download feature in github.

When you prepare a BioPerl release does it contain anything else not found in the repository (e.g. compiled documentation)? We have this for Biopython (compiled PDF and HTML docs) so we prefer to direct casual release downloads via the website not via the tag on github to ensure they get these extra files in the archive.

Peter

From adsj at novozymes.com Tue May 18 10:21:25 2010
From: adsj at novozymes.com (Adam Sjøgren)
Date: Tue, 18 May 2010 12:21:25 +0200
Subject: [Bioperl-l] distance
References:
Message-ID: <87k4r11pei.fsf@topper.koldfront.dk>

On Tue, 18 May 2010 02:49:29 -0700 (PDT), Bryan wrote:

> my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');

I think you may have misunderstood the documentation of find_node().
You are supposed to give the fieldname after the dash, so what you want is:

  my @nodes = $tree->find_node(-id => 'Homo_sapiens','Murinae');

- if the field you want to match on is 'id'.

Also, I don't think you can get find_node() to do 'OR'-searches, so you'll need to do something like this:

= = =
#!/usr/bin/perl

use strict;
use warnings;

use Bio::TreeIO;

my $input=Bio::TreeIO->new('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree=$input->next_tree;

my ($node1)=$tree->find_node(-id=>'Homo_sapiens'); # this (arbitrarily) picks the first match
my ($node2)=$tree->find_node(-id=>'Murinae'); # -"-

my $distance=$tree->distance(-nodes=>[$node1, $node2]);

print "$distance\n";
= = =

It is much easier to help if you give an example of the input as well as the script. I constructed this stand-in for your newick file to test on:

  (Homo_sapiens:1.1,B:2.2,(C:3.3,Murinae:4.4):5.5);

Best regards,

  Adam

--
Adam Sjøgren
adsj at novozymes.com

From David.Messina at sbc.su.se Tue May 18 10:50:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 May 2010 12:50:52 +0200
Subject: [Bioperl-l] branch pruning v.2
In-Reply-To:
References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu>
Message-ID: <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se>

On May 18, 2010, at 12:07, Peter wrote:

> Or just turn off the download feature in github.

That might be the best solution, at least for now.

The download page is somewhat unfriendly anyway: the tag names are truncated, there's no way to sort, and the descriptions are, well, not so descriptive (they appear to be just the last commit message).
Probably better to keep

http://www.bioperl.org/wiki/Getting_BioPerl

as our main distribution point for downloads.

Dave

From jun.yin at ucd.ie Tue May 18 11:15:14 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Tue, 18 May 2010 12:15:14 +0100
Subject: [Bioperl-l] distance
In-Reply-To: <87k4r11pei.fsf@topper.koldfront.dk>
References: <87k4r11pei.fsf@topper.koldfront.dk>
Message-ID: <002d01caf67b$637c20d0$2a746270$%yin@ucd.ie>

Hi, Bryan,

Use Adam's code. The last sentence of my code was wrong. I made a wrong reference...

Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Adam "Sjøgren"
Sent: Tuesday, May 18, 2010 11:21 AM
To: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] distance

On Tue, 18 May 2010 02:49:29 -0700 (PDT), Bryan wrote:

> my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');

I think you may have misunderstood the documentation of find_node().

You are supposed to give the fieldname after the dash, so what you want is:

  my @nodes = $tree->find_node(-id => 'Homo_sapiens','Murinae');

- if the field you want to match on is 'id'.

Also, I don't think you can get find_node() to do 'OR'-searches, so you'll need to do something like this:

= = =
#!/usr/bin/perl

use strict;
use warnings;

use Bio::TreeIO;

my $input=Bio::TreeIO->new('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree=$input->next_tree;

my ($node1)=$tree->find_node(-id=>'Homo_sapiens'); # this (arbitrarily) picks the first match
my ($node2)=$tree->find_node(-id=>'Murinae'); # -"-

my $distance=$tree->distance(-nodes=>[$node1, $node2]);

print "$distance\n";
= = =

It is much easier to help if you give an example of the input as well as the script.
I constructed this stand-in for your newick file to test on:

  (Homo_sapiens:1.1,B:2.2,(C:3.3,Murinae:4.4):5.5);

Best regards,

  Adam

--
Adam Sjøgren
adsj at novozymes.com

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From amackey at virginia.edu Tue May 18 11:26:17 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Tue, 18 May 2010 07:26:17 -0400
Subject: [Bioperl-l] [Bioperl-guts-l] [bioperl/bioperl-live] 319a6e: Added frame to the column map.
In-Reply-To: <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu>
References: <20100518001029.CD8644229D@smtp1.rs.github.com> <2024A1D4-BE3F-42D1-97D2-0F84421DDBB2@illinois.edu>
Message-ID:

Thanks for the info, and the thoroughness of your explanation!

-Aaron

On Mon, May 17, 2010 at 9:35 PM, Chris Fields wrote:

> Aaron,
>
> We can do either, though setting up diffs will take a bit more work (will
> have to set up a post-receive URL to a CGI script to process this).
>
> RSS is quite a bit easier:
>
> http://github.com/bioperl/bioperl-live/commits/master.atom
>
> Replace 'bioperl-live' with any of the other repos for repo-specific RSS
> commits. The links go to the commits where you can also make in-line
> notes/comments by clicking in the diff code, or simple comments at the
> bottom. Those comments are then passed on to bioperl-guts-l for everyone to
> see.
Example here: > > > http://github.com/bioperl/bioperl-live/commit/c86c048c96786f8517ae1ad1fc5e5823eecf52c3 > > and the relevant bioperl-guts-l posts: > > http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031259.html > http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031260.html > > chris > > On May 17, 2010, at 7:42 PM, Aaron Mackey wrote: > > > I probably missed some prior discussion of this, but any chance that the > new > > commit messages can actually include the (unified, possibly > > truncated-for-length) diff of the changes? > > > > My own 2 cents is that community-wide visual skims of the diffs provide a > > valuable spot-check for typo's and other think-o's. Plus it gives me an > > indication of how major the change was. > > > > A corollary -- might there be an RSS feed by which I could subscribe to > such > > diffs, rather than get emails about them? Since the emails are sent from > > "noreply", I already have to step out of the normal email flow to respond > to > > a diff, might as well go whole hog and remove them from my email > > consciousness entirely, and place them with the other various information > > streams in my RSS reader. > > > > Thanks, > > > > -Aaron > > > > On Mon, May 17, 2010 at 8:10 PM, wrote: > > > >> Branch: refs/archives/heads/branch-1-0-0 > >> Home: http://github.com/bioperl/bioperl-live > >> > >> Commit: 319a6e90f1428dafb66994878d86b5d213bc9bf8 > >> > >> > http://github.com/bioperl/bioperl-live/commit/319a6e90f1428dafb66994878d86b5d213bc9bf8 > >> Author: sac > >> Date: 2002-10-22 (Tue, 22 Oct 2002) > >> > >> Changed paths: > >> M Bio/SearchIO/Writer/HitTableWriter.pm > >> > >> Log Message: > >> ----------- > >> Added frame to the column map. 
> >>
> >> svn path=/bioperl-live/branches/branch-1-0-0/; revision=4944
> >>
> >>
> >> _______________________________________________
> >> Bioperl-guts-l mailing list
> >> Bioperl-guts-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
> >>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From jun.yin at ucd.ie Tue May 18 11:07:43 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Tue, 18 May 2010 12:07:43 +0100
Subject: [Bioperl-l] distance
In-Reply-To:
References:
Message-ID: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie>

Hi, Bryan,

In your code:

my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');

First, you should specify the fieldname. The "fieldname" itself does not seem like a valid key. The default field name is "id".
Second, the find_node method can only search for one specific term at a time.
Third, the distance method can only work on two nodes.

So try this:

my @nodes_human = $tree->find_node(-id => 'Homo_sapiens');
my @nodes_murinae=$tree->find_node(-id=>'Murinae');
my $distance = $tree->distance(-nodes => \($nodes_human[0],$nodes_murinae[0]));
#Providing you only have one match for "Homo_sapiens" and "Murinae".

Cheers,

Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bryan White
Sent: Tuesday, May 18, 2010 10:49 AM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] distance

Hello,

I am trying to create a simple program to show me the distance between taxa on a given tree. However, I am having trouble getting the bioperl code to work. Here is the code that I am using:

--------
#! /usr/bin/perl
use strict;
use warnings;
use Bio::Tree::Draw::Cladogram;
use Bio::TreeIO;
#use Bio::TreeFunctionsI;

my $node1 = 'homo_sapiens';
my $node2 = 'murinae';

my $input = new Bio::TreeIO('-format' => 'newick', '-file' => 'tree_mammalia_newick.txt');
my $tree = $input->next_tree;

my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');
my $distance = $tree->distance(-nodes => \@nodes);
#print $distance;
--------

And here is the error message I receive:

------------- EXCEPTION -------------
MSG: Must provide 2 nodes
STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/Bio/Tree/TreeFunctionsI.pm:811
STACK toplevel ./phylo.pl:19
-------------------------------------

It seems that the nodes are not being read into the @nodes variable. Any help in figuring this out would be appreciated.

Thanks,
Bryan
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Tue May 18 12:47:10 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 18 May 2010 07:47:10 -0500
Subject: [Bioperl-l] branch pruning v.2
In-Reply-To: <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se>
References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se>
Message-ID: <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu>

On May 18, 2010, at 5:50 AM, Dave Messina wrote:

>
> On May 18, 2010, at 12:07, Peter wrote:
>
>> Or just turn off the download feature in github.
>
> That might be the best solution, at least for now.
>
> The download page is somewhat unfriendly anyway: the tag names are truncated, there's no way to sort, and the descriptions are, well, not so descriptive (they appear to be just the last commit message).
>
> Probably better to keep
>
> http://www.bioperl.org/wiki/Getting_BioPerl
>
> as our main distribution point for downloads.
>
>
> Dave

We can turn that off for now, though it is a nice feature.
If we need a replacement link for downloads we can use the repo.or.cz mirror link, for example: http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip chris From David.Messina at sbc.su.se Tue May 18 12:53:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 14:53:29 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: On May 18, 2010, at 14:47, Chris Fields wrote: > http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz > http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. I'll go ahead and update the nightly build links on http://www.bioperl.org/wiki/Getting_BioPerl to point to those, then, unless there are objections. 
Dave From cjfields at illinois.edu Tue May 18 13:56:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 08:56:45 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> On May 18, 2010, at 7:53 AM, Dave Messina wrote: > > On May 18, 2010, at 14:47, Chris Fields wrote: > >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip > > > Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. > > > I'll go ahead and update the nightly build links on > > http://www.bioperl.org/wiki/Getting_BioPerl > > to point to those, then, unless there are objections. > > > Dave This link also still works, even with the 'Downloads' tab off: http://github.com/bioperl/bioperl-live/archives/master Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. 'build' really never applied either, but oh well... 
chris From biopython at maubp.freeserve.co.uk Tue May 18 13:57:50 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 May 2010 14:57:50 +0100 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> Message-ID: On Tue, May 18, 2010 at 1:53 PM, Dave Messina wrote: > > > On May 18, 2010, at 14:47, Chris Fields wrote: > >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip > > > Oh right, I forgot about the mirror. Silly me. :) So probably > unnecessary to make our own nightly snapshots then. > Just like what you'd get from the big "Download Source" button on github? Equivalent to visiting this page: http://github.com/bioperl/bioperl-live/archives/master Peter From cjfields at illinois.edu Tue May 18 14:03:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 09:03:46 -0500 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> Message-ID: On May 18, 2010, at 8:56 AM, Chris Fields wrote: > On May 18, 2010, at 7:53 AM, Dave Messina wrote: > >> >> On May 18, 2010, at 14:47, Chris 
Fields wrote: >> >>> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.tar.gz >>> http://repo.or.cz/w/bioperl-live.git/snapshot/HEAD.zip >> >> >> Oh right, I forgot about the mirror. Silly me. :) So probably unnecessary to make our own nightly snapshots then. >> >> >> I'll go ahead and update the nightly build links on >> >> http://www.bioperl.org/wiki/Getting_BioPerl >> >> to point to those, then, unless there are objections. >> >> >> Dave > > This link also still works, even with the 'Downloads' tab off: > > http://github.com/bioperl/bioperl-live/archives/master > > Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. > > 'build' really never applied either, but oh well... > > chris Oh, and on the topic of annotated tags for downloads: http://github.com/blog/651-annotated-downloads chris From David.Messina at sbc.su.se Tue May 18 14:23:34 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 16:23:34 +0200 Subject: [Bioperl-l] branch pruning v.2 In-Reply-To: References: <4BEC79A0.5000505@cornell.edu> <8B621CC8-59E5-42B1-989C-E1D7B4F6F91E@drycafe.net> <19356494-588D-463C-9218-43AD11D7C3E2@illinois.edu> <4BED9892.5070408@cornell.edu> <4BEDD65E.9070702@cornell.edu> <7F9A4242-AF98-4081-ACFD-DADA166F727F@illinois.edu> <4BF1DC11.6030402@cornell.edu> <7427BBA8-DA94-403C-844C-E3C2235A8DC4@jays.net> <4BF23EE4.6020704@cornell.edu> <789B4843-C474-4BFE-947F-C4AAC58D12B1@sbc.su.se> <0689EF76-0833-4DA3-9607-F11DA6857BF9@illinois.edu> <320F66C1-021F-4802-857B-17622B74EB75@illinois.edu> Message-ID: <075CC735-0573-4E79-975F-23AD61C41C72@sbc.su.se> On May 18, 2010, at 16:03, Chris Fields wrote: > > This link also still works, even with the 'Downloads' tab off: > > http://github.com/bioperl/bioperl-live/archives/master Ah, great, thanks Chris and Peter. > Either works for me, and they're synced, so they should be renamed as 'snapshots' as 'nightly' no longer applies. 
> > 'build' really never applied either, but oh well... Righto - done. 'Snapshots' it is. > Oh, and on the topic of annotated tags for downloads: > > http://github.com/blog/651-annotated-downloads Heh, how timely. :) Good, that will solve the description part of it nicely. Dave From jay at jays.net Tue May 18 14:32:47 2010 From: jay at jays.net (Jay Hannah) Date: Tue, 18 May 2010 09:32:47 -0500 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: <20100518030511.59C314202D@smtp1.rs.github.com> References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: Hi Florent, Can you add a line to the /Changes please? New features are especially great to add to that file. :) If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent. You also might want to set your git config so your email is valid in your commits. e.g.:

$ git config user.name "Jay Hannah"
$ git config user.email jay at jays.net

(these end up in ~/.gitconfig) Thanks!
Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah On May 17, 2010, at 10:05 PM, noreply at github.com wrote: > Branch: refs/heads/master > Home: http://github.com/bioperl/bioperl-live > > Commit: 87c530525da35a981e9f7b06134184f0adfae156 > http://github.com/bioperl/bioperl-live/commit/87c530525da35a981e9f7b06134184f0adfae156 > Author: Florent Angly > Date: 2010-05-17 (Mon, 17 May 2010) > > Changed paths: > M Bio/Assembly/IO.pm > M Bio/Assembly/IO/ace.pm > M t/Assembly/Assembly.t > > Log Message: > ----------- > Implemented the 454 Newbler ACE assembly variant > > > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l From florent.angly at gmail.com Tue May 18 15:11:40 2010 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 18 May 2010 08:11:40 -0700 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: <4BF2AE2C.209@gmail.com> Good idea Jay! I did as you suggested. Florent On 18/05/10 07:32, Jay Hannah wrote: > Can you add a line to the /Changes please? New features are especially great to add to that file.:) > > If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent. > > You also might want to set your git config so your email is valid in your commits. e.g.: > From bimber at wisc.edu Tue May 18 15:28:06 2010 From: bimber at wisc.edu (Ben Bimber) Date: Tue, 18 May 2010 10:28:06 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? Message-ID: this question is more of a general perl one than bioperl specific, so I hope it is appropriate for this list: I am writing code that has two steps. the first generates a large, complex hash describing mutations. 
it takes a fair amount of time to run this step. the second step uses this data to perform downstream calculations. for the purposes of writing/debugging this downstream code, it would save me a lot of time if i could run the first step once, then store this hash in something like the file system. this way I could quickly load it, when debugging the downstream code without waiting for the hash to be recreated. is there a 'best practice' way to do something like this? I could save a tab-delimited file, which is human readable, but does not represent the structure of the hash, so I would need code to re-parse it. I assume I could probably do something along the lines of dumping a JSON string, then read/decode it. this is easy, but not so human-readable. is there another option i'm not thinking of? what do others do in this sort of situation? thanks in advance. -Ben From cjfields at illinois.edu Tue May 18 15:31:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 10:31:14 -0500 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: On May 18, 2010, at 9:32 AM, Jay Hannah wrote: > Hi Florent, > > Can you add a line to the /Changes please? New features are especially great to add to that file. :) > > If we all add to /Changes every time we do anything significant then whoever does the next release has less work to do and CPAN pushes can become less painful, more frequent. Agreed (or, +1, depending on your taste). Also, I would really like to break the habit of committing everything straight to trunk and promote using branches more. Branches are cheap. 
Something like:

# on master
git checkout -b 'topic/feature_foo'
# switches over to branch 'topic/feature_foo'
# hack hack hack
# make commits
# add tests
# add to Changes
# make more commits
# push to remote branch
# merge to master
git checkout master
git merge 'topic/feature_foo'
# test test test, etc, push to origin

or similar. Of course, there would be more to it (handling merge conflicts, etc), just need to get a decent workflow document started up. Ah tuits, where are you? > You also might want to set your git config so your email is valid in your commits. e.g.: > > $ git config user.name "Jay Hannah" > $ git config user.email jay at jays.net > (these end up in ~/.gitconfig) > > Thanks! > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah I think these are only set there if you use --global, correct? Otherwise it's repo-specific, would be in .git/ somewhere. chris From s.denaxas at gmail.com Tue May 18 15:41:01 2010 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Tue, 18 May 2010 16:41:01 +0100 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: Hello, it all really depends on your definition of readable. YAML is readable but requires a parser; XML is readable but is bloated and requires code and a parser. You can directly dump the output from Data::Dumper and then eval() it back into a hash. I would think this is the cleanest way if you specifically want to dump a hash and re-generate it with no additional code. You can set the $Data::Dumper::Indent flag to control how readable the hash is. hope this helps, Spiros On Tue, May 18, 2010 at 4:28 PM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step.
the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation? > > thanks in advance. > > -Ben > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From adsj at novozymes.com Tue May 18 15:57:12 2010 From: adsj at novozymes.com (Adam Sjøgren) Date: Tue, 18 May 2010 17:57:12 +0200 Subject: [Bioperl-l] storing/retrieving a large hash on file system? References: Message-ID: <87zkzxmcdj.fsf@topper.koldfront.dk> On Tue, 18 May 2010 10:28:06 -0500, Ben wrote: > is there a 'best practice' way to do something like this? The only one I can think of is "Don't make up your own format unless you really, really have to". > I could save a tab-delimited file, which is human readable, but does > not represent the structure of the hash, so I would need code to > re-parse it. I assume I could probably do something along the lines of > dumping a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation?
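A minimal sketch of the round trip being asked about, assuming the core Storable module is acceptable. It is not code from any poster: the file name mutations.storable, the build_mutation_hash() stub and the nested hash layout are all invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);    # core module

# Hypothetical cache file name; any writable path will do.
my $cache = 'mutations.storable';

# Stand-in for the slow first step. The nested layout is invented
# for illustration only.
sub build_mutation_hash {
    return {
        geneA => { position => 101,  from => 'A', to => 'G', samples => [ 's1', 's2' ] },
        geneB => { position => 2050, from => 'C', to => 'T', samples => ['s3'] },
    };
}

# Reuse the cached copy when it exists, otherwise build and save it.
my $mutations;
if ( -e $cache ) {
    $mutations = retrieve($cache);
}
else {
    $mutations = build_mutation_hash();
    store( $mutations, $cache );
}

# Downstream code sees an ordinary hash reference either way.
for my $gene ( sort keys %$mutations ) {
    my $m = $mutations->{$gene};
    printf "%s: %s>%s at %d (%d samples)\n",
        $gene, $m->{from}, $m->{to}, $m->{position}, scalar @{ $m->{samples} };
}
```

Deleting the cache file forces a rebuild on the next run. store() writes in the machine's native byte order; if the file has to move between architectures, Storable's nstore() is the portable variant.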
I would use YAML or JSON if I had to look at it "by hand" or if it had to be somehow portable. I would prefer those over CSV, which hasn't necessarily got well-defined handling of special chars, whitespace etc. If speed is more important, I think the Storable module is quite a bit quicker, but the format is "binary". Best regards, Adam -- Adam Sjøgren adsj at novozymes.com From sdavis2 at mail.nih.gov Tue May 18 16:09:38 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 18 May 2010 12:09:38 -0400 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step. the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation? > > thanks in advance. > > There are a number of solutions on CPAN, probably.
This is one maybe off the beaten path, but it is getting a lot of press in the NoSQL database realm: http://1978th.net/tokyocabinet/ Sean From David.Messina at sbc.su.se Tue May 18 16:19:18 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 18:19:18 +0200 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: Hi Ben, Storable should do the trick. http://search.cpan.org/~ams/Storable-2.21/ It allows you to save arbitrary perl data structures to disk and load them back in without needing to dump into another format and then parse it later. Dave From cjfields at illinois.edu Tue May 18 16:22:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 11:22:09 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: On May 18, 2010, at 10:28 AM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step. the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? 
what do > others do in this sort of situation? > > thanks in advance. > > -Ben Would a simple DB_File tied hash work? chris From cjfields at illinois.edu Tue May 18 16:25:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 11:25:11 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: <87zkzxmcdj.fsf@topper.koldfront.dk> References: <87zkzxmcdj.fsf@topper.koldfront.dk> Message-ID: On May 18, 2010, at 10:57 AM, Adam Sjøgren wrote: > On Tue, 18 May 2010 10:28:06 -0500, Ben wrote: > >> is there a 'best practice' way to do something like this? > > The only one I can think of is "Don't make up your own format unless you > really, really have to". > >> I could save a tab-delimited file, which is human readable, but does >> not represent the structure of the hash, so I would need code to >> re-parse it. I assume I could probably do something along the lines of >> dumping a JSON string, then read/decode it. this is easy, but not so >> human-readable. is there another option i'm not thinking of? what do >> others do in this sort of situation? > > I would use YAML or JSON if I had to look at it "by hand" or if it had > to be somehow portable. I would prefer those over CSV, which hasn't > necessarily got well-defined handling of special chars, whitespace etc. > > If speed is more important, I think the Storable module is quite a bit > quicker, but the format is "binary". > > > Best regards, > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Yes, that in combination with an AnyDBM tied hash would work (essentially what Bio::SeqFeature::Collection is under the hood). chris From sdavis2 at mail.nih.gov Tue May 18 16:39:44 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 18 May 2010 12:39:44 -0400 Subject: [Bioperl-l] storing/retrieving a large hash on file system?
In-Reply-To: References: Message-ID: On Tue, May 18, 2010 at 12:09 PM, Sean Davis wrote: > > > On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote: > >> this question is more of a general perl one than bioperl specific, so >> I hope it is appropriate for this list: >> >> I am writing code that has two steps. the first generates a large, >> complex hash describing mutations. it takes a fair amount of time to >> run this step. the second step uses this data to perform downstream >> calculations. for the purposes of writing/debugging this downstream >> code, it would save me a lot of time if i could run the first step >> once, then store this hash in something like the file system. this >> way I could quickly load it, when debugging the downstream code >> without waiting for the hash to be recreated. >> >> is there a 'best practice' way to do something like this? I could >> save a tab-delimited file, which is human readable, but does not >> represent the structure of the hash, so I would need code to re-parse >> it. I assume I could probably do something along the lines of dumping >> a JSON string, then read/decode it. this is easy, but not so >> human-readable. is there another option i'm not thinking of? what do >> others do in this sort of situation? >> >> thanks in advance. >> >> > There are a number of solutions on CPAN, probably. This is one maybe off > the beaten path, but it is getting a lot of press in the NoSQL database > realm: > > http://1978th.net/tokyocabinet/ > > Just to be clear, I am assuming that the problem at hand is storing a key/value pair and then retrieving it later. If what you are talking about is a multi-level hash data structure, then Data::Dumper might be the easiest way to go. Sorry for the confusion.... Sean From bimber at wisc.edu Tue May 18 16:47:33 2010 From: bimber at wisc.edu (Ben Bimber) Date: Tue, 18 May 2010 11:47:33 -0500 Subject: [Bioperl-l] storing/retrieving a large hash on file system? 
In-Reply-To: References: Message-ID: Thanks for all the suggestions. Storable seems like the simplest route. This will save me hours of staring at my computer. -Ben On Tue, May 18, 2010 at 11:39 AM, Sean Davis wrote: > > > On Tue, May 18, 2010 at 12:09 PM, Sean Davis wrote: >> >> >> On Tue, May 18, 2010 at 11:28 AM, Ben Bimber wrote: >>> >>> this question is more of a general perl one than bioperl specific, so >>> I hope it is appropriate for this list: >>> >>> I am writing code that has two steps. the first generates a large, >>> complex hash describing mutations. it takes a fair amount of time to >>> run this step. the second step uses this data to perform downstream >>> calculations. for the purposes of writing/debugging this downstream >>> code, it would save me a lot of time if i could run the first step >>> once, then store this hash in something like the file system. this >>> way I could quickly load it, when debugging the downstream code >>> without waiting for the hash to be recreated. >>> >>> is there a 'best practice' way to do something like this? I could >>> save a tab-delimited file, which is human readable, but does not >>> represent the structure of the hash, so I would need code to re-parse >>> it. I assume I could probably do something along the lines of dumping >>> a JSON string, then read/decode it. this is easy, but not so >>> human-readable. is there another option i'm not thinking of? what do >>> others do in this sort of situation? >>> >>> thanks in advance. >>> >> >> There are a number of solutions on CPAN, probably. This is one maybe off >> the beaten path, but it is getting a lot of press in the NoSQL database >> realm: >> >> http://1978th.net/tokyocabinet/ >> > > Just to be clear, I am assuming that the problem at hand is storing a > key/value pair and then retrieving it later. If what you are talking about > is a multi-level hash data structure, then Data::Dumper might be the easiest > way to go.
> > Sorry for the confusion.... > > Sean > > > From bosborne11 at verizon.net Tue May 18 16:00:06 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 18 May 2010 12:00:06 -0400 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net> Ben, I've used Storable to do things like this, for example:

use Storable;
use Getopt::Long;
use Bio::DB::Fasta;

# usage() and make_my_id() are defined elsewhere in the full script

my %species = ( "Sc" => 4932,   # Saccharomyces cerevisiae
                "Ec" => 83333,  # Escherichia coli K12
                "Hs" => 9606    # H. sapiens
              );

my ($help,$id,$name);

GetOptions( "s=s" => \$name,
            "i=i" => \$id,
            "h"   => \$help );

usage() if ($help || !$id || !$name);

my $storedHash = $name . ".dump";

# create index for a directory of fasta files
my $db = Bio::DB::Fasta->new($name, -makeid => \&make_my_id);

# extract species-specific data from gene2accession
unless (-e $storedHash) {
    my $ref;
    # extract species-specific information from gene2accession
    open MYIN,"gene2accession" or die "No gene2accession file\n";
    while (<MYIN>) {
        my @arr = split "\t",$_;
        if ($arr[0] == $species{$name} && $arr[9] =~ /\d+/ && $arr[10] =~ /\d+/) {
            ($ref->{$arr[1]}->{"start"},
             $ref->{$arr[1]}->{"end"},
             $ref->{$arr[1]}->{"strand"},
             $ref->{$arr[1]}->{"id"}) = ($arr[9], $arr[10], $arr[11], $arr[7]);
        }
    }
    # save species-specific information using Storable
    store $ref, $storedHash;
}

# retrieve the species-specific data from a stored hash
my $ref = retrieve($storedHash);

Take away all the parsing details and you can see that it's simple, and that Storable exports store() and retrieve(). Make up a file name, "store" the hash reference. Brian O. On May 18, 2010, at 11:28 AM, Ben Bimber wrote: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step.
the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? what do > others do in this sort of situation? > > thanks in advance. > > -Ben > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Tue May 18 16:06:54 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 18 May 2010 12:06:54 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? Message-ID: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> bioperl-l, Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. We want these to point to github, yes? I'll fix it if the answer is 'yes'. Brian O. From cjfields at illinois.edu Tue May 18 18:04:55 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 May 2010 13:04:55 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> Message-ID: <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> Yes. 
chris On May 18, 2010, at 11:06 AM, Brian Osborne wrote: > bioperl-l, > > Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. > > We want these to point to github, yes? I'll fix it if the answer is 'yes'. > > Brian O. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Tue May 18 19:39:48 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 12:39:48 -0700 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: References: <20100518030511.59C314202D@smtp1.rs.github.com> Message-ID: <4BF2ED04.2050106@cornell.edu> Chris Fields wrote: > Agreed (or, +1, depending on your taste). Also, I would really like to break the habit of committing everything straight to trunk and promote using branches more. Branches are cheap. I did some work on our git workflow at http://www.bioperl.org/wiki/Using_Git#Developing_BioPerl, but it still needs some more work. So, there's the start of the workflow document I think. Rob From rmb32 at cornell.edu Tue May 18 19:42:44 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 May 2010 12:42:44 -0700 Subject: [Bioperl-l] storing/retrieving a large hash on file system? 
In-Reply-To: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net> References: <5EF45327-CFAE-4A75-91F7-04D154CA2A36@verizon.net> Message-ID: <4BF2EDB4.4060907@cornell.edu> Based on your description, you want to use either:

Storable - if you want to load the whole hash into memory

or

AnyDBM - if you want to be able to look things up from the hash without loading the whole thing in memory

Rob From David.Messina at sbc.su.se Tue May 18 20:16:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 May 2010 22:16:14 +0200 Subject: [Bioperl-l] Please update /Changes as you commit things In-Reply-To: <4BF2ED04.2050106@cornell.edu> References: <20100518030511.59C314202D@smtp1.rs.github.com> <4BF2ED04.2050106@cornell.edu> Message-ID: <2D6396F7-E478-4544-B26A-F8A5799F2039@sbc.su.se> Nice, Rob! > I did some work on our git workflow at http://www.bioperl.org/wiki/Using_Git#Developing_BioPerl, but it still needs some more work. > > So, there's the start of the workflow document I think. From bpcwhite at gmail.com Tue May 18 21:34:06 2010 From: bpcwhite at gmail.com (Bryan White) Date: Tue, 18 May 2010 14:34:06 -0700 (PDT) Subject: [Bioperl-l] distance In-Reply-To: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie> References: <002901caf67a$56c9cff0$045d6fd0$%yin@ucd.ie> Message-ID: <1a2c786f-07e6-4499-8dc9-19a8d4169653@u3g2000prl.googlegroups.com> Thanks guys, I got it working! Bryan On May 18, 4:07 am, Jun Yin wrote:
> Hi, Bryan,
>
> In your code:
>         my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');
>
> First, you should specify the fieldname. The "fieldname" itself does not seem
> like a valid key. The default field name is "id".
> Second, the find_node method can only search for one specific term at one
> time.
> Third, distance method can only work on two nodes.
>
> So try this:
>
> my @nodes_human = $tree->find_node(-id => 'Homo_sapiens');
> my @nodes_murinae = $tree->find_node(-id => 'Murinae');
>
> # Providing you only have one match for "Homo_sapiens" and "Murinae".
> # -nodes takes an array reference holding exactly two nodes.
> my $distance = $tree->distance(-nodes => [ $nodes_human[0], $nodes_murinae[0] ]);
>
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
>
> -----Original Message-----
> From: bioperl-l-boun... at lists.open-bio.org
> [mailto:bioperl-l-boun... at lists.open-bio.org] On Behalf Of Bryan White
> Sent: Tuesday, May 18, 2010 10:49 AM
> To: bioper... at bioperl.org
> Subject: [Bioperl-l] distance
>
> Hello,
>
> I am trying to create a simple program to show me the distance between
> taxa on a given tree. However, I am having trouble getting the bioperl
> code to work. Here is the code that I am using:
> --------
> #! /usr/bin/perl
> use strict;
> use warnings;
> use Bio::Tree::Draw::Cladogram;
> use Bio::TreeIO;
> #use Bio::TreeFunctionsI;
>
> my $node1 = 'homo_sapiens';
> my $node2 = 'murinae';
> my $input = new Bio::TreeIO('-format' => 'newick',
>                             '-file' => 'tree_mammalia_newick.txt');
>
> my $tree = $input->next_tree;
>
> my @nodes = $tree->find_node(-fieldname => 'Homo_sapiens','Murinae');
>
> my $distance = $tree->distance(-nodes => \@nodes);
>
> #print $distance;
>
> --------
>
> And here is the error message I receive:
>
> ------------- EXCEPTION -------------
> MSG: Must provide 2 nodes
> STACK Bio::Tree::TreeFunctionsI::distance /usr/local/share/perl/5.10.1/Bio/Tree/TreeFunctionsI.pm:811
> STACK toplevel ./phylo.pl:19
> -------------------------------------
>
> It seems that the nodes are not being read into the @nodes variable.
> Any help in figuring this out would be appreciated.
>
> Thanks,
> Bryan
> _______________________________________________
> Bioperl-l mailing list
> Bioper...
at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > __________ Information from ESET Smart Security, version of virus signature > database 5099 (20100509) __________ > > The message was checked by ESET Smart Security. > > http://www.eset.com > > __________ Information from ESET Smart Security, version of virus signature > database 5099 (20100509) __________ > > The message was checked by ESET Smart Security. > > http://www.eset.com > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Wed May 19 04:17:24 2010 From: hartzell at alerce.com (George Hartzell) Date: Tue, 18 May 2010 21:17:24 -0700 Subject: [Bioperl-l] storing/retrieving a large hash on file system? In-Reply-To: References: Message-ID: <19443.26196.893455.52821@gargle.gargle.HOWL> Ben Bimber writes: > this question is more of a general perl one than bioperl specific, so > I hope it is appropriate for this list: > > I am writing code that has two steps. the first generates a large, > complex hash describing mutations. it takes a fair amount of time to > run this step. the second step uses this data to perform downstream > calculations. for the purposes of writing/debugging this downstream > code, it would save me a lot of time if i could run the first step > once, then store this hash in something like the file system. this > way I could quickly load it, when debugging the downstream code > without waiting for the hash to be recreated. > > is there a 'best practice' way to do something like this? I could > save a tab-delimited file, which is human readable, but does not > represent the structure of the hash, so I would need code to re-parse > it. I assume I could probably do something along the lines of dumping > a JSON string, then read/decode it. this is easy, but not so > human-readable. is there another option i'm not thinking of? 
what do > others do in this sort of situation? Someone early on in the thread said not to invent another format, and I concur with that whole heartedly. Your choice of words, "large complex hash" makes me worry that you have something more than a large single level hash with sensible keys. Hashes of references to hashes to references to lists to etc... give me hives. If you'ld like to put add a nice general purpose tool to your kit, think about putting it into a simple SQLite database. Put it into an SQLite db and talk to it via DBI and you get some really cool tricks: - you can store complex stuff, - get back the just the part you need, a column, several columns, or the result of a join among multiple tables, - add indexes to make it Go Fast. and in the cool tricks category - you can use SQLite's backup interface to build the database in memory (nice and fast) then quickly stream it out to a disk based file for persistence. - same trick in reverse, if you know you're going to do a reasonably large number of complex queries you can stream a database into memory and then run your queries quickly. - rtree indexes are cool. Going forward you can scale things up to big databases (Pg, Oracle), you can provide safe multiuser access, transactions, etc.... (NFS not withstanding), etc.... g. From avilella at gmail.com Wed May 19 08:36:25 2010 From: avilella at gmail.com (Albert Vilella) Date: Wed, 19 May 2010 09:36:25 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Message-ID: Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. 
Should I be using some other tool existing not in bioperl?

Cheers,

Albert.

From jun.yin at ucd.ie Wed May 19 10:40:51 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Wed, 19 May 2010 11:40:51 +0100
Subject: [Bioperl-l] from SimpleAlign to SAM/BAM
In-Reply-To:
References:
Message-ID: <008101caf73f$c04973c0$40dc5b40$%yin@ucd.ie>

Hi, Albert,

Check this page for the BioPerl wrapper on next-gen sequencing results
http://bioperl.org/wiki/HOWTO:Short-read_assemblies_with_BWA

And, I don't think Bio::SimpleAlign works on assembly files. It is targeted
at global alignment, e.g. clustalw output file.

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Albert Vilella
Sent: Wednesday, May 19, 2010 9:36 AM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] from SimpleAlign to SAM/BAM

Hi,

I would like to know what would be the best way to generate a SAM/BAM file
with cDNA alignments against the human reference from a bunch of
Bio::SimpleAlign cDNA multiple sequence alignment objects.

Considering I've got a way to map the cDNAs to chromosome coordinates,
how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000
human coordinates?

As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads
assemblies.
Should I be using some other tool existing not in bioperl?

Cheers,

Albert.
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Wed May 19 13:34:01 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 19 May 2010 09:34:01 -0400
Subject: [Bioperl-l] from SimpleAlign to SAM/BAM
In-Reply-To:
References:
Message-ID:

Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use
of Bio::Assembly::IO::sam (I think). I know there is only read capability for
B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes
(some assembly (so to speak) required...)--
cheers MAJ

----- Original Message -----
From: "Albert Vilella"
To:
Sent: Wednesday, May 19, 2010 4:36 AM
Subject: [Bioperl-l] from SimpleAlign to SAM/BAM

> Hi,
>
> I would like to know what would be the best way to generate a SAM/BAM file
> with cDNA alignments against the human reference from a bunch of
> Bio::SimpleAlign
> cDNA multiple sequence alignment objects.
>
> Considering I've got a way to map the cDNAs to chromosome coordinates,
> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000
> human
> coordinates?
>
> As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads
> assemblies.
> Should I be using some other tool existing not in bioperl?
>
> Cheers,
>
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From maj at fortinbras.us Wed May 19 13:59:03 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 19 May 2010 09:59:03 -0400
Subject: [Bioperl-l] out of memory issue
In-Reply-To:
References:
Message-ID:

Hi Shalabh and all,

Sorry to comment on an old thread, but Dan Kortschak just pointed me to
Tie::File. This may be the right solution to this issue. It turns out that
DB_File will read in the entire file to memory anyway, while Tie::File
(by MJD of course) works on pieces as it should.
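
[A minimal sketch of the Tie::File approach, assuming the IDs are stored one per line in a plain text file. The temporary file and its contents are made up for illustration; Tie::File and File::Temp ship with Perl.]

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Build a small stand-in for the real ID file (one ID per line).
my ($fh, $path) = tempfile();
print {$fh} "id$_\n" for 1 .. 5;
close $fh;

# Tie an array to the file: records are fetched from disk on demand
# instead of the whole file being slurped into memory.
tie my @ids, 'Tie::File', $path or die "Cannot tie $path: $!";

my $count = scalar @ids;   # number of lines in the file
my $third = $ids[2];       # third record, newline already removed

# Assigning to an element rewrites that line in the file.
$ids[2] = 'ID3';
my $changed = $ids[2];

untie @ids;
unlink $path;
```

[Note that this gives array-style access by line number, not hash lookup by key, so whether it fits depends on how the IDs need to be queried.]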
See Tie::File in CPAN and also this informative post: http://perl.plover.com/TieFile/why-not-DB_File cheers all- (someday, maybe next month, I'll return in force) MAJ ----- Original Message ----- From: "shalabh sharma" To: "bioperl-l" Sent: Wednesday, April 28, 2010 10:13 AM Subject: [Bioperl-l] out of memory issue > Hi All, > I am trying to make a hash of 38 Million ids but every time i get the > following message : > > perl(191) malloc: *** mmap(size=16777216) failed (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > Out of memory! > > I am working on MacOX 10.5.8 with 4GB of memory. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From avilella at gmail.com Wed May 19 15:00:27 2010 From: avilella at gmail.com (Albert Vilella) Date: Wed, 19 May 2010 16:00:27 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Awesome, thanks. I'll give it a try :-) On Wed, May 19, 2010 at 2:34 PM, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use > of Bio::Assembly::IO::sam (I think). I know there is only read capability > for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing > writes (some assembly (so to speak) required...)-- cheers MAJ > ----- Original Message ----- From: "Albert Vilella" > To: > Sent: Wednesday, May 19, 2010 4:36 AM > Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > > >> Hi, >> >> I would like to know what would be the best way to generate a SAM/BAM file >> with cDNA alignments against the human reference from a bunch of >> Bio::SimpleAlign >> cDNA multiple sequence alignment objects. 
>> >> Considering I've got a way to map the cDNAs to chromosome coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >> human >> coordinates? >> >> As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads >> assemblies. >> Should I be using some other tool existing not in bioperl? >> >> Cheers, >> >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > From lincoln.stein at gmail.com Wed May 19 16:40:31 2010 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Wed, 19 May 2010 12:40:31 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the > use of Bio::Assembly::IO::sam (I think). I know there is only read > capability for B:A:I:sam, but Samtools may give you the appropriate wrapper > for doing writes (some assembly (so to speak) required...)-- cheers MAJ > ----- Original Message ----- From: "Albert Vilella" > > To: > Sent: Wednesday, May 19, 2010 4:36 AM > > Subject: [Bioperl-l] from SimpleAlign to SAM/BAM > > > Hi, >> >> I would like to know what would be the best way to generate a SAM/BAM file >> with cDNA alignments against the human reference from a bunch of >> Bio::SimpleAlign >> cDNA multiple sequence alignment objects. >> >> Considering I've got a way to map the cDNAs to chromosome coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >> human >> coordinates? >> >> As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads >> assemblies. >> Should I be using some other tool existing not in bioperl? 
>> >> Cheers, >> >> Albert. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From john.marshall at sanger.ac.uk Wed May 19 16:22:19 2010 From: john.marshall at sanger.ac.uk (John Marshall) Date: Wed, 19 May 2010 17:22:19 +0100 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: On 19 May 2010, at 14:34, Mark A. Jensen wrote: > Albert-- have a look at Bio::Tools::Run::Samtools which incorporates > the use of Bio::Assembly::IO::sam (I think). I've only briefly skimmed the B:T:R:Samtools documentation, but it would appear that this mostly encapsulates running the various samtools subcommands. These provide various manipulations on SAM and BAM files, but don't give you anything in terms of converting from not- SAM/BAM to SAM/BAM. > ----- Original Message ----- From: "Albert Vilella" > >> Considering I've got a way to map the cDNAs to chromosome >> coordinates, >> how can I generate a SAM/BAM file with ~1,000,000 entries against >> ~23.000 human >> coordinates? Perhaps I misunderstand, but if you already have a bunch of snippets of sequence and their mapped coordinates, then the easy way to generate a SAM file containing them is just to print it out by hand. A SAM file is just a tab-separated text file. For each sequence in your Bio::SimpleAlign objects, print out a line containing appropriate values for each of the 11 main SAM fields. 
(If the snippets are effectively unpaired, then MRNM,MPOS,ISIZE can just be *,0,0, and the only FLAG values you'll be choosing between are 0, 4, 16, and 20.) You should also start the file with an @SQ header for each of the chromosomes you've mapped against. (I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf -- it's a little vague, but should be more than enough to explain how to e.g. print out a basic SAM file with only the main fields.) Once you've printed out a simple SAM file, you can use B:T:R:Samtools or samtools directly or other tools to convert it to the binary BAM format and/or otherwise work with it. Cheers, John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From maj at fortinbras.us Wed May 19 17:26:16 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:26:16 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: <42F365BE46A545CE9DF897BA0B18B8EF@NewLife> CORRECTION: B:T:R:Samtools wraps samtools directly, as John said. Sorry, it's been a while... MAJ ----- Original Message ----- From: Lincoln Stein To: Mark A. Jensen Cc: Albert Vilella ; bioperl-l at bioperl.org Sent: Wednesday, May 19, 2010 12:40 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). 
I know there is only read capability for B:A:I:sam, but Samtools may give
you the appropriate wrapper for doing writes (some assembly (so to speak)
required...)--
cheers MAJ

----- Original Message -----
From: "Albert Vilella"
To:
Sent: Wednesday, May 19, 2010 4:36 AM
Subject: [Bioperl-l] from SimpleAlign to SAM/BAM

Hi,

I would like to know what would be the best way to generate a SAM/BAM file
with cDNA alignments against the human reference from a bunch of
Bio::SimpleAlign cDNA multiple sequence alignment objects.

Considering I've got a way to map the cDNAs to chromosome coordinates,
how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000
human coordinates?

As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads
assemblies.
Should I be using some other tool existing not in bioperl?

Cheers,

Albert.
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From maj at fortinbras.us Wed May 19 17:30:25 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 19 May 2010 13:30:25 -0400
Subject: [Bioperl-l] from SimpleAlign to SAM/BAM
In-Reply-To:
References:
Message-ID:

Yes that's right John; B:T:R:Samtools is used within B:A:I:sam to do the
write out with the samtools command line pgms. Interested parties might look
at Bio::Assembly::IO::sam to see how Lincoln's Bio::DB::Sam (which uses the
libbam library directly via XS, also not BioPerl proper but we love it
anyway) might be employed.
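
[John's suggestion in this thread of printing the SAM file by hand can be sketched in plain Perl. This is an illustration only: the reference name, length, hit records and the helper names `sq_header` and `sam_line` are hypothetical, each hit is assumed to be an ungapped, unpaired match whose sequence is already given on the reference forward strand, and real cDNA-to-genome alignments would need a proper CIGAR string with intron gaps.]

```perl
use strict;
use warnings;

# Hypothetical reference lengths and mapped snippets; real values would come
# from your own cDNA-to-chromosome coordinate mapping.
my %ref_length = (chr1 => 1000);
my @hits = (
    { name => 'cdna_1', ref => 'chr1', pos => 101, strand =>  1, seq => 'ACGTACGT' },
    { name => 'cdna_2', ref => 'chr1', pos => 250, strand => -1, seq => 'TTGACCA'  },
);

# One @SQ header line per reference sequence.
sub sq_header {
    my ($name, $len) = @_;
    return join "\t", '@SQ', "SN:$name", "LN:$len";
}

# One alignment line with the 11 mandatory fields:
# QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL
sub sam_line {
    my ($hit) = @_;
    my $flag  = $hit->{strand} < 0 ? 16 : 0;    # 16 = reverse strand
    my $cigar = length($hit->{seq}) . 'M';      # assumes an ungapped match
    return join "\t",
        $hit->{name}, $flag, $hit->{ref}, $hit->{pos},
        255,                                    # mapping quality unavailable
        $cigar,
        '*', 0, 0,                              # unpaired: MRNM, MPOS, ISIZE
        $hit->{seq},
        '*';                                    # no base qualities stored
}

my @sam = map { sq_header($_, $ref_length{$_}) } sort keys %ref_length;
push @sam, map { sam_line($_) } @hits;
print "$_\n" for @sam;
```

[The resulting text could then be redirected to a file and converted to BAM with samtools, as suggested in the thread.]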
----- Original Message ----- From: "John Marshall" To: Cc: "Albert Vilella" Sent: Wednesday, May 19, 2010 12:22 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM > On 19 May 2010, at 14:34, Mark A. Jensen wrote: >> Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use >> of Bio::Assembly::IO::sam (I think). > > I've only briefly skimmed the B:T:R:Samtools documentation, but it would > appear that this mostly encapsulates running the various samtools > subcommands. These provide various manipulations on SAM and BAM files, but > don't give you anything in terms of converting from not- SAM/BAM to SAM/BAM. > >> ----- Original Message ----- From: "Albert Vilella" > > >>> Considering I've got a way to map the cDNAs to chromosome coordinates, >>> how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 >>> human >>> coordinates? > > Perhaps I misunderstand, but if you already have a bunch of snippets of > sequence and their mapped coordinates, then the easy way to generate a SAM > file containing them is just to print it out by hand. > > A SAM file is just a tab-separated text file. For each sequence in your > Bio::SimpleAlign objects, print out a line containing appropriate values for > each of the 11 main SAM fields. (If the snippets are effectively unpaired, > then MRNM,MPOS,ISIZE can just be *,0,0, and the only FLAG values you'll be > choosing between are 0, 4, 16, and 20.) > > You should also start the file with an @SQ header for each of the chromosomes > you've mapped against. > > (I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf -- it's a > little vague, but should be more than enough to explain how to e.g. print out > a basic SAM file with only the main fields.) > > Once you've printed out a simple SAM file, you can use B:T:R:Samtools or > samtools directly or other tools to convert it to the binary BAM format > and/or otherwise work with it. 
> > Cheers, > > John > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a > charity registered in England with number 1021457 and a company registered in > England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE. _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed May 19 17:21:56 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 19 May 2010 13:21:56 -0400 Subject: [Bioperl-l] from SimpleAlign to SAM/BAM In-Reply-To: References: Message-ID: B:T:R:Samtools wraps Bio::Samtools ----- Original Message ----- From: Lincoln Stein To: Mark A. Jensen Cc: Albert Vilella ; bioperl-l at bioperl.org Sent: Wednesday, May 19, 2010 12:40 PM Subject: Re: [Bioperl-l] from SimpleAlign to SAM/BAM Bio::Samtools, which is separate from bioperl but compatible with it, provides read/write access to SAM and BAM via Heng's C library. Lincoln On Wed, May 19, 2010 at 9:34 AM, Mark A. Jensen wrote: Albert-- have a look at Bio::Tools::Run::Samtools which incorporates the use of Bio::Assembly::IO::sam (I think). I know there is only read capability for B:A:I:sam, but Samtools may give you the appropriate wrapper for doing writes (some assembly (so to speak) required...)-- cheers MAJ ----- Original Message ----- From: "Albert Vilella" To: Sent: Wednesday, May 19, 2010 4:36 AM Subject: [Bioperl-l] from SimpleAlign to SAM/BAM Hi, I would like to know what would be the best way to generate a SAM/BAM file with cDNA alignments against the human reference from a bunch of Bio::SimpleAlign cDNA multiple sequence alignment objects. Considering I've got a way to map the cDNAs to chromosome coordinates, how can I generate a SAM/BAM file with ~1,000,000 entries against ~23.000 human coordinates? As far as I can see, there is an Bio::Assembly::IO::sam.pm which loads assemblies. 
Should I be using some other tool existing not in bioperl? Cheers, Albert. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Thu May 20 15:37:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 10:37:16 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> Message-ID: <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Yes, if you have time. I have started along that path already, but I'm sure there are lingering spots where links point to the wrong place, or subversion/svn is mentioned. chris On May 20, 2010, at 10:34 AM, Brian Osborne wrote: > Chris, > > Done, easy. Should I remove all references to SVN from the Wiki? > > Brian O. > > On May 18, 2010, at 2:04 PM, Chris Fields wrote: > >> Yes. >> >> chris >> >> On May 18, 2010, at 11:06 AM, Brian Osborne wrote: >> >>> bioperl-l, >>> >>> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >>> >>> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >>> >>> Brian O. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu May 20 16:05:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 11:05:56 -0500 Subject: [Bioperl-l] Regarding git commits... Message-ID: All, Please make sure to update your local git repos prior to doing commits and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. chris From florent.angly at gmail.com Thu May 20 16:22:50 2010 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 20 May 2010 09:22:50 -0700 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: References: Message-ID: <4BF561DA.1070700@gmail.com> On 20/05/10 09:05, Chris Fields wrote: > All, > > Please make sure to update your local git repos prior to doing commits That's done with "git pull", as mentioned on the wiki (http://www.bioperl.org/wiki/Using_Git), right? > and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely. 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bosborne11 at verizon.net Thu May 20 15:34:39 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 20 May 2010 11:34:39 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> Message-ID: <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> Chris, Done, easy. Should I remove all references to SVN from the Wiki? Brian O. On May 18, 2010, at 2:04 PM, Chris Fields wrote: > Yes. > > chris > > On May 18, 2010, at 11:06 AM, Brian Osborne wrote: > >> bioperl-l, >> >> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >> >> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >> >> Brian O. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu May 20 16:58:22 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 20 May 2010 09:58:22 -0700 Subject: [Bioperl-l] Regarding git commits... 
In-Reply-To: <4BF561DA.1070700@gmail.com>
References: <4BF561DA.1070700@gmail.com>
Message-ID: <4BF56A2E.8060309@bioperl.org>

I think you want
$ git pull upstream master

http://help.github.com/forking/

Florent Angly wrote, On 5/20/10 9:22 AM:
> On 20/05/10 09:05, Chris Fields wrote:
>> All,
>>
>> Please make sure to update your local git repos prior to doing commits
> That's done with "git pull", as mentioned on the wiki
> (http://www.bioperl.org/wiki/Using_Git), right?
>
>> and pushing to master, and merge commits in properly if they don't
>> match. Please please please don't save over files if they don't
>> merge correctly. I just found out I had a prior commit that fixed
>> the test number and removed old files that was completely clobbered,
>> so I'm having to hand-merge those changes back in now. If it were
>> anything more involved I would revert that prior commit completely.
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Thu May 20 17:35:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 12:35:09 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To: <4BF56A2E.8060309@bioperl.org>
References: <4BF561DA.1070700@gmail.com> <4BF56A2E.8060309@bioperl.org>
Message-ID: <86401472-ECAB-4C21-8BD1-61AB37003F64@illinois.edu>

Yes. The general syntax is:

    git pull <remote> <branch>

If you have a read-write checkout directly from bioperl/bioperl-live.git, 'origin' should be set to that, and if you are on a specific branch a simple 'git pull' will work (it implies 'git pull origin <branch>'). All collabs can do this.

In the case of a forked repo (which anyone can do), it's a little trickier as it's essentially a branch from the repository at a specific point; it isn't automatically synced. You can see that here:

http://github.com/bioperl/bioperl-live/network

In order to sync with the original repo, you need to specify exactly which remote to pull from, likely not 'origin' (which is your forked repo), but 'upstream' or whatever you set the original bioperl read-only repo to via:

    git remote add upstream git://github.com/bioperl/bioperl-live.git

Then, to sync, do:

    git pull upstream master
    git push  # goes to your forked repo

chris

PS - Note on the graph linked to I just synced my branch using the above.

On May 20, 2010, at 11:58 AM, Jason Stajich wrote:

> I think you want
> $ git pull upstream master
>
> http://help.github.com/forking/
>
> Florent Angly wrote, On 5/20/10 9:22 AM:
>> On 20/05/10 09:05, Chris Fields wrote:
>>> All,
>>>
>>> Please make sure to update your local git repos prior to doing commits
>> That's done with "git pull", as mentioned on the wiki (http://www.bioperl.org/wiki/Using_Git), right?
>>
>>> and pushing to master, and merge commits in properly if they don't match. Please please please don't save over files if they don't merge correctly. I just found out I had a prior commit that fixed the test number and removed old files that was completely clobbered, so I'm having to hand-merge those changes back in now. If it were anything more involved I would revert that prior commit completely.
>>> >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Thu May 20 18:06:13 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 20 May 2010 13:06:13 -0500 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: References: Message-ID: On May 20, 2010, at 11:05 AM, Chris Fields wrote: > Please make sure to update your local git repos prior to doing commits and pushing to master I thought git refused to push if your local was out of date? (I thought this was one of the general selling points of git?) It seems to be doing that to me, below. Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah jhannah at jaysnet-MacBook:~/src/sandbox$ git push To git at github.com:jhannah/sandbox.git ! [rejected] master -> master (non-fast-forward) error: failed to push some refs to 'git at github.com:jhannah/sandbox.git' To prevent you from losing history, non-fast-forward updates were rejected Merge the remote changes before pushing again. See the 'Note about fast-forwards' section of 'git push --help' for details. From cjfields at illinois.edu Thu May 20 18:43:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 May 2010 13:43:12 -0500 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: References: Message-ID: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. 
That's something that can happen in any VCS. chris On May 20, 2010, at 1:06 PM, Jay Hannah wrote: > On May 20, 2010, at 11:05 AM, Chris Fields wrote: >> Please make sure to update your local git repos prior to doing commits and pushing to master > > I thought git refused to push if your local was out of date? (I thought this was one of the general selling points of git?) It seems to be doing that to me, below. > > Jay Hannah > http://biodoc.ist.unomaha.edu/wiki/User:Jhannah > > > jhannah at jaysnet-MacBook:~/src/sandbox$ git push > To git at github.com:jhannah/sandbox.git > ! [rejected] master -> master (non-fast-forward) > error: failed to push some refs to 'git at github.com:jhannah/sandbox.git' > To prevent you from losing history, non-fast-forward updates were rejected > Merge the remote changes before pushing again. See the 'Note about > fast-forwards' section of 'git push --help' for details. > From jay at jays.net Thu May 20 19:09:00 2010 From: jay at jays.net (Jay Hannah) Date: Thu, 20 May 2010 14:09:00 -0500 Subject: [Bioperl-l] Regarding git commits... In-Reply-To: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> References: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> Message-ID: <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net> On May 20, 2010, at 1:43 PM, Chris Fields wrote: > It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. That's something that can happen in any VCS. So... you're saying don't commit if you don't have any idea what you're committing? 
:)

git pull
git diff
git status
if local is clean then
-edit-
git diff if it looks good then git commit
git status if it looks good then git push

Jay Hannah
http://biodoc.ist.unomaha.edu/wiki/User:Jhannah
enjoys preaching to the choir ;)

From cjfields at illinois.edu Thu May 20 19:24:17 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 14:24:17 -0500
Subject: [Bioperl-l] Regarding git commits...
In-Reply-To: <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>
References: <7DDA00AF-3374-4EC9-9AF8-FE8A01AFBA52@illinois.edu> <77429AE4-B21F-4EFD-A4BD-430051E66A22@jays.net>
Message-ID: <95305268-0D84-478C-A380-68E81742F18F@illinois.edu>

On May 20, 2010, at 2:09 PM, Jay Hannah wrote:

> On May 20, 2010, at 1:43 PM, Chris Fields wrote:
>> It should, yes, and it is a nice selling point. But you can update local or set up a new clone from remote, then save over any changes with a backup copy from an old (non-merged) version, then commit. That's something that can happen in any VCS.
>
> So... you're saying don't commit if you don't have any idea what you're committing? :)
>
> git pull
> git diff
> git status
> if local is clean then
> -edit-
> git diff if it looks good then git commit
> git status if it looks good then git push
>
> Jay Hannah
> http://biodoc.ist.unomaha.edu/wiki/User:Jhannah
> enjoys preaching to the choir ;)

Maybe the point is, if someone is having a problem with git either pulling from or pushing to the remote repo, it's very likely b/c of a merge conflict (git is trying to tell you something). There are lots of ways to resolve those (most easily by hand if the change is small). But saving over the top of someone else's commit in a re-cloned repo is definitely not one of them.

Possibly a section of 'Using git' that needs some work?

chris

From charles.tilford at bms.com Thu May 20 20:27:27 2010
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 20 May 2010 16:27:27 -0400
Subject: [Bioperl-l] Bio::Species irritated with "unclassified sequences"
Message-ID: <4BF59B2F.9000300@bms.com>

Bio::Species::classification() is irritated with me when I provide it with a @class_array that is composed of one node, particularly:

    $obj->classification("unclassified sequences")

AFAICT this is a valid, single node taxa "tree":

http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=12908

Subroutine classification is expecting at least two class members, the problem with the above call crops up as:

Use of uninitialized value $vals[1] in quotemeta at /stf/biocgi/tilfordc/patch_lib/Bio/Species.pm line 179
( $Id: Species.pm 16700 2010-01-15 19:50:11Z dave_messina $)

... and the relevant code is:

    sub classification {
        my ($self, @vals) = @_;

        if (@vals) {
            if (ref($vals[0]) eq 'ARRAY') {
                @vals = @{$vals[0]};
            }

            # make sure the lineage contains us as first or second element
            # (lineage may have subspecies, species, genus ...)
            my $name = $self->node_name;
            my ($genus, $species) = (quotemeta($vals[1]), quotemeta($vals[0]));

That is, it's expecting at least (species, genus) in the array. Am I misusing classification(), or Bio::Species in general? I know it's named "Species", but I've been using it as a generic tree object for arbitrary taxonomy nodes, not just species and subspecies. This block a little lower down:

            unless ($self->rank) {
                # and that we are rank species
                $self->rank('species');
            }

... implies that the module can be used for taxa ranks other than species. However, doing so would not prevent the module being aggravated over a null $vals[1].

The use case here is building Bio::Seq::RichSeq objects pulled from a (very large) sequence database, and then dumped / displayed with SeqIO. Most are well behaved, but there's a non-trivial number of 'artificial' constructs that don't root to an organism.
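
To make the failure mode concrete, here is a minimal standalone sketch of a guard that tolerates a one-node lineage. It is not the Bio::Species code and not a proposed patch: `lineage_patterns` is a hypothetical helper invented for this illustration, and it assumes the lineage is listed most-specific-first, as in the excerpt above.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::Species): build regex-safe
# (genus, species) strings from a lineage listed most-specific-first,
# tolerating a lineage that has only one node.
sub lineage_patterns {
    my @vals = @_;

    # Accept either a flat list or a single array reference.
    @vals = @{ $vals[0] } if @vals && ref($vals[0]) eq 'ARRAY';
    return unless @vals;

    my $species = quotemeta($vals[0]);

    # Only touch the second element when it exists, so quotemeta()
    # is never handed an undefined value.
    my $genus = defined $vals[1] ? quotemeta($vals[1]) : undef;

    return ($genus, $species);
}

my ($genus, $species) = lineage_patterns('unclassified sequences');
printf "species pattern: %s, genus pattern: %s\n",
    $species, defined $genus ? $genus : '(none)';
```

Whether a missing second element should be treated as "no genus" or rejected outright is a design decision for the module maintainers; the sketch only shows that the warning comes from reading an element that is not there.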

-CAT

From dimitark at bii.a-star.edu.sg Fri May 21 02:18:21 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Fri, 21 May 2010 10:18:21 +0800
Subject: [Bioperl-l] a problem with HspI module?
Message-ID: <4BF5ED6D.6030506@bii.a-star.edu.sg>

Hello guys,
I think I found a problem with 'Bio::Search::HSP::HSPI'. Consider the following HSP:
-------------
 Score = 48.9 bits (115), Expect = 8e-04, Method: Compositional matrix adjust.
 Identities = 27/77 (35%), Positives = 40/77 (51%), Gaps = 14/77 (18%)
 Frame = +1

Query  371      PSGMLLA-----SCSDDMTLKIWSMKQEVCIHDLQAHNKEIYTIKWSPTGPATSNPNSNI  425
                P   LLA     S S D T+++W ++Q VC H L  H + +Y++ +SP G
Sbjct  6955270  PGLQLLAFSHPPSASFDSTVRLWDVEQGVCTHTLMKHQEPVYSVAFSPDGK---------  6955422

Query  426      MLASASFDSTVRLWDIE  442
                 LAS SFD  V +W+ +
Sbjct  6955423  YLASGSFDKYVHIWNTQ  6955473
---------------

The method 'frac_identical' is not functioning right.
-------------
 Title   : frac_identical
 Usage   : my $frac_id = $hsp->frac_identical( ['query'|'hit'|'total'] );
 Function: Returns the fraction of identical positions for this HSP
 Returns : Float in range 0.0 -> 1.0
 Args    : 'query' = num identical / length of query seq (without gaps)
           'hit'   = num identical / length of hit seq (without gaps)
           'total' = num identical / length of alignment (with gaps)
           default = 'total'
---------------
According to the method description, for the HSP above, 'frac_identical' should return '0.42' with 'hit'. But it doesn't. With 'hit' it currently gives '0.13'. With 'total' it gives the expected result, '0.35'.

That's all.
Cheers

Dimitar

--
Dimitar Kenanov
Postdoctoral research fellow
Protein Sequence Analysis Group
Bioinformatics Institute
A*STAR, Singapore
tel: +65 6478 8514

From cjfields at illinois.edu Fri May 21 02:24:46 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 May 2010 21:24:46 -0500
Subject: [Bioperl-l] a problem with HspI module?
In-Reply-To: <4BF5ED6D.6030506@bii.a-star.edu.sg> References: <4BF5ED6D.6030506@bii.a-star.edu.sg> Message-ID: It would be best to file this in a bug report, along with example data. chris On May 20, 2010, at 9:18 PM, Dimitar Kenanov wrote: > Hello guys, > i think i found a problem with ' Bio::Search::HSP::HSPI'. Consider the following HSP: > ------------- > Score = 48.9 bits (115), Expect = 8e-04, Method: Compositional matrix adjust. > Identities = 27/77 (35%), Positives = 40/77 (51%), Gaps = 14/77 (18%) > Frame = +1 > > Query 371 PSGMLLA-----SCSDDMTLKIWSMKQEVCIHDLQAHNKEIYTIKWSPTGPATSNPNSNI 425 > P LLA S S D T+++W ++Q VC H L H + +Y++ +SP G > Sbjct 6955270 PGLQLLAFSHPPSASFDSTVRLWDVEQGVCTHTLMKHQEPVYSVAFSPDGK--------- 6955422 > > Query 426 MLASASFDSTVRLWDIE 442 > LAS SFD V +W+ + > Sbjct 6955423 YLASGSFDKYVHIWNTQ 6955473 > --------------- > > The method 'frac_identical' is not functioning right. > ------------- > Title : frac_identical > Usage : my $frac_id = $hsp->frac_identical( ['query'|'hit'|'total'] ); > Function: Returns the fraction of identitical positions for this HSP > Returns : Float in range 0.0 -> 1.0 > Args : 'query' = num identical / length of query seq (without gaps) > 'hit' = num identical / length of hit seq (without gaps) > 'total' = num identical / length of alignment (with gaps) > default = 'total' > --------------- > According to the method description, for the HSP above, 'frac_identical' should return '0.42' with 'hit'. But it doesnt. Now with 'hit' gives '0.13'. With 'total' gives normal result '0.35'. > > Thats all. 
> Cheers > > Dimitar > > -- > Dimitar Kenanov > Postdoctoral research fellow > Protein Sequence Analysis Group > Bioinformatics Institute > A*STAR, Singapore > tel: +65 6478 8514 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Fri May 21 17:44:26 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 10:44:26 -0700 Subject: [Bioperl-l] codon tables, finding ORFs Message-ID: <4BF6C67A.4040202@cornell.edu> Hi all, Right now, Bio::Tools::CodonTable uses as its 'standard' table the NCBI one, described at http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG1. This table recognizes three different start codons: the usual ATG, plus TTG and CTG (which I'd never heard of before looking there, seems they are rare). The issue is, if you use this codon scheme to find open reading frames in nucleotide sequences, you get some ORFs that I think a lot of biologists would be surprised at, from these two (rare?) start codons. Seems to me, this might be a problem. I mean, a naive user (which just about everyone is!) would expect the default codon table to only recognize the canonical ATG as a start, right? And would be rather displeased if BioPerl said (by default) that something starting with one of these rare codons was an open reading frame? So I guess my question is, do we think BioPerl (Bio::Tools::CodonTable) should really recognize these rare start codons by default? Rob From scott at scottcain.net Fri May 21 18:15:20 2010 From: scott at scottcain.net (Scott Cain) Date: Fri, 21 May 2010 14:15:20 -0400 Subject: [Bioperl-l] [Gmod-schema] Trying to load my first database In-Reply-To: References: Message-ID: Hi Daniel, I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists. 

Of course, the file you sent me would be the same file you sent me yesterday; sorry for my poor memory :-)

This file uncovered a bug in BioPerl in the FeatureIO module. While fixing the bug may be difficult, working around it might not be too bad. Additionally, I'm not sure we should fix it right now, as there is an effort underway to rework this section of BioPerl anyway. The good news is that the workaround is fairly simple.

In the GFF that MAKER created, when parsing prodigal output, it generates GFF lines like this:

Contig125	pred_gff:prodigal_v2.00	match	104	1723	157.5	+	.	ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59;

The tricky part is this tag/value in the ninth column: type=ATG. The tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is in the third column, so when it is parsing this line of GFF, it tries to reassign the feature type to something that isn't valid. The workaround is pretty easy: since "type" is a problematic tag, and it appears that the type tag here is defining the start type, I would suggest doing a global search and replace on the file to replace "type=" with "start_type=". I did that and the file loaded fine. I don't know if it is MAKER that creates this tag or the BioPerl parser for prodigal, but changing this at the source might be nice (of course, it might also break somebody else's code :-/). I'll enter a bug for this in the BioPerl bug tracker.

Scott

On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote:
> Hi Scott,
>
> I used Maker to generate the attached file.
>
> -Daniel
>
> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote:
>> Hi Daniel,
>>
>> Please keep the schema mailing list cc'ed in so the responses can be
>> archived and more eyes than just mine can try to solve the problem.
>> >> Can you send a sample of the GFF that is causing the problem? ?Any >> ontology term that is in Chado should be "legal." ?If there's >> something causing a problem, we need to figure out what it is. >> >> Scott >> >> >> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote: >>> Hi Scott, >>> >>> I am using the same image as we used in class. ?I was able to load >>> each of the examples in the GMOD course (Pythium) and on the Chado >>> website (yeast). >>> >>> On another note, is there an easy way to navigate the ontology terms >>> that are legal and standard in both GFF3 and in Chado. ?I am having >>> trouble understanding how to convert from an arbitrary analysis (e.g. >>> Blasting KEGG) into a format that works. >>> >>> Thanks so much! >>> -Daniel >>> >>> On Fri, May 21, 2010 at 9:41 AM, Scott Cain wrote: >>>> Hi Daniel, >>>> >>>> That error message looks like one that would come from an older >>>> version of BioPerl. ?What version do you have? >>>> >>>> Scott >>>> >>>> >>>> On Fri, May 21, 2010 at 11:51 AM, Daniel Quest wrote: >>>>> Hi Scott, >>>>> >>>>> Thanks for the reply. ?Sorry, I should have been able to track down >>>>> that error. ?Could you tell me what the following error means? >>>>> >>>>> gmod at ubuntu:~/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125$ >>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>> /home/gmod/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125/Contig125.gff >>>>> --noexon --recreate_cache >>>>> (Re)creating the uniquename cache in the database... >>>>> Creating table... >>>>> Populating table... >>>>> Creating indexes... >>>>> Adjusting the primary key sequences (if necessary)...Done. >>>>> Preparing data for inserting into the chado database >>>>> (This may take a while ...) >>>>> >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> MSG: Object Bio::Annotation::SimpleValue=HASH(0xa858ac8) was not valid >>>>> with key type. 
If you were adding new keys in, perhaps you want to >>>>> make use >>>>> of the archetype method to allow registration to a more basic type >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>> STACK: Bio::Annotation::Collection::add_Annotation >>>>> /usr/local/share/perl/5.10.0/Bio/Annotation/Collection.pm:361 >>>>> STACK: Bio::SeqFeature::Annotated::add_Annotation >>>>> /usr/local/share/perl/5.10.0/Bio/SeqFeature/Annotated.pm:609 >>>>> STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag >>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:797 >>>>> STACK: Bio::FeatureIO::gff::_handle_feature >>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:752 >>>>> STACK: Bio::FeatureIO::gff::next_feature >>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:172 >>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:775 >>>>> ----------------------------------------------------------- >>>>> >>>>> Abnormal termination, trying to clean up... >>>>> >>>>> Attempting to clean up the loader temp table (so that --recreate_cache >>>>> won't be needed)... >>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>> Exiting... >>>>> >>>>> >>>>> Thanks so much! >>>>> -Daniel >>>>> >>>>> On Thu, May 20, 2010 at 6:20 PM, Scott Cain wrote: >>>>>> Hi Daniel, >>>>>> >>>>>> The error message you got said that the GFF file that you are trying >>>>>> to load couldn't be found; are you sure the path was correct? ?The >>>>>> file itself looks OK. >>>>>> >>>>>> Scott >>>>>> >>>>>> >>>>>> On Thu, May 20, 2010 at 5:06 PM, Daniel Quest wrote: >>>>>>> Hello All, >>>>>>> >>>>>>> I am trying to load my first genome from maker. ?Not sure what the >>>>>>> problem is... any help is awesome! ?I am attaching at least part of >>>>>>> the dataset. 
>>>>>>> >>>>>>> -Daniel >>>>>>> >>>>>>> >>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3 >>>>>>> --noexon >>>>>>> >>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>> MSG: Could not open >>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3: No >>>>>>> such file or directory >>>>>>> STACK: Error::throw >>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>>> STACK: Bio::Root::IO::_initialize_io >>>>>>> /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:341 >>>>>>> STACK: Bio::FeatureIO::_initialize >>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:353 >>>>>>> STACK: Bio::FeatureIO::gff::_initialize >>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:102 >>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:276 >>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:296 >>>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:720 >>>>>>> ----------------------------------------------------------- >>>>>>> >>>>>>> Abnormal termination, trying to clean up... >>>>>>> >>>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>>> Exiting... >>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gmod-schema mailing list >>>>>>> Gmod-schema at lists.sourceforge.net >>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ------------------------------------------------------------------------ >>>>>> Scott Cain, Ph. D. ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
scott at scottcain dot net
>>>>>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>>>>> Ontario Institute for Cancer Research
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>>>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>>> Ontario Institute for Cancer Research
>>>>
>>>
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>>
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From bosborne11 at verizon.net Fri May 21 18:45:01 2010
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 21 May 2010 14:45:01 -0400
Subject: [Bioperl-l] codon tables, finding ORFs
In-Reply-To: <4BF6C67A.4040202@cornell.edu>
References: <4BF6C67A.4040202@cornell.edu>
Message-ID: <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net>

Rob,

The user will use translate(), which can do something like this:

    $prot_obj = $my_seq_object->translate(-orf   => 1,
                                          -start => "atg" );

CodonTable does little more than hold the codon/aa data. All the useful work is done by translate(), and there are lots of options.
Here is part of the documentation: Returns : A Bio::PrimarySeqI implementing object Args : -terminator - character for terminator default is * -unknown - character for unknown default is X -frame - frame default is 0 -codontable_id - codon table id default is 1 -complete - complete CDS expected default is 0 -throw - throw exception if not complete default is 0 -orf - find 1st ORF default is 0 -start - alternative initiation codon -codontable - Bio::Tools::CodonTable object -offset - offset for fuzzy locations default is 0 Notes : The -start argument only applies when -orf is set to 1. By default all initiation codons found in the given codon table are used but when "start" is set to some codon this codon will be used exclusively as the initiation codon. Note that the default codon table (NCBI "Standard") has 3 initiation codons! Brian O. On May 21, 2010, at 1:44 PM, Robert Buels wrote: > Hi all, > > Right now, Bio::Tools::CodonTable uses as its 'standard' table the NCBI one, described at http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG1. > > This table recognizes three different start codons: the usual ATG, plus TTG and CTG (which I'd never heard of before looking there, seems they are rare). > > The issue is, if you use this codon scheme to find open reading frames in nucleotide sequences, you get some ORFs that I think a lot of biologists would be surprised at, from these two (rare?) start codons. > > Seems to me, this might be a problem. I mean, a naive user (which just about everyone is!) would expect the default codon table to only recognize the canonical ATG as a start, right? And would be rather displeased if BioPerl said (by default) that something starting with one of these rare codons was an open reading frame? > > So I guess my question is, do we think BioPerl (Bio::Tools::CodonTable) should really recognize these rare start codons by default? 
> > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From briano at bioteam.net Fri May 21 18:52:19 2010 From: briano at bioteam.net (Brian Osborne) Date: Fri, 21 May 2010 14:52:19 -0400 Subject: [Bioperl-l] What is CPAN doing? Message-ID: bioperl-l, Here's the POD for the translate() method: =head2 translate Title : translate Usage : $protein_seq_obj = $dna_seq_obj->translate Or if you expect a complete coding sequence (CDS) translation, with inititator at the beginning and terminator at the end: $protein_seq_obj = $cds_seq_obj->translate(-complete => 1); Or if you want translate() to find the first initiation codon and return the corresponding protein: $protein_seq_obj = $cds_seq_obj->translate(-orf => 1); Function: Provides the translation of the DNA sequence using full IUPAC ambiguities in DNA/RNA and amino acid codes. The complete CDS translation is identical to EMBL/TREMBL database translation. Note that the trailing terminator character is removed before returning the translated protein object. Note: if you set $dna_seq_obj->verbose(1) you will get a warning if the first codon is not a valid initiator. Returns : A Bio::PrimarySeqI implementing object Args : -terminator - character for terminator default is * -unknown - character for unknown default is X -frame - frame default is 0 -codontable_id - codon table id default is 1 -complete - complete CDS expected default is 0 -throw - throw exception if not complete default is 0 -orf - find 1st ORF default is 0 -start - alternative initiation codon -codontable - Bio::Tools::CodonTable object -offset - offset for fuzzy locations default is 0 Notes : The -start argument only applies when -orf is set to 1. By default all initiation codons found in the given codon table are used but when "start" is set to some codon this codon will be used exclusively as the initiation codon. 
           Note that the default codon table (NCBI "Standard") has 3
           initiation codons!

           By default translate() translates termination codons to
           the same character (default is *), both internal and trailing
           codons. Setting "-complete" to 1 tells translate() to remove
           the trailing character.

           -offset is used for seqfeatures which contain the \codon_start
           tag and can be set to 1, 2, or 3. This is the offset by which
           the sequence translation starts relative to the first base of
           the feature.

           For details on codon tables used by translate() see L.

           Deprecated argument set (v. 1.5.1 and prior versions)
           where each argument is an element in an array:

           1: character for terminator (optional), defaults to '*'.
           2: character for unknown amino acid (optional), defaults to 'X'.
           3: frame (optional), valid values are 0, 1, 2, defaults to 0.
           4: codon table id (optional), defaults to 1.
           5: complete coding sequence expected, defaults to 0 (false).
           6: boolean, throw exception if not complete coding sequence
              (true), defaults to warning (false)
           7: codontable, a custom Bio::Tools::CodonTable object (optional).

=cut

And here's what appears on CPAN:

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate
 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The full CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translation
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.
Returns : A Bio::PrimarySeqI implementing object Args : character for terminator (optional) defaults to '*' character for unknown amino acid (optional) defaults to 'X' frame (optional) valid values 0, 1, 2, defaults to 0 codon table id (optional) defaults to 1 complete coding sequence expected, defaults to 0 (false) boolean, throw exception if not complete CDS (true) or defaults to warning (false) Most of the POD is missing - does anyone know why? Brian O. From barani at avesthagen.com Thu May 20 11:27:04 2010 From: barani at avesthagen.com (barani at avesthagen.com) Date: Thu, 20 May 2010 16:57:04 +0530 (IST) Subject: [Bioperl-l] Bio::Biblio find method proxy problem Message-ID: <49660.192.168.1.5.1274354824.squirrel@mail.avesthagen.com> Hi, Our lab is behind firewall. I am using FC10 Linux. I have set the httpproxy in /etc/bash_profile. I am searching for research articles using Bio::Biblio "find" method as shown in the following PERL code.This program executes well, when I run it in the command line. But when i use the same code in PERL CGI, it does not work.(Says "couldn't retrieve results from http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"). Is there anyway that I can set the proxy within the codes as argument and make it executable ? It will be very useful if you guys can help me. 

#####################################################
#!/usr/bin/perl
use Bio::Biblio;
use Bio::Biblio::IO;

my $search = "ABySS[title] AND (Simpson[Author]) AND 2009[dp]";
my $biblio = Bio::Biblio->new(-access => 'eutils');
$biblio->find($search)->has_next;
while (my $xml = $biblio->get_next) {
    my $io = Bio::Biblio::IO->new(
        -data   => $xml,
        -format => 'medlinexml'
    );
    my $article = $io->next_bibref();
    # >>>>>>>>>>>>>>> XML Parser >>>>>>>>>>>>
    # <<<<<<<<<<<<<<< XML Parser <<<<<<<<<<<<
}
###############################################################

Best Regards
barani
-----------------------------------
Baranidharan P
Project Head
Bioinformatics - Genomics Group
Avesthagen Ltd
Ground floor, Innovator Building
International Tech Park Bangalore
Whitefield
Bangalore - 560066
Ph. 09900727597
Mail Off .barani at avesthagen.com
Per. baranidharanp at gmail.com
-------------------------------------

From bbimber at gmail.com Fri May 21 13:58:03 2010
From: bbimber at gmail.com (Ben Bimber)
Date: Fri, 21 May 2010 08:58:03 -0500
Subject: [Bioperl-l] CommandExts and arrays
Message-ID:

I am getting an error when trying to pass an array as a param with CommandExts. I hope there is something obvious I'm missing, but I can't seem to figure this out. I am trying to merge two BAM files using Bio::Tools::Run::Samtools using something like this:

    my $new_bam = Bio::Tools::Run::Samtools->new(
        -command     => 'merge',
        -program_dir => '/usr/bin/samtools/',
    )->run(
        -obm => 'output_file.bam',
        -ibm => ['file1.bam', 'file2.bam'],
    );

When I use an array for the -ibm param, I get an error saying 'cannot use string 'file1' as an arrayref while strict refs in place'. The error comes from this code in CommandExts.pm, around line 989.
adding 'no strict' right before the final line stops the error:

    # expand arrayrefs
    my $l = $#files;
    for (0..$l) {
        if (ref($files[$_]) eq 'ARRAY') {
            splice(@files, $_, 1, @{$files[$_]});
            #error thrown from this line
            splice(@switches, $_, 1, ($switches[$_]) x @{$files[$_]});
        }

Thanks for the help.

From daniel.quest at gmail.com Fri May 21 19:34:35 2010
From: daniel.quest at gmail.com (Daniel Quest)
Date: Fri, 21 May 2010 12:34:35 -0700
Subject: [Bioperl-l] [Gmod-schema] Trying to load my first database
In-Reply-To:
References:
Message-ID:

Hey Scott,

Thanks so much for the work on this. I have CC'ed Doug Hyatt, the developer of Prodigal so that he is aware of this problem. I am thinking that Maker just passed the Prodigal tags through and then the conflict happened on the Chado load. From my POV it is probably easiest to make small changes to the Prodigal GFF3 output to sync up with the Chado schema.

Thanks so much
-Daniel

On Fri, May 21, 2010 at 11:15 AM, Scott Cain wrote:
> Hi Daniel,
>
> I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists.
>
> Of course, the file you sent me would be the same file you sent me
> yesterday; sorry for my poor memory :-)
>
> This file uncovered a bug in BioPerl in the FeatureIO module.  While
> fixing the bug may be difficult, working around it might not be too
> bad.  Additionally, I'm not sure we should fix it right now, as this
> is an effort underway to rework this section of BioPerl anyway.  The
> good news is that the work around is fairly simple.
>
> In the GFF that MAKER created, when parsing prodigal output, it
> generates GFF lines like this:
>
> Contig125       pred_gff:prodigal_v2.00 match   104     1723    157.5
>  +       .
> ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59; > > The tricky part is this tag/value in the ninth column: type=ATG. ?The > tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is > in the third column, so when it is parsing this line of GFF, it tries > to reassign the feature type to something that isn't valid. ?The work > around is pretty easy: since "type" is a problematic tag, and it > appears that the type tag here is defining the start type, I would > suggest doing a global search and replace on the file to replace > "type=" with "start_type=". ?I did that and the file loaded fine. ?I > don't know if it is MAKER that creates this tag or the BioPerl parser > for prodigal, but changing this at the source might be nice (of > course, it might also break somebody else's code :-/ ?I'll enter a bug > for this in the BioPerl bug tracker. > > Scott > > > On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote: >> Hi Scott, >> >> I used Maker to generate the attached file. >> >> -Daniel >> >> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote: >>> Hi Daniel, >>> >>> Please keep the schema mailing list cc'ed in so the responses can be >>> archived and more eyes than just mine can try to solve the problem. >>> >>> Can you send a sample of the GFF that is causing the problem? ?Any >>> ontology term that is in Chado should be "legal." ?If there's >>> something causing a problem, we need to figure out what it is. >>> >>> Scott >>> >>> >>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote: >>>> Hi Scott, >>>> >>>> I am using the same image as we used in class. ?I was able to load >>>> each of the examples in the GMOD course (Pythium) and on the Chado >>>> website (yeast). 
>>>> >>>> On another note, is there an easy way to navigate the ontology terms >>>> that are legal and standard in both GFF3 and in Chado. ?I am having >>>> trouble understanding how to convert from an arbitrary analysis (e.g. >>>> Blasting KEGG) into a format that works. >>>> >>>> Thanks so much! >>>> -Daniel >>>> >>>> On Fri, May 21, 2010 at 9:41 AM, Scott Cain wrote: >>>>> Hi Daniel, >>>>> >>>>> That error message looks like one that would come from an older >>>>> version of BioPerl. ?What version do you have? >>>>> >>>>> Scott >>>>> >>>>> >>>>> On Fri, May 21, 2010 at 11:51 AM, Daniel Quest wrote: >>>>>> Hi Scott, >>>>>> >>>>>> Thanks for the reply. ?Sorry, I should have been able to track down >>>>>> that error. ?Could you tell me what the following error means? >>>>>> >>>>>> gmod at ubuntu:~/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125$ >>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>> /home/gmod/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125/Contig125.gff >>>>>> --noexon --recreate_cache >>>>>> (Re)creating the uniquename cache in the database... >>>>>> Creating table... >>>>>> Populating table... >>>>>> Creating indexes... >>>>>> Adjusting the primary key sequences (if necessary)...Done. >>>>>> Preparing data for inserting into the chado database >>>>>> (This may take a while ...) >>>>>> >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: Object Bio::Annotation::SimpleValue=HASH(0xa858ac8) was not valid >>>>>> with key type. 
If you were adding new keys in, perhaps you want to >>>>>> make use >>>>>> of the archetype method to allow registration to a more basic type >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>> STACK: Bio::Annotation::Collection::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/Annotation/Collection.pm:361 >>>>>> STACK: Bio::SeqFeature::Annotated::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/SeqFeature/Annotated.pm:609 >>>>>> STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:797 >>>>>> STACK: Bio::FeatureIO::gff::_handle_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:752 >>>>>> STACK: Bio::FeatureIO::gff::next_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:172 >>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:775 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> Abnormal termination, trying to clean up... >>>>>> >>>>>> Attempting to clean up the loader temp table (so that --recreate_cache >>>>>> won't be needed)... >>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>> Exiting... >>>>>> >>>>>> >>>>>> Thanks so much! >>>>>> -Daniel >>>>>> >>>>>> On Thu, May 20, 2010 at 6:20 PM, Scott Cain wrote: >>>>>>> Hi Daniel, >>>>>>> >>>>>>> The error message you got said that the GFF file that you are trying >>>>>>> to load couldn't be found; are you sure the path was correct? ?The >>>>>>> file itself looks OK. >>>>>>> >>>>>>> Scott >>>>>>> >>>>>>> >>>>>>> On Thu, May 20, 2010 at 5:06 PM, Daniel Quest wrote: >>>>>>>> Hello All, >>>>>>>> >>>>>>>> I am trying to load my first genome from maker. ?Not sure what the >>>>>>>> problem is... any help is awesome! ?I am attaching at least part of >>>>>>>> the dataset. 
>>>>>>>> >>>>>>>> -Daniel >>>>>>>> >>>>>>>> >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3 >>>>>>>> --noexon >>>>>>>> >>>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>>> MSG: Could not open >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3: No >>>>>>>> such file or directory >>>>>>>> STACK: Error::throw >>>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>>>> STACK: Bio::Root::IO::_initialize_io >>>>>>>> /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:341 >>>>>>>> STACK: Bio::FeatureIO::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:353 >>>>>>>> STACK: Bio::FeatureIO::gff::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:102 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:276 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:296 >>>>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:720 >>>>>>>> ----------------------------------------------------------- >>>>>>>> >>>>>>>> Abnormal termination, trying to clean up... >>>>>>>> >>>>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>>>> Exiting... >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gmod-schema mailing list >>>>>>>> Gmod-schema at lists.sourceforge.net >>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ------------------------------------------------------------------------ >>>>>>> Scott Cain, Ph. D.
scott at scottcain dot net >>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>>> Ontario Institute for Cancer Research >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ------------------------------------------------------------------------ >>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>> Ontario Institute for Cancer Research >>>>> >>>> >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From rmb32 at cornell.edu Fri May 21 20:11:24 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 13:11:24 -0700 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> Message-ID: <4BF6E8EC.6050001@cornell.edu> Brian Osborne wrote: > The user will use translate(), which can do something like this: > > $prot_obj = $my_seq_object->translate(-orf => 1, > -start => "atg" ); Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default.
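[Editorial sketch] To make the effect of that default concrete, here is a minimal pure-Perl illustration of how the set of accepted initiation codons changes where a first-ORF search begins. This is not BioPerl's implementation: the helper name first_orf and the example sequence are hypothetical, and the search rule (leftmost start codon followed by an in-frame stop) is an assumption chosen for simplicity. The three start codons used for the standard table (TTG, CTG, ATG) are the "3 initiation codons" this thread refers to.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return the leftmost ORF that
# begins with one of the given start codons and ends at the first
# in-frame stop codon (stop codon included in the returned string).
sub first_orf {
    my ($dna, @starts) = @_;
    my %is_start = map { uc($_) => 1 } @starts;
    my %is_stop  = map { $_ => 1 } qw(TAA TAG TGA);
    $dna = uc $dna;

    for my $pos (0 .. length($dna) - 3) {
        next unless $is_start{ substr($dna, $pos, 3) };
        # Walk codon by codon from this start until a stop codon appears.
        for (my $i = $pos + 3; $i + 3 <= length $dna; $i += 3) {
            if ($is_stop{ substr($dna, $i, 3) }) {
                return substr($dna, $pos, $i + 3 - $pos);
            }
        }
    }
    return;    # no complete ORF found
}

# Made-up sequence: a TTG lies upstream of, and in frame with, the ATG.
my $dna = 'CCTTGAAAATGGCCTAAGG';

print first_orf($dna, qw(TTG CTG ATG)), "\n";   # prints TTGAAAATGGCCTAA
print first_orf($dna, 'ATG'), "\n";             # prints ATGGCCTAA
```

With all three standard-table initiators allowed, the reported ORF starts two codons earlier than with ATG alone, which is the kind of surprise the default-behavior question is about.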
Rob From carson.holt at genetics.utah.edu Fri May 21 19:53:35 2010 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 21 May 2010 13:53:35 -0600 Subject: [Bioperl-l] [maker-devel] [Gmod-schema] Trying to load my first database In-Reply-To: Message-ID: That is correct. MAKER will just pass user defined GFF3 tags through rather than trying to make sense of them or trimming them off. Carson On 5/21/10 1:34 PM, "Daniel Quest" wrote: Hey Scott, Thanks so much for the work on this. I have CC'ed Doug Hyatt, the developer of Prodigal so that he is aware of this problem. I am thinking that Maker just passed the Prodigal tags through and then the conflict happened on the Chado load. From my POV it is probably easiest to make small changes to the Prodigal GFF3 output to sync up with the Chado schema. Thanks so much -Daniel On Fri, May 21, 2010 at 11:15 AM, Scott Cain wrote: > Hi Daniel, > > I'm cc'ing the MAKER and BioPerl lists, since this bug is germane to both lists. > > Of course, the file you sent me would be the same file you sent me > yesterday; sorry for my poor memory :-) > > This file uncovered a bug in BioPerl in the FeatureIO module. While > fixing the bug may be difficult, working around it might not be too > bad. Additionally, I'm not sure we should fix it right now, as this > is an effort underway to rework this section of BioPerl anyway. The > good news is that the work around is fairly simple. > > In the GFF that MAKER created, when parsing prodigal output, it > generates GFF lines like this: > > Contig125 pred_gff:prodigal_v2.00 match 104 1723 157.5 > + . > ID=Contig125:hit:75;Name=pred_gff_Prodigal_v2.00-Contig125-abinit-gene-0.0-mRNA-1;_AED=0.25;cscore=151.05;partial=00;rbs_motif=AGGA;rbs_spacer=5-10bp;rscore=3.57;score=157.5,157.53;sscore=6.48;tscore=3.50;type=ATG;uscore=-0.59; > > The tricky part is this tag/value in the ninth column: type=ATG. 
The > tag "type" is (semi) reserved in Bio::FeatureIO::gff to mean what is > in the third column, so when it is parsing this line of GFF, it tries > to reassign the feature type to something that isn't valid. The work > around is pretty easy: since "type" is a problematic tag, and it > appears that the type tag here is defining the start type, I would > suggest doing a global search and replace on the file to replace > "type=" with "start_type=". I did that and the file loaded fine. I > don't know if it is MAKER that creates this tag or the BioPerl parser > for prodigal, but changing this at the source might be nice (of > course, it might also break somebody else's code :-/ I'll enter a bug > for this in the BioPerl bug tracker. > > Scott > > > On Fri, May 21, 2010 at 1:40 PM, Daniel Quest wrote: >> Hi Scott, >> >> I used Maker to generate the attached file. >> >> -Daniel >> >> On Fri, May 21, 2010 at 10:34 AM, Scott Cain wrote: >>> Hi Daniel, >>> >>> Please keep the schema mailing list cc'ed in so the responses can be >>> archived and more eyes than just mine can try to solve the problem. >>> >>> Can you send a sample of the GFF that is causing the problem? Any >>> ontology term that is in Chado should be "legal." If there's >>> something causing a problem, we need to figure out what it is. >>> >>> Scott >>> >>> >>> On Fri, May 21, 2010 at 1:24 PM, Daniel Quest wrote: >>>> Hi Scott, >>>> >>>> I am using the same image as we used in class. I was able to load >>>> each of the examples in the GMOD course (Pythium) and on the Chado >>>> website (yeast). >>>> >>>> On another note, is there an easy way to navigate the ontology terms >>>> that are legal and standard in both GFF3 and in Chado. I am having >>>> trouble understanding how to convert from an arbitrary analysis (e.g. >>>> Blasting KEGG) into a format that works. >>>> >>>> Thanks so much! 
>>>> -Daniel >>>> >>>> On Fri, May 21, 2010 at 9:41 AM, Scott Cain wrote: >>>>> Hi Daniel, >>>>> >>>>> That error message looks like one that would come from an older >>>>> version of BioPerl. What version do you have? >>>>> >>>>> Scott >>>>> >>>>> >>>>> On Fri, May 21, 2010 at 11:51 AM, Daniel Quest wrote: >>>>>> Hi Scott, >>>>>> >>>>>> Thanks for the reply. Sorry, I should have been able to track down >>>>>> that error. Could you tell me what the following error means? >>>>>> >>>>>> gmod at ubuntu:~/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125$ >>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>> /home/gmod/Cthe/ProdigalONLYcthe.maker.output/cthe_datastore/Contig125/Contig125.gff >>>>>> --noexon --recreate_cache >>>>>> (Re)creating the uniquename cache in the database... >>>>>> Creating table... >>>>>> Populating table... >>>>>> Creating indexes... >>>>>> Adjusting the primary key sequences (if necessary)...Done. >>>>>> Preparing data for inserting into the chado database >>>>>> (This may take a while ...) >>>>>> >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: Object Bio::Annotation::SimpleValue=HASH(0xa858ac8) was not valid >>>>>> with key type. 
If you were adding new keys in, perhaps you want to >>>>>> make use >>>>>> of the archetype method to allow registration to a more basic type >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>> STACK: Bio::Annotation::Collection::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/Annotation/Collection.pm:361 >>>>>> STACK: Bio::SeqFeature::Annotated::add_Annotation >>>>>> /usr/local/share/perl/5.10.0/Bio/SeqFeature/Annotated.pm:609 >>>>>> STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:797 >>>>>> STACK: Bio::FeatureIO::gff::_handle_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:752 >>>>>> STACK: Bio::FeatureIO::gff::next_feature >>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:172 >>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:775 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> Abnormal termination, trying to clean up... >>>>>> >>>>>> Attempting to clean up the loader temp table (so that --recreate_cache >>>>>> won't be needed)... >>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>> Exiting... >>>>>> >>>>>> >>>>>> Thanks so much! >>>>>> -Daniel >>>>>> >>>>>> On Thu, May 20, 2010 at 6:20 PM, Scott Cain wrote: >>>>>>> Hi Daniel, >>>>>>> >>>>>>> The error message you got said that the GFF file that you are trying >>>>>>> to load couldn't be found; are you sure the path was correct? The >>>>>>> file itself looks OK. >>>>>>> >>>>>>> Scott >>>>>>> >>>>>>> >>>>>>> On Thu, May 20, 2010 at 5:06 PM, Daniel Quest wrote: >>>>>>>> Hello All, >>>>>>>> >>>>>>>> I am trying to load my first genome from maker. Not sure what the >>>>>>>> problem is... any help is awesome! I am attaching at least part of >>>>>>>> the dataset. 
>>>>>>>> >>>>>>>> -Daniel >>>>>>>> >>>>>>>> >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> gmod_bulk_load_gff3.pl --organism Cthe -a -g >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3 >>>>>>>> --noexon >>>>>>>> >>>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>>> MSG: Could not open >>>>>>>> /home/gmod/Cthe/cthe.maker.output/cthe_datastore/Contig125.gff3: No >>>>>>>> such file or directory >>>>>>>> STACK: Error::throw >>>>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:368 >>>>>>>> STACK: Bio::Root::IO::_initialize_io >>>>>>>> /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:341 >>>>>>>> STACK: Bio::FeatureIO::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:353 >>>>>>>> STACK: Bio::FeatureIO::gff::_initialize >>>>>>>> /usr/local/share/perl/5.10.0/Bio/FeatureIO/gff.pm:102 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:276 >>>>>>>> STACK: Bio::FeatureIO::new /usr/local/share/perl/5.10.0/Bio/FeatureIO.pm:296 >>>>>>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:720 >>>>>>>> ----------------------------------------------------------- >>>>>>>> >>>>>>>> Abnormal termination, trying to clean up... >>>>>>>> >>>>>>>> Trying to remove the run lock (so that --remove_lock won't be needed)... >>>>>>>> Exiting... >>>>>>>> gmod at ubuntu:~/Cthe/cthe.maker.output/cthe_datastore/Contig125$ >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gmod-schema mailing list >>>>>>>> Gmod-schema at lists.sourceforge.net >>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ------------------------------------------------------------------------ >>>>>>> Scott Cain, Ph. D. 
scott at scottcain dot net >>>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>>>> Ontario Institute for Cancer Research >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ------------------------------------------------------------------------ >>>>> Scott Cain, Ph. D. scott at scottcain dot net >>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>>>> Ontario Institute for Cancer Research >>>>> >>>> >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >>> >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From cjfields at illinois.edu Fri May 21 20:44:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 15:44:18 -0500 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <4BF6E8EC.6050001@cornell.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> Message-ID: <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> On May 21, 2010, at 3:11 PM, Robert Buels wrote: > Brian Osborne wrote: >> The user will use translate(), which can do something like this: >> $prot_obj = $my_seq_object->translate(-orf => 1, >> -start => "atg" ); > > Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default. 
> > Rob Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. chris From rmb32 at cornell.edu Fri May 21 20:48:20 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 13:48:20 -0700 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> Message-ID: <4BF6F194.3080209@cornell.edu> Chris Fields wrote: > Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. > > chris Oh they're available, CodonTable has a number of tables in it that you make translate() use optionally, and there are bacterial tables in there (but they are not well documented). The default behavior is the 'NCBI standard' (eukaryotic) table that I linked to in the original post on this thread. What I am looking for is a discussion of what the best default behavior of $seq->translate( -orf => 1 ) with no arguments should be. But also, there should be better documentation about the codon tables that are available, I can add that in my topic/longest_orf branch. 
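[Editorial sketch] For readers who want to check which initiation codons a given table defines, a short example follows. It assumes BioPerl is installed and relies on Bio::Tools::CodonTable's new(-id => ...) constructor and its is_start_codon method; the helper name start_codons is made up for illustration, and the script only reports a message if the module cannot be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: enumerate all 64 codons and keep those that the
# chosen NCBI translation table flags as initiation codons.
# Returns an array ref, or undef if BioPerl is not available.
sub start_codons {
    my ($table_id) = @_;
    return undef unless eval { require Bio::Tools::CodonTable; 1 };

    my $table = Bio::Tools::CodonTable->new( -id => $table_id );
    my @bases = qw(T C A G);
    my @starts;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                my $codon = "$b1$b2$b3";
                push @starts, $codon if $table->is_start_codon($codon);
            }
        }
    }
    return \@starts;
}

for my $id (1, 11) {    # 1 = standard table, 11 = bacterial table
    my $starts = start_codons($id);
    if ($starts) {
        print "table $id: @$starts\n";
    }
    else {
        print "Bio::Tools::CodonTable not available\n";
        last;
    }
}
```

Passing one of these ids to translate() via -codontable_id, or a single codon via -start, is how the thread suggests narrowing or widening the accepted initiators.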
Rob From cjfields at illinois.edu Fri May 21 20:52:15 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 15:52:15 -0500 Subject: [Bioperl-l] codon tables, finding ORFs In-Reply-To: <4BF6F194.3080209@cornell.edu> References: <4BF6C67A.4040202@cornell.edu> <7ECEEE64-8DDF-4C76-B16C-B37DB1218AAE@verizon.net> <4BF6E8EC.6050001@cornell.edu> <7F51C8B0-1C8C-409B-B0AC-10784DA16420@illinois.edu> <4BF6F194.3080209@cornell.edu> Message-ID: <06B1B1F1-979F-461C-BC9B-57A79C26CCE7@illinois.edu> On May 21, 2010, at 3:48 PM, Robert Buels wrote: > Chris Fields wrote: > > Maybe it shouldn't be all by default, but I have personally worked with two (bacterial) genes that had alternative start codons (TTG, GTG), so they should be made available in some way. > > > > chris > > Oh they're available, CodonTable has a number of tables in it that you make translate() use optionally, and there are bacterial tables in there (but they are not well documented). The default behavior is the 'NCBI standard' (eukaryotic) table that I linked to in the original post on this thread. > > What I am looking for is a discussion of what the best default behavior of $seq->translate( -orf => 1 ) with no arguments should be. Probably the simplest, with documentation on how to change it when needed. > But also, there should be better documentation about the codon tables that are available, I can add that in my topic/longest_orf branch. > > Rob Agreed. More docs never hurt. chris From bosborne11 at verizon.net Fri May 21 20:32:30 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 21 May 2010 16:32:30 -0400 Subject: [Bioperl-l] codon tables, finding ORFs Message-ID: Rob, translate() is one of these methods where reading the documentation is required. 
Or to put it another way, if you tried to use it without reading the docs most of the time you'd get a result that differs from what you wanted, given the variety of ways to use it, quite apart from the issue of the 3 initiation codons. So really, you have to read the docs, and they say: By default all initiation codons found in the given codon table are used but when "start" is set to some codon this codon will be used exclusively as the initiation codon. Note that the default codon table (NCBI "Standard") has 3 initiation codons! My concern right now is that CPAN has removed this text and more! If you wanted to add an additional codon table and make it a default I have no problem with that. But, the "naive user" who doesn't read the documentation is probably still going to get "surprising" results. I don't think there's any way around RTFM for this method, changing the default table does not change this. Brian O. On May 21, 2010, at 4:11 PM, Robert Buels wrote: > Brian Osborne wrote: >> The user will use translate(), which can do something like this: >> $prot_obj = $my_seq_object->translate(-orf => 1, >> -start => "atg" ); > > Yes, translate() has some non-default options for overriding this behavior. This doesn't address the question of whether including these other start codons is a good default. > > Rob From rmb32 at cornell.edu Fri May 21 21:53:34 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 14:53:34 -0700 Subject: [Bioperl-l] POD rendering question/problem (was [Fwd: What is CPAN doing?]) Message-ID: <4BF700DE.8040804@cornell.edu> Hi search.cpan.org maintainers, For one of the methods in BioPerl, a good portion of the POD that's in the source [1] isn't being rendered into HTML on its search.cpan.org page [2]. We'd like to get this POD displaying properly, either by us (BioPerl) tweaking the POD on our end, or by you guys tweaking whatever process is making the HTML. So: do we need to tweak our POD to get it displaying properly? 
If so, what needs to change in that POD?

Rob

[1] The source and POD in question:
http://search.cpan.org/src/CJFIELDS/BioPerl-1.6.1/Bio/PrimarySeqI.pm

[2] The HTML in question:
http://search.cpan.org/~cjfields/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm#translate

-------- Original Message --------
Subject: [Bioperl-l] What is CPAN doing?
Date: Fri, 21 May 2010 14:52:19 -0400
From: Brian Osborne
To: BioPerl List

bioperl-l,

Here's the POD for the translate() method:

=head2 translate

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate

           Or if you expect a complete coding sequence (CDS) translation,
           with initiator at the beginning and terminator at the end:

           $protein_seq_obj = $cds_seq_obj->translate(-complete => 1);

           Or if you want translate() to find the first initiation
           codon and return the corresponding protein:

           $protein_seq_obj = $cds_seq_obj->translate(-orf => 1);

 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The complete CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translated protein
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : -terminator    - character for terminator         default is *
           -unknown       - character for unknown            default is X
           -frame         - frame                            default is 0
           -codontable_id - codon table id                   default is 1
           -complete      - complete CDS expected            default is 0
           -throw         - throw exception if not complete  default is 0
           -orf           - find 1st ORF                     default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations       default is 0

 Notes   : The -start argument only applies when -orf is set to 1. By
           default all initiation codons found in the given codon table
           are used but when "start" is set to some codon this codon will
           be used exclusively as the initiation codon. Note that the
           default codon table (NCBI "Standard") has 3 initiation codons!

           By default translate() translates termination codons to
           some character (default is *), both internal and trailing
           codons. Setting "-complete" to 1 tells translate() to remove
           the trailing character.

           -offset is used for seqfeatures which contain the
           \codon_start tag and can be set to 1, 2, or 3. This is the
           offset by which the sequence translation starts relative to
           the first base of the feature.

           For details on codon tables used by translate() see
           L<Bio::Tools::CodonTable>.

           Deprecated argument set (v. 1.5.1 and prior versions)
           where each argument is an element in an array:

           1: character for terminator (optional), defaults to '*'.
           2: character for unknown amino acid (optional), defaults to 'X'.
           3: frame (optional), valid values are 0, 1, 2, defaults to 0.
           4: codon table id (optional), defaults to 1.
           5: complete coding sequence expected, defaults to 0 (false).
           6: boolean, throw exception if not complete coding sequence
              (true), defaults to warning (false)
           7: codontable, a custom Bio::Tools::CodonTable object (optional).

=cut

And here's what appears on CPAN:

 Title   : translate
 Usage   : $protein_seq_obj = $dna_seq_obj->translate
 Function: Provides the translation of the DNA sequence using full
           IUPAC ambiguities in DNA/RNA and amino acid codes.

           The full CDS translation is identical to EMBL/TREMBL
           database translation. Note that the trailing terminator
           character is removed before returning the translation
           object.

           Note: if you set $dna_seq_obj->verbose(1) you will get a
           warning if the first codon is not a valid initiator.

 Returns : A Bio::PrimarySeqI implementing object
 Args    : character for terminator (optional) defaults to '*'
           character for unknown amino acid (optional) defaults to 'X'
           frame (optional) valid values 0, 1, 2, defaults to 0
           codon table id (optional) defaults to 1
           complete coding sequence expected, defaults to 0 (false)
           boolean, throw exception if not complete CDS (true) or
           defaults to warning (false)

Most of the POD is missing - does anyone know why?

Brian O.

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From kai.blin at biotech.uni-tuebingen.de Fri May 21 21:56:37 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 21 May 2010 23:56:37 +0200
Subject: [Bioperl-l] hmmer3/hmmscan parser
Message-ID: <1274478997.1997.4.camel@gonzo.home.kblin.org>

Hi list, hi Thomas,

I've just bumped into the fact that bioperl-live still doesn't seem to support the hmmer3 hmmscan output format (thanks for the help at #bioperl). The nice folks on IRC pointed me at an email from Thomas Sharpton, noting that he was already working on a parser for this. So I thought I'd ask about the status of that before I run off writing my own. Is there anything I can help with?

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From rmb32 at cornell.edu Fri May 21 22:32:20 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 May 2010 15:32:20 -0700 Subject: [Bioperl-l] [perl #75252] AutoReply: POD rendering question/problem (was [Fwd: What is CPAN doing?]) In-Reply-To: References: <4BF700DE.8040804@cornell.edu> Message-ID: <4BF709F4.4030705@cornell.edu> Doing a little more investigation, the culprit seems to actually be a stray old (non-installed) version of the module in our uploaded dist. No action required on your part, unless there is a tweak to the indexing that would have not made this module be the top hit. Status: resolved Rob From cjfields at illinois.edu Fri May 21 23:22:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 May 2010 18:22:41 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274478997.1997.4.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> Message-ID: <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over. Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready. Relevant commit msg here: http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl =========================================== dev.open-bio.org - Authorized Access Only =========================================== ...
bioperl-hmmer3/ ... perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3 =========================================== dev.open-bio.org - Authorized Access Only =========================================== perllib cjfields$ chris On May 21, 2010, at 4:56 PM, Kai Blin wrote: > Hi list, hi Thomas, > > I've just bumped into the fact that bioperl-live still doesn't seem to > support the hmmer3 hmmscan output format (thanks for the help at > #bioperl). The nice folks on IRC pointed me at an email from Thomas > Sharpton, noting that he was already working on a parser for this. So I > thought I'd ask about the status of that before I run off writing my > own. Is there anything I can help with? > > Cheers, > Kai > > -- > Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de > Interfakultäres Institut für Mikrobiologie und Infektionsmedizin > Abteilung Mikrobiologie/Biotechnologie > Eberhard-Karls-Universität Tübingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 Tübingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Mon May 24 10:19:55 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 24 May 2010 12:19:55 +0200 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: Message-ID: Hi Ben, This looks like it might be a bug. When I ask for the filespec for the 'merge' command: my @filespec = $new_bam->filespec; print join "\n", @filespec, "\n"; I get: obm *ibm (note the leading '*'). Could you please submit this as a bug?
http://www.bioperl.org/wiki/Bugs Thanks, Dave From David.Messina at sbc.su.se Mon May 24 13:00:56 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 24 May 2010 15:00:56 +0200 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: <8565_1274696770_ZZg0Z3D5iEeCi.00_C34B77C6-2A3E-4B97-83C2-9BE8679CA331@sbc.su.se> Message-ID: > ok, i put in that bug. Thanks. > why exactly does having the asterisk indicate > this is a bug? i thought the asterisk indicated that multiple values > were allowed for that argument? Ah okay, my ignorance of this module is showing. :) > on a related note, are we supposed to be able to pass file names that > have spaces to command exts? on the few cases where this came up, i > have never seemed to get this to work right, so i just got rid of the > spaces. Sorry, I don't know. Paging Mark Jensen: have you got a moment to look into this? Dave From diment at gmail.com Sat May 22 08:25:55 2010 From: diment at gmail.com (Kieren Diment) Date: Sat, 22 May 2010 18:25:55 +1000 Subject: [Bioperl-l] OT: The Perl Survey Message-ID: <63B7289C-E218-4BBB-A5A4-33AFECA4C867@gmail.com> Hi, Sorry about the off topic posting, but I'm trying to get as large a sample of programmers that use Perl as possible. The Perl Foundation have funded The Perl Survey, 2010 which is ready for people to complete at http://survey.perlfoundation.org. If you could spend a little time to complete the survey, we would be most grateful. It should take around 10-15 minutes to complete.
The official announcement is at: http://news.perlfoundation.org/2010/05/grant-update-the-perl-survey-1.html Thanks in advance Kieren Diment From parametres-personnels at hotmail.fr Sun May 23 15:57:14 2010 From: parametres-personnels at hotmail.fr (NamNAme) Date: Sun, 23 May 2010 08:57:14 -0700 (PDT) Subject: [Bioperl-l] Pfam database Message-ID: <28650160.post@talk.nabble.com> Dear all, A few weeks ago I wrote a program that needs the Pfam database, and I tested it on the first version of Pfam, where each protein family's sequences are in one file. Now I would like to test it on the latest version of Pfam, but the organization has changed. I've found a file called Pfam-A.fasta which contains sequences and the family they belong to, but the sequences inside are not complete. So I have two questions: Why are these sequences not complete? And how can I find a file with complete sequences and the family they belong to? Thank you for your help. Bye. P.S.: There is the file pfamseq. I tried to write a script to read it and then retrieve the database structure I want, but this file is enormous and uses too much memory, so the script crashed. -- View this message in context: http://old.nabble.com/Pfam-database-tp28650160p28650160.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From staffa at niehs.nih.gov Mon May 24 14:32:26 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Mon, 24 May 2010 10:32:26 -0400 Subject: [Bioperl-l] Restriction Enzymes Message-ID: So, back in 2007 I wrote a script using use Bio::Tools::RestrictionEnzyme; and generated some useful restriction maps for a client. This year he comes back to me with some very new enzymes that RestrictionEnzyme did not recognize. I erroneously thought that I needed an update of BioPerl, which I requested of SysAdmin. They did this across the board, there is no going back.
(I did learn about the NEB file that needed to be installed) Now it appears that I must re-write my scripts because RestrictionEnzyme is not known to the latest version of bioperl. Is this true? How hard would it be to keep things backward compatible. Have I missed something here? Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Enterprise-Wide Information Technology Support Contract National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From David.Messina at sbc.su.se Mon May 24 15:55:45 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 24 May 2010 17:55:45 +0200 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <4046E576-2109-45BB-969C-F0B6F5749957@sbc.su.se> Hi Nick, Now it's Bio::Restriction::Enzyme (and friends). Besides the perldoc for that module, see also: http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > How hard would it be to keep things backward compatible. > Have I missed something here? I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones are intended to be at least partially backwards compatible. Dave From cjfields at illinois.edu Mon May 24 15:58:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 10:58:11 -0500 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: On May 24, 2010, at 9:32 AM, Staffa, Nick (NIH/NIEHS) [C] wrote: > So, back in 2007 I wrote a script using > > use Bio::Tools::RestrictionEnzyme; > > and generated some useful restriction maps for a client. > > This year he comes back to me with some very new enzymes > that RestrictionEnzyme did not recognize. I erroneously thought that I > needed an update of BioPerl, which I requested of SysAdmin. 
> They did this across the board, there is no going back. > (I did learn about the NEB file that needed to be installed) > > Now it appears that I must re-write my scripts because RestrictionEnzyme is > not known to the latest version of bioperl. Is this true? > How hard would it be to keep things backward compatible. > Have I missed something here? Bio::Tools::RestrictionEnzyme was deprecated quite a while ago, around v. 1.5, with removal at 1.6 (an announcement was made to the list regarding this, with no respondents, prior to the 1.6.0 release). The live version of the DEPRECATED docs is here: http://github.com/bioperl/bioperl-live/blob/master/DEPRECATED If I understand correctly, the main reason was that most development was put into the Bio::Restriction modules, with very little change occurring in Bio::Tools::RestrictionEnzyme. We made similar changes with some of the older BLAST parsers (BPLite). You could just download Bio::Tools::RestrictionEnzyme and call it via a 'use lib' directive (or local::lib) or package it with your script; it should still work. However, from my perspective, if the older module wasn't recognizing specific enzyme cut sites, and the supported one does, wouldn't it be easier to modify your script to use the newer supported one instead? If the supported Bio::Restriction modules don't recognize the new sites I would consider that a bug. > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Enterprise-Wide Information Technology Support Contract > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina chris From maj at fortinbras.us Mon May 24 16:21:03 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Mon, 24 May 2010 12:21:03 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <13392E899AB04A0E8F66336CDBE417BE@NewLife> The rewrite this summer of Bio::Restriction made several funky enzyme (non-pal, non-symmetric) types workable. I would think it wouldn't be too onerous to convert code to the new system and have it work rather quickly- MAJ From rmb32 at cornell.edu Mon May 24 16:54:29 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 24 May 2010 09:54:29 -0700 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] Message-ID: <4BFAAF45.4090400@cornell.edu> -------- Original Message -------- Subject: Re: [perl #75252] POD rendering question/problem (was [Fwd: [Bioperl-l] What is CPAN doing?]) Date: Mon, 24 May 2010 08:33:35 -0700 From: Graham Barr via RT Reply-To: search-rt at cpan.org To: rmb32 at cornell.edu References: <4BF700DE.8040804 at cornell.edu> <3F316B7B-DBCC-4668-94E4-45471ED5ACBB at pobox.com> On May 21, 2010, at 4:54 PM, Robert Buels via RT wrote: > > [1] The source and POD in question: > http://search.cpan.org/src/CJFIELDS/BioPerl-1.6.1/Bio/PrimarySeqI.pm > > [2] The HTML in question: > http://search.cpan.org/~cjfields/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm#translate that HTML is not for the above POD, it is located at
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/PrimarySeqI.pm the issue seems to be that when displaying the POD from the examples directory the source link is linking to the real module the html shown in [2] is representative of http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.1/examples/root/lib/Bio/PrimarySeqI.pm IMO it is confusing to include 2 different copies of the same module. I would suggest adding to META.yml no_index: dir: - examples/root/lib Graham. From staffa at niehs.nih.gov Mon May 24 18:32:54 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Mon, 24 May 2010 14:32:54 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: <13392E899AB04A0E8F66336CDBE417BE@NewLife> Message-ID: Thanks, all. From bbimber at gmail.com Mon May 24 19:43:07 2010 From: bbimber at gmail.com (Ben Bimber) Date: Mon, 24 May 2010 14:43:07 -0500 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: <1274729912.4373.19.camel@epistle> References: <1274729912.4373.19.camel@epistle> Message-ID: as long as the limitation is known, i dont see it as a big problem.
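On the question in this thread about file names that contain spaces: one general way around shell quoting, independent of how CommandExts builds its command line, is to hand the program and its arguments to Perl as a list so that no shell ever re-splits them. The sketch below only illustrates that idea. run_without_shell is a hypothetical helper, not a BioPerl or CommandExts function, and it assumes a Unix-like Perl 5.8 or later where the list form of a pipe open is available.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl or CommandExts.
# Runs $program with @args and returns everything it wrote to STDOUT.
# The list form of open() bypasses the shell, so an argument such as
# "my reads.fa" reaches the program as one argument, space included.
sub run_without_shell {
    my ($program, @args) = @_;
    open(my $out, '-|', $program, @args)
        or die "cannot run $program: $!";
    my $output = do { local $/; <$out> };    # slurp all of the child's output
    close($out) or die "$program exited with status $?";
    return defined $output ? $output : '';
}

# Usage: ask a child perl how many arguments it received.
my $count = run_without_shell($^X, '-e', 'print scalar @ARGV', 'my reads.fa');
print "child saw $count argument(s)\n";
```

Whether a wrapper should strip, escape, or pass such names through untouched is a design decision for the module authors; the sketch only shows that the space itself need not be a problem once the shell is out of the picture.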
On Mon, May 24, 2010 at 2:38 PM, Dan Kortschak wrote: > Hi Dave, > > You are right, spaces are not allowed - they are actively stripped from > filenames (the other option would be to escape or otherwise quote them - > the is certainly doable, is there enough of a call to do this?). > > You can use last_execution() to see what was attempted to be run, this > should show the filenames (and everything else) that were used in the > IPC call. > > cheers > Dan > > On Mon, 2010-05-24 at 12:00 -0400, Dave Messina wrote: >> Message: 2 >> Date: Mon, 24 May 2010 15:00:56 +0200 >> From: Dave Messina >> Subject: Re: [Bioperl-l] CommandExts and arrays >> To: Ben Bimber >> Message-ID: >> Content-Type: text/plain; charset=windows-1252 >> >> > ok, i put in that bug. >> >> Thanks. >> >> >> > why exactly does having the asterisk indicate >> > this is a bug? ?i thought the asterisk indicated that multiple >> values >> > were allowed for that argument? >> >> Ah okay, my ignorance of this module is showing. :) >> >> >> > on a related note, are we supposed to be able to pass file names >> that >> > have spaces to command exts? ?on the few cases where this came up, i >> > have never seemed to get this to work right, so i just got rid of >> the >> > spaces. >> >> Sorry, I don't know. >> >> >> Paging Mark Jensen ? have you got a moment to look into this? >> >> >> Dave > > From David.Messina at sbc.su.se Mon May 24 22:03:19 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 00:03:19 +0200 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: <4BFAAF45.4090400@cornell.edu> References: <4BFAAF45.4090400@cornell.edu> Message-ID: From: Graham Barr via RT > IMO it is confusing to include 2 different copies of the same module. I agree. It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). 
In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. Dave From dan.kortschak at adelaide.edu.au Mon May 24 19:38:32 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 25 May 2010 05:08:32 +0930 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: Message-ID: <1274729912.4373.19.camel@epistle> Hi Dave, You are right, spaces are not allowed - they are actively stripped from filenames (the other option would be to escape or otherwise quote them - the is certainly doable, is there enough of a call to do this?). You can use last_execution() to see what was attempted to be run, this should show the filenames (and everything else) that were used in the IPC call. cheers Dan On Mon, 2010-05-24 at 12:00 -0400, Dave Messina wrote: > Message: 2 > Date: Mon, 24 May 2010 15:00:56 +0200 > From: Dave Messina > Subject: Re: [Bioperl-l] CommandExts and arrays > To: Ben Bimber > Message-ID: > Content-Type: text/plain; charset=windows-1252 > > > ok, i put in that bug. > > Thanks. > > > > why exactly does having the asterisk indicate > > this is a bug? i thought the asterisk indicated that multiple > values > > were allowed for that argument? > > Ah okay, my ignorance of this module is showing. :) > > > > on a related note, are we supposed to be able to pass file names > that > > have spaces to command exts? on the few cases where this came up, i > > have never seemed to get this to work right, so i just got rid of > the > > spaces. > > Sorry, I don't know. > > > Paging Mark Jensen ? 
have you got a moment to look into this? > > > Dave From Russell.Smithies at agresearch.co.nz Mon May 24 22:01:25 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 25 May 2010 10:01:25 +1200 Subject: [Bioperl-l] taxonomy nightmare Message-ID: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> We've upgraded BioPerl recently and now lots of stuff appears broken though I'm sure it's not as bad as it looks. Under v1.5.2, the Bio::DB::Taxonomy worked fine but under 1.6.0 I'm deluged with errors. AFAIK, there were no changes to Perl 5.8.8 Any help greatly appreciated!!! Thanx, Russell Smithies
-----------------------------------
#! /usr/local/bin/perl

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Data::Dumper;

my $idx_dir = '/data/home/smithiesr/taxonomy';
my $TAXDIR = "/data/home/smithiesr/taxdump";

my ($nodefile,$namesfile) = ("$TAXDIR/nodes.dmp","$TAXDIR/names.dmp");

my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                               -nodesfile => $nodefile,
                               -namesfile => $namesfile,
                               -directory => $idx_dir,
                               -force => 1) or die $!;

my $human = $db->get_taxon(-name => 'Homo sapiens');
print Dumper $human;
-----------------------------------
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Failed to load module Bio::DB::Taxonomy::flatfile. Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 89.
Compilation failed in require at (eval 21) line 3.
...propagated at /usr/lib/perl5/5.8.8/base.pm line 85.
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Taxon.pm line 155.
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89.
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy/flatfile.pm line 89.
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 439.

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:441
STACK: Bio::DB::Taxonomy::_load_tax_module /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:264
STACK: Bio::DB::Taxonomy::new /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Taxonomy.pm:115
STACK: taxonomyTest.pl:15
-----------------------------------------------------------
======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Tue May 25 02:17:57 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:17:57 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: On May 24, 2010, at 7:46 PM, Thomas Sharpton wrote: > Hi all, > > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and hmmsearch output. It appears to be fully functional and I have had a handful of users test and integrate this module. > > We decided to push this module into a standalone svn repo (bioperl-hmmer3).
I am a bit confused about why the repo is empty, as I committed the code back in March and have made a few updates since then to correct bugs identified by test users. Perhaps I screwed something up during the last commit. The commit doesn't show any added files. The original code apparently is on a branch of bioperl-dev, though (think this was pointed out on IRC): http://github.com/bioperl/bioperl-dev/tree/bioperl-hmmer3 Maybe that was the mixup? > Chris, should I just add the code to the github repo? I might need a pointer on how to do this without screwing it up. I started up a new github repo for it. You would just need to let me know your github ID so I can add you to it. Then (after you are added) the instructions are here: http://github.com/bioperl/bioperl-hmmer3 > Kai, I can mail an archive of the parser your way if you're in a hurry. With some assistance from Chris et. al., I expect the code to be in the github repo by the day's end. > > Apologies for any confusion and the delayed reply - I've been on the road. > > Best, > Tom No problem. Thanks for letting us know. chris > >> On May 21, 2010 4:24 PM, "Chris Fields" wrote: >> >> To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over. Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready. >> >> Relevant commit msg here: >> >> http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html >> >> perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl >> =========================================== >> dev.open-bio.org - Authorized Access Only >> =========================================== >> ... >> bioperl-hmmer3/ >> ... 
>> perllib cjfields$ svn ls svn+ssh://dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3 >> =========================================== >> dev.open-bio.org - Authorized Access Only >> =========================================== >> perllib cjfields$ >> >> chris >> >> On May 21, 2010, at 4:56 PM, Kai Blin wrote: >> >> > Hi list, hi Thomas, >> > >> > I've just bumped into the ... >> > From cjfields at illinois.edu Tue May 25 02:20:38 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:20:38 -0500 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: On May 24, 2010, at 5:03 PM, Dave Messina wrote: > From: Graham Barr via RT >> IMO it is confusing to include 2 different copies of the same module. > > I agree. > > It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). > > In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). > > I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. > > So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. > > Dave I agree. We should either prevent indexing or remove it, unless someone can suggest it's utility. 
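As a starting point for the check proposed in this thread, a small script can list which package names are declared in more than one .pm file of a checkout, for example once in the main tree and once under examples/root/lib. This is a minimal sketch under stated assumptions: find_duplicate_packages is a hypothetical helper, not part of BioPerl, and it only recognises the first plain "package Foo::Bar;" line in each file.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper, not part of BioPerl: scan the given directories
# for .pm files and report every package name declared in more than one
# file. Only the first plain "package Foo::Bar;" line per file is read.
sub find_duplicate_packages {
    my @dirs = @_;
    my %files_for;    # package name => list of files declaring it

    find(
        {
            no_chdir => 1,    # keep $_ as the full path of each entry
            wanted   => sub {
                my $path = $_;
                return unless -f $path && $path =~ /\.pm\z/;
                open(my $fh, '<', $path) or die "cannot read $path: $!";
                while (my $line = <$fh>) {
                    if ($line =~ /^\s*package\s+([\w:]+)\s*;/) {
                        push @{ $files_for{$1} }, $path;
                        last;    # one declaration per file is enough here
                    }
                }
                close($fh);
            },
        },
        @dirs
    );

    # Keep only the packages that turned up in two or more files.
    my %duplicates;
    for my $package (keys %files_for) {
        my @files = sort @{ $files_for{$package} };
        $duplicates{$package} = \@files if @files > 1;
    }
    return \%duplicates;
}

# Usage: pass the directories to scan on the command line, e.g. the
# top of a BioPerl checkout.
if (@ARGV) {
    my $duplicates = find_duplicate_packages(@ARGV);
    for my $package (sort keys %$duplicates) {
        print "$package:\n";
        print "  $_\n" for @{ $duplicates->{$package} };
    }
}
```

The script only reports duplicates. Deciding between removing the private copies and excluding them from indexing, as discussed above, still needs someone to run the example scripts against the main modules.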
chris From thomas.sharpton at gmail.com Tue May 25 00:46:04 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Mon, 24 May 2010 17:46:04 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: Hi all, To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and hmmsearch output. It appears to be fully functional and I have had a handful of users test and integrate this module. We decided to push this module into a standalone svn repo (bioperl-hmmer3). I am a bit confused about why the repo is empty, as I committed the code back in March and have made a few updates since then to correct bugs identified by test users. Perhaps I screwed something up during the last commit. Chris, should I just add the code to the github repo? I might need a pointer on how to do this without screwing it up. Kai, I can mail an archive of the parser your way if you're in a hurry. With some assistance from Chris et. al., I expect the code to be in the github repo by the day's end. Apologies for any confusion and the delayed reply - I've been on the road. Best, Tom On May 21, 2010 4:24 PM, "Chris Fields" wrote: To add to this, it appears there was an attempt to commit this code to SVN just prior to the github migration, but nothing is present in the svn repo (which is still reachable, but is read-only). Empty repos were not migrated over. Thomas, let me know if you need addition to the github repo, would be easy enough to add this in when ready. Relevant commit msg here: http://lists.open-bio.org/pipermail/bioperl-guts-l/2010-May/031172.html perllib cjfields$ svn ls svn+ssh:// dev.open-bio.org/home/svn-repositories/bioperl =========================================== dev.open-bio.org - Authorized Access Only =========================================== ... bioperl-hmmer3/ ... 
perllib cjfields$ svn ls svn+ssh:// dev.open-bio.org/home/svn-repositories/bioperl/bioperl-hmmer3 =========================================== dev.open-bio.org - Authorized Access Only =========================================== perllib cjfields$ chris On May 21, 2010, at 4:56 PM, Kai Blin wrote: > Hi list, hi Thomas, > > I've just bumped into the ... From Russell.Smithies at agresearch.co.nz Tue May 25 02:25:41 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 25 May 2010 14:25:41 +1200 Subject: [Bioperl-l] taxonomy nightmare In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32D88D065AA@exchsth.agresearch.co.nz> Fixed I think, some file permissions got screwed somewhere ;-( --Russell From dimitark at bii.a-star.edu.sg Tue May 25 02:28:19 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Tue, 25 May 2010 10:28:19 +0800 Subject: [Bioperl-l] about gene names Message-ID: <4BFB35C3.4010808@bii.a-star.edu.sg> Hi guys, i have a question How can I get only the gene names from NCBI Gene when i have the sequence id? For example with this id - NP_005264.2 i can search NCBI Gene online but i want to get only the gene name automatically. I was checking the Bio::DB::EntrezGene module but it didnt became clear to me if i can use it for my purposes. Thank you in advance.
Greetings Dimitar -- Dimitar Kenanov Postdoctoral research fellow Protein Sequence Analysis Group Bioinformatics Institute A*STAR, Singapore tel: +65 6478 8514 From David.Messina at sbc.su.se Mon May 24 22:23:32 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 00:23:32 +0200 Subject: [Bioperl-l] Pfam database In-Reply-To: <28650160.post@talk.nabble.com> References: <28650160.post@talk.nabble.com> Message-ID: Hi, The release notes for the latest Pfam (24.0) do mention file format changes, but I could not find documentation describing those changes. Your questions relating to that would best be answered by the people at Pfam. You can contact them here: pfam-help at sanger.ac.uk However, please do report back to us what you learn. It's quite likely our code is not compatible with Pfam 24.0, and we would need that information to fix it. Thanks, Dave On May 23, 2010, at 5:57 PM, NamNAme wrote: > > Dear all, > A few weeks ago I wrote a program that need the pfam database, and I tested > it on the first version of pfam where each protein family sequences are in > one file. > But now I would like to test it on the last version of pfam but the > organization changed. > I've found a file called Pfam-A.fasta which contains sequences and the > family they belong to. But the sequences inside are not complete. > So, I've two questions : Why these sequences are not complete ? > And, How can I find a file with complete sequences and the family they > belong to ? > Thank you for your help. > Bye. > P-S : There is the file pfamseq, I tried to make a script to read it and > then retreive the database structure i want but, this file is enourmous and > use too much memory so it crashed. > -- > View this message in context: http://old.nabble.com/Pfam-database-tp28650160p28650160.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue May 25 02:54:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 May 2010 21:54:03 -0500 Subject: [Bioperl-l] taxonomy nightmare In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32D88D063B1@exchsth.agresearch.co.nz> Message-ID: You may have a version of perl that either doesn't include Scalar::Util or includes a broken version. Try installing Scalar::Util from CPAN to see if it fixes the problem. Here's a link on the problem: http://www.perlmonks.org/?node_id=424737 chris From kai.blin at biotech.uni-tuebingen.de Tue May 25 05:58:27 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 07:58:27 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: <1274767107.2271.11.camel@gonzo.home.kblin.org> On Mon, 2010-05-24 at 17:46 -0700, Thomas Sharpton wrote: > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and > hmmsearch output. It appears to be fully functional and I have had a handful > of users test and integrate this module. That's pretty much what I need. Thanks to the folks on IRC, I got pointed at the correct repository yesterday evening. > Kai, I can mail an archive of the parser your way if you're in a hurry. With > some assistance from Chris et. al., I expect the code to be in the github > repo by the day's end. No worries, that's fine. I've got a checkout of the standalone repository that I can play with now. Is there any particular reason you decided to create a new parser instead of integrating the code into the existing hmmer.pm module? I haven't looked at how the hmmer2 hmmsearch output looks compared to the hmmer3 version and if there's any conflicts. Cheers, Kai PS: Tom, sorry for the repost, forgot to CC the list. Pre-coffee email sending, it never works. -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From dan.kortschak at adelaide.edu.au Tue May 25 06:12:27 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 25 May 2010 15:42:27 +0930 Subject: [Bioperl-l] Bioperl-l Digest, Vol 85, Issue 34 In-Reply-To: References: Message-ID: <1274767947.32025.49.camel@zoidberg.mbs.adelaide.edu.au> Dimitar, Try having a look through the EUtilities cookbook: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook cheers Dan On Tue, 2010-05-25 at 01:58 -0400, Dimitar Kenanov wrote: > Date: Tue, 25 May 2010 10:28:19 +0800 > From: Dimitar Kenanov > Subject: [Bioperl-l] about gene names > To: "'bioperl-l at bioperl.org'" > Message-ID: <4BFB35C3.4010808 at bii.a-star.edu.sg> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Hi guys, > i have a question How can I get only the gene names from NCBI Gene > when > i have the sequence id? For example with this id - NP_005264.2 i can > search NCBI Gene online but i want to get only the gene name > automatically. I was checking the Bio::DB::EntrezGene module but it > didnt became clear to me if i can use it for my purposes. > > Thank you in advance.
> > Greetings > Dimitar > From kai.blin at biotech.uni-tuebingen.de Tue May 25 11:41:59 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 13:41:59 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> Message-ID: <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> On Mon, 2010-05-24 at 17:46 -0700, Thomas Sharpton wrote: Hi Tom, > To clarify, I have constructed a SearchIO parser for hmmer3 hmmscan and > hmmsearch output. It appears to be fully functional and I have had a handful > of users test and integrate this module. I've tried using the hmmer3 parser for my script, but it seems like the hmm_name member of the result object isn't set, and I'm using that. I saw this before when trying to write a test case that integrates into the Bioperl test framework. (Error output is Can't locate object method "hmm_name" via package "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, line 152.) I'm happy to work on this a bit myself if you're not working on this anyway, so we don't duplicate efforts. I just don't get why the hmm_name isn't picked up correctly, and I haven't been able to figure out how to get at the output that $self->debug() when running the tests. Oh well, it's a learning experience in any case. Cheers, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From kai.blin at biotech.uni-tuebingen.de Tue May 25 12:37:47 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 14:37:47 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> Message-ID: <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de> On Tue, 2010-05-25 at 13:41 +0200, Kai Blin wrote: Whined a little too early. > I've tried using the hmmer3 parser for my script, but it seems like the > hmm_name member of the result object isn't set, and I'm using that. > > I saw this before when trying to write a test case that integrates into > the Bioperl test framework. > (Error output is Can't locate object method "hmm_name" via package > "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, > line 152.) I just found the stuff I needed to add to the hmmer3Result.pm file. I'm currently busy adding a comprehensive test case for this module that integrates into the bioperl test harness. What's the best way to publish my additions? Do I create a fork of bioperl-live on Github or how is this handled? Cheers, Kai -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From cjfields at illinois.edu Tue May 25 12:46:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 07:46:48 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274787719.1897.165.camel@mikropc7.biotech.uni-tuebingen.de> <1274791067.25985.3.camel@mikropc7.biotech.uni-tuebingen.de> Message-ID: On May 25, 2010, at 7:37 AM, Kai Blin wrote: > On Tue, 2010-05-25 at 13:41 +0200, Kai Blin wrote: > > Whined a little too early. > >> I've tried using the hmmer3 parser for my script, but it seems like the >> hmm_name member of the result object isn't set, and I'm using that. >> >> I saw this before when trying to write a test case that integrates into >> the Bioperl test framework. >> (Error output is Can't locate object method "hmm_name" via package >> "Bio::Search::Result::hmmer3Result" at t/SearchIO/hmmer3.t line 23, >> line 152.) > > I just found the stuff I needed to add to the hmmer3Result.pm file. I'm > currently busy adding a comprehensive test case for this module that > integrates into the bioperl test harness. > > What's the best way to publish my additions? Do I create a fork of > bioperl-live on Github or how is this handled? Create a fork of the proper repository, which will eventually be bioperl-hmmer3. However, Thomas hasn't added that code in yet; not sure how much has changed since the original deposition into bioperl-dev in March, but it's possible more has been done.
chris > Cheers, > Kai > > -- > Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de > Interfakultäres Institut für Mikrobiologie und Infektionsmedizin > Abteilung Mikrobiologie/Biotechnologie > Eberhard-Karls-Universität Tübingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 Tübingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben > From dueldor at yahoo.com Tue May 25 12:30:59 2010 From: dueldor at yahoo.com (Dubi Eldor) Date: Tue, 25 May 2010 05:30:59 -0700 (PDT) Subject: [Bioperl-l] How to find secondary structures Message-ID: <766825.32163.qm@web37308.mail.mud.yahoo.com> Hi, I am a new user of BioPerl. I would like to find secondary structures in sequences ~10K nt long. Are there any functions that can help me? Thanks, Dubi From David.Messina at sbc.su.se Tue May 25 13:58:38 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 15:58:38 +0200 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <3065CE83-3E61-4080-B475-F609E74A9FD4@sbc.su.se> On May 25, 2010, at 15:54, Staffa, Nick (NIH/NIEHS) [C] wrote: > The tutorial, I discovered, has an error. > a very bad experience for a trusting newby. > whereas the tutorial has these bold examples in the first box under > Identifying restriction enzyme sites (Bio::Restriction) > > use Bio::Restriction::EnzymeCollection; > my $all_collection = Bio::Restriction::EnzymeCollection; > > This is the form of the statement that seems to work: > my $all_collection = Bio::Restriction::EnzymeCollection->new(); Thanks, fixed.
From bosborne11 at verizon.net Tue May 25 13:04:01 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 25 May 2010 09:04:01 -0400 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: Dave, I looked at the scripts, and like you I concluded they didn't use that local Bio/ directory. Then I ran then with and without that Bio/ directory, same results. So I removed that local Bio/ directory. Rob, does some additional action need to be taken by Chris, or some other Bioperl maintainer, at CPAN/PAUSE? Brian O. On May 24, 2010, at 6:03 PM, Dave Messina wrote: > From: Graham Barr via RT >> IMO it is confusing to include 2 different copies of the same module. > > I agree. > > It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). > > In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). > > I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. > > So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Tue May 25 13:54:17 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Tue, 25 May 2010 09:54:17 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: <4046E576-2109-45BB-969C-F0B6F5749957@sbc.su.se> Message-ID: The tutorial, I discovered, has an error. 
a very bad experience for a trusting newby. whereas the tutorial has these bold examples in the first box under Identifying restriction enzyme sites (Bio::Restriction) use Bio::Restriction::EnzymeCollection; my $all_collection = Bio::Restriction::EnzymeCollection; This is the form of the statement that seems to work: my $all_collection = Bio::Restriction::EnzymeCollection->new(); All the other stuff necessary for my purpose of getting fragment lengths is there and seems to work if the $enzyme database has the enzyme under the name you enter. Updating the database with the file from NEB seems to be up to the user or his sysadmin. On 5/24/10 11:55 AM, "Dave Messina" wrote: Hi Nick, Now it's Bio::Restriction::Enzyme (and friends). Besides the perldoc for that module, see also: http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > How hard would it be to keep things backward compatible. > Have I missed something here? I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones are intended to be at least partially backwards compatible. Dave From cjfields at illinois.edu Tue May 25 14:30:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 09:30:09 -0500 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: References: <4BFAAF45.4090400@cornell.edu> Message-ID: <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> I have added a 'no_index' to that specific directory in Build.PL, suppose we can change that back if there is no purpose to it (though it might come in handy with spots we don't need to be indexed). chris On May 25, 2010, at 8:04 AM, Brian Osborne wrote: > Dave, > > I looked at the scripts, and like you I concluded they didn't use that local Bio/ directory. Then I ran then with and without that Bio/ directory, same results. 
So I removed that local Bio/ directory. > > Rob, does some additional action need to be taken by Chris, or some other Bioperl maintainer, at CPAN/PAUSE? > > Brian O. > > On May 24, 2010, at 6:03 PM, Dave Messina wrote: > >> From: Graham Barr via RT >>> IMO it is confusing to include 2 different copies of the same module. >> >> I agree. >> >> It would be better to use the main copies of PrimarySeq, PrimarySeqI, Seq, and SeqI instead of private copies (and not just because of this POD conflict). >> >> In fact, my quick examination suggests that the examples/root scripts don't even use the duplicate modules in that private lib (just TestInterface and TestObject). >> >> I haven't extensively played with them, though, so there may be some compelling reason for using private copies that I'm overlooking. In that case we could just do as Graham suggested and block CPAN from indexing the private copies. >> >> So I propose we test if the example scripts perform as expected without those private copies and if so, remove them. >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From staffa at niehs.nih.gov Tue May 25 14:51:02 2010 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Tue, 25 May 2010 10:51:02 -0400 Subject: [Bioperl-l] New Restriction Analysis Message-ID: I have tried both these methods for getting new enzyme info into the system: use Bio::Restriction::IO; my $re_io = Bio::Restriction::IO->new(-file => $file, -format=>'withrefm'); my $rebase_collection = $re_io->read; A REBASE file in the correct format can be found at ftp://ftp.neb.com/pub/rebase - it will have a name like "withrefm.308". 
If need be you can also create new enzymes, like this: my $re = new Bio::Restriction::Enzyme(-enzyme => 'BioRI', -seq => 'GG^AATTCC'); But the BioPerl sends an error without informing me which of my statements caused it: Using first the withreftm.005 file from rebase and then these statements (not both at the same time): my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'SgrDI', -seq => 'CG^TCGACG'); Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.8/Bio/Restriction/Analysis.pm line 529. This works: my $pattern = $enzyme->site; print "pattern = $pattern\n"; which would lead me to believe there is nothing wrong with my enzyme. Could there be a problem if there were no cuts? That must be it, because putting info for EcoRI in instead of SgrDI, the program works: [Not the whole program, but only the bioPerl stuff. my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'EcoRI', -seq => 'G^AATTC'); use Bio::Restriction::Analysis; my $pattern = $enzyme->site; print "pattern = $pattern\n"; my $db = Bio::DB::Fasta->new("/uoldhome/estaffa/westmoreland/$filename", -makeid => \&make_my_id); my $obj = $db->get_Seq_by_id("$sequenceID"); #Sequence Object my $analysis = Bio::Restriction::Analysis->new(-seq => $obj); my @strings = $analysis->fragments($enzyme); What to do? Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Enterprise-Wide Information Technology Support Contract National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From maj at fortinbras.us Tue May 25 16:20:41 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 25 May 2010 12:20:41 -0400 Subject: [Bioperl-l] How to find secondary structures In-Reply-To: <766825.32163.qm@web37308.mail.mud.yahoo.com> References: <766825.32163.qm@web37308.mail.mud.yahoo.com> Message-ID: <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> Sounds like a job for infernal and it's Bioperl wrapper (in Bio::Tools::Run); right Chris? MAJ ----- Original Message ----- From: "Dubi Eldor" To: Sent: Tuesday, May 25, 2010 8:30 AM Subject: [Bioperl-l] How to find secondary structures > Hi, > > I am a new user of BioPerl. > I would like to find secondary sturctures in sequences of ~10K nt long. > Are there any functions that can help me? > > Thanks, > Dubi > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue May 25 16:19:42 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 12:19:42 -0400 Subject: [Bioperl-l] New Restriction Analysis In-Reply-To: References: Message-ID: Hi Nick, You're right, as far as I can tell; the offending line is @cut_positions=@{$self->{'_cut_positions'}->{$enz}}; so $self->{_cut_positions}->{$enz} must be null. I would say this is a bug; if you can put what you've reported below in a bug report at http://bugzilla.bioperl.org, that would be great. A workaround would be to check whether you have cuts first before calling the method; but that may be impossible, in which case a truly awful kludge would be to append a recognized site at the end of your sequences. Just till we can get to the fix. 
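A minimal sketch of the kind of guard that avoids the "Can't use an undefined value as an ARRAY reference" error discussed above. It uses only core Perl and does not reproduce the code in Analysis.pm: the %cut_positions hash and the cuts_for() helper are hypothetical stand-ins for the object's internal cache, chosen only to show the pattern of checking a slot before dereferencing it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the analysis object's internal cache:
# enzyme name => array ref of cut positions. 'SgrDI' has no entry,
# mimicking an enzyme that never cuts the sequence.
my %cut_positions = ( EcoRI => [ 12, 48 ] );

# Return the recorded cut positions for an enzyme, or an empty list
# when nothing was recorded, instead of dereferencing undef.
sub cuts_for {
    my ($enzyme_name) = @_;
    my $positions = $cut_positions{$enzyme_name};
    return ref($positions) eq 'ARRAY' ? @$positions : ();
}

my @ecori = cuts_for('EcoRI');
my @sgrdi = cuts_for('SgrDI');

printf "EcoRI cuts: %d, SgrDI cuts: %d\n", scalar @ecori, scalar @sgrdi;
```

With a check like this, an enzyme that has no recognition site in the sequence simply yields an empty list, so calling code can treat "no cuts" as an ordinary result rather than a fatal error.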
cheers Mark ----- Original Message ----- From: "Staffa, Nick (NIH/NIEHS) [C]" To: "Bioperl-l" Sent: Tuesday, May 25, 2010 10:51 AM Subject: [Bioperl-l] New Restriction Analysis >I have tried both these methods for getting new enzyme info into the system: > > use Bio::Restriction::IO; > my $re_io = Bio::Restriction::IO->new(-file => $file, > -format=>'withrefm'); > my $rebase_collection = $re_io->read; > A REBASE file in the correct format can be found at > ftp://ftp.neb.com/pub/rebase - it will have a name like "withrefm.308". If > need be you can also create new enzymes, like this: > my $re = new Bio::Restriction::Enzyme(-enzyme => 'BioRI', > -seq => 'GG^AATTCC'); > But the BioPerl sends an error without informing me which of my statements > caused it: > > Using first the withreftm.005 file from rebase and then these statements (not > both at the same time): > my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'SgrDI', > -seq => 'CG^TCGACG'); > > > Can't use an undefined value as an ARRAY reference at > /usr/lib/perl5/site_perl/5.8.8/Bio/Restriction/Analysis.pm line 529. > > This works: > my $pattern = $enzyme->site; > print "pattern = $pattern\n"; > which would lead me to believe there is nothing wrong with my enzyme. > Could there be a problem if there were no cuts? > That must be it, because putting info for EcoRI in instead of SgrDI, the > program works: > > [Not the whole program, but only the bioPerl stuff. > my $enzyme = new Bio::Restriction::Enzyme(-enzyme => 'EcoRI', > -seq => 'G^AATTC'); > use Bio::Restriction::Analysis; > my $pattern = $enzyme->site; > print "pattern = $pattern\n"; > my $db = Bio::DB::Fasta->new("/uoldhome/estaffa/westmoreland/$filename", > -makeid => \&make_my_id); > my $obj = $db->get_Seq_by_id("$sequenceID"); #Sequence Object > my $analysis = Bio::Restriction::Analysis->new(-seq => $obj); > my @strings = $analysis->fragments($enzyme); > > What to do? 
> > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Enterprise-Wide Information Technology Support Contract > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue May 25 16:38:12 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 11:38:12 -0500 Subject: [Bioperl-l] How to find secondary structures In-Reply-To: <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> References: <766825.32163.qm@web37308.mail.mud.yahoo.com> <1ACA7DD7CFCE41CDBB9E7304F14E5BA0@NewLife> Message-ID: <2B6207D9-7221-4949-A7EE-EE6ED54EFF7B@illinois.edu> Yes, that would look for Rfam-based conserved structures. Should work for the latest infernal release, but let me know if you run into problems. Should also look at ERPIN and RNAMotif (both have similar BioPerl wrappers). chris On May 25, 2010, at 11:20 AM, Mark A. Jensen wrote: > Sounds like a job for infernal and it's Bioperl wrapper (in Bio::Tools::Run); right Chris? > MAJ > ----- Original Message ----- From: "Dubi Eldor" > To: > Sent: Tuesday, May 25, 2010 8:30 AM > Subject: [Bioperl-l] How to find secondary structures > > >> Hi, >> >> I am a new user of BioPerl. >> I would like to find secondary sturctures in sequences of ~10K nt long. >> Are there any functions that can help me? 
>> >> Thanks, >> Dubi >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue May 25 16:43:41 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 12:43:41 -0400 Subject: [Bioperl-l] Restriction Enzymes In-Reply-To: References: Message-ID: <8EE661A4491C4A0FAD9875CF790F8164@NewLife> Thanks for the headsup on that-- we can fix. The refm file should be downloaded relatively transparently by the class directly... MAJ ----- Original Message ----- From: "Staffa, Nick (NIH/NIEHS) [C]" To: "Dave Messina" ; "Chris Fields" ; "Mark A. Jensen" Cc: "Bioperl-l" Sent: Tuesday, May 25, 2010 9:54 AM Subject: Re: [Bioperl-l] Restriction Enzymes > The tutorial, I discovered, has an error. > a very bad experience for a trusting newby. > whereas the tutorial has these bold examples in the first box under > Identifying restriction enzyme sites (Bio::Restriction) > > use Bio::Restriction::EnzymeCollection; > my $all_collection = Bio::Restriction::EnzymeCollection; > > This is the form of the statement that seems to work: > my $all_collection = Bio::Restriction::EnzymeCollection->new(); > > All the other stuff necessary for my purpose of getting fragment lengths is > there and seems to work > if the $enzyme database has the enzyme under the name you enter. > Updating the database with the file from NEB seems to be up to the user or his > sysadmin. > > > On 5/24/10 11:55 AM, "Dave Messina" wrote: > > Hi Nick, > > Now it's Bio::Restriction::Enzyme (and friends). 
Besides the perldoc for that > module, see also: > > http://www.bioperl.org/wiki/BioPerl_Tutorial#Identifying_restriction_enzyme_sites_.28Bio::Restriction.29 > > >> How hard would it be to keep things backward compatible. >> Have I missed something here? > > I don't know the history of the change, but the Bio::Tools::RestrictionEnzyme > was deprecated in BioPerl 1.5.2 (2006?). According to the docs the new ones > are intended to be at least partially backwards compatible. > > > Dave > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue May 25 17:14:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 25 May 2010 13:14:24 -0400 Subject: [Bioperl-l] CommandExts and arrays In-Reply-To: References: Message-ID: <409221E1D1E947108DEDBB5F34E1EBB7@NewLife> Don't think you want 'no strict'; the error's saying something about syntax to you. In the snippet, I see a missing opening single quote for output_file.bam. The asterisk means "expect an array ref", so that's ok. ----- Original Message ----- From: "Ben Bimber" To: "bioperl-l" Sent: Friday, May 21, 2010 9:58 AM Subject: [Bioperl-l] CommandExts and arrays >I am getting an error when trying to pass an array as a param with > command exts. I hope there is something obvious i'm missing, but I > cant seem to figure this out. > > I am trying to run the merge two BAM files using > Bio::Tools::Run::Samtools using something like this: > > my $new_bam = Bio::Tools::Run::Samtools->new( > -command => 'merge', > -program_dir => '/usr/bin/samtools/', > )->run( > -obm => output_file.bam', > -ibm => ['file1.bam', 'file2.bam'], > ); > > When i use an array for the -ibm param, I get an error saying 'cannot > use string 'file1' as an arrayref while strict refs in place'. The > error comes from this code in CommandExts.pm, around line 989. 
adding > 'no strict' right before the final line stops the error: > > # expand arrayrefs > my $l = $#files; > for (0..$l) { > if (ref($files[$_]) eq 'ARRAY') { > splice(@files, $_, 1, @{$files[$_]}); > #error thrown from this line > splice(@switches, $_, 1, ($switches[$_]) x @{$files[$_]}); > } > > > Thanks for the help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From thomas.sharpton at gmail.com Tue May 25 18:33:06 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 25 May 2010 11:33:06 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274767107.2271.11.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> Message-ID: <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> Hi Kai, I've just pushed the code to github, which you can find here: http://github.com/bioperl/bioperl-hmmer3 Please use this updated code before making any significant changes - I think I may have already fixed the bug you brought up earlier (but maybe not?). Do let me know if you have any problems getting ahold of this data or if you find any bugs in the code I'd deposited. Still getting my head wrapped around github. > No worries, that's fine. I've got a checkout of the standalone > repository that I can play with now. Is there any particular reason > you > decided to create a new parser instead of integrating the code into > the > existing hmmer.pm module? I haven't looked at how the hmmer2 hmmsearch > output looks compared to the hmmer3 version and if there's any > conflicts. Trying to integrate hmmer3 into the old hmmer searchIO module was the original idea. 
But after talking to some of the BioPerl gurus and considering the inherent differences between hmmer3 and hmmer2 (at least during beta, though there are still some major output report differences in the live release), we decided a separate module would be ideal. I don't want to speak out of turn, but it sounds like this might be one of the ways that the bioperl project is expanded in the future without overbloating bioperl-live. In theory, we can extend Bio::Run into this module as well in the future, such that bioperl-hmmer3 has a SearchIO path in addition to a Run path. I don't know what the more experienced developers currently think about this idea. This is an obvious statement, but I feel it's important to be clear on these matters - you should feel free to make any and all contributions to the development of this module as you see fit. BioPerl has been wonderful to me and I started this module to give a little back, but this remains community-generated software. FYI - I have a fix that I'm working on to handle the secondary structure track in the alignment report, so if you're particularly interested in that data, give me a bit and I'll have it up and running. All the best, Tom From David.Messina at sbc.su.se Tue May 25 18:52:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 25 May 2010 20:52:29 +0200 Subject: [Bioperl-l] [Fwd: Re: [perl #75252] POD rendering question/problem (was [Fwd: What is CPAN doing?])] In-Reply-To: <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> References: <4BFAAF45.4090400@cornell.edu> <4B11F518-94BC-4763-9FDA-05150C3E328A@illinois.edu> Message-ID: <704A3AD7-BF8E-4C52-A3C5-D402B59BFD66@sbc.su.se> On May 25, 2010, at 4:30 PM, Chris Fields wrote: > I have added a 'no_index' to that specific directory in Build.PL, suppose we can change that back if there is no purpose to it (though it might come in handy with spots we don't need to be indexed). Good idea; it's bound to come up at some point.
On May 25, 2010, at 3:04 PM, Brian Osborne wrote: > So I removed that local Bio/ directory. Great, thanks Brian! Dave From hlapp at gmx.net Tue May 25 21:10:42 2010 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 May 2010 15:10:42 -0600 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> Message-ID: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> I'm a little concerned that this discussion is disconnected from the list and so misses a lot of possible input. Are we moving our development discussion to IRC or github commit comments? Regarding $feature->seq(), the API documentation expressly states that the return type is Bio::PrimarySeqI, as it does for $feature- >entire_seq(). The original rationale for that was to avoid circular references. Bio::SeqI objects contain references to attached features, which in turn contain a reference to the seq object they are attached to. A Bio::SeqI object holds the basic sequence properties (everything except annotation and feature objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a reference to, not the containing Bio::SeqI object. It's possible that S::U::weaken() can solve the circular reference problem, but this fact should be tested. I.e., attach a feature with a SeqI-reference to a SeqI, dispose the SeqI, and then test that the feature has lost the reference to the SeqI too. This still leaves the issue though that then you have a SeqFeatureI object with a dangling reference to a sequence object. If you have those SeqFeatureI objects stored in a feature store, this may wreak havoc. I'd like to see convincing arguments that it doesn't. 
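The test proposed above (attach a feature that refers back to its sequence, dispose of the sequence, then check what the feature still holds) can be sketched with core Perl alone. This is an illustration under stated assumptions, not BioPerl code: plain hashes stand in for the Bio::SeqI and Bio::SeqFeatureI objects, and the key names (features, seq) are hypothetical. Only Scalar::Util::weaken is a real, core function.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Plain hashes stand in for a sequence and a feature; these are
# hypothetical stand-ins, not Bio::SeqI / Bio::SeqFeatureI objects.
my $feature = { name => 'gene1' };

{
    my $seq = { id => 'seq1', features => [] };

    # The sequence holds a strong reference to the feature...
    push @{ $seq->{features} }, $feature;

    # ...and the feature points back at the sequence.
    $feature->{seq} = $seq;

    # Weaken the back-reference so it does not keep $seq alive.
    weaken( $feature->{seq} );

    print "inside scope: ",
        ( defined $feature->{seq} ? "attached" : "detached" ), "\n";
}   # $seq goes out of scope here and is freed

print "after scope: ",
    ( defined $feature->{seq} ? "attached" : "detached" ), "\n";
```

In this sketch the weakened back-reference becomes undef once the sequence is freed, so the cycle does not leak. It also shows the concern raised in the message: the feature survives without its sequence, and whether that is acceptable for features kept in a feature store is the open question, which this toy example does not settle.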
Bottom line - just forking on git and committing a change isn't a substitute for bringing up an issue and possible solutions on the list, and the vetting of pull requests can fall upon only one or two core developers. Two eyeballs often spot a lot less than a hundred. -hilmar On May 25, 2010, at 2:02 PM, GitHub wrote: > Ah, but my link's old, forget it. This one is better: http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html > > From: cjfields > View this commit online: http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From kai.blin at biotech.uni-tuebingen.de Tue May 25 21:50:29 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Tue, 25 May 2010 23:50:29 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> Message-ID: <1274824229.2271.60.camel@gonzo.home.kblin.org> On Tue, 2010-05-25 at 11:33 -0700, Thomas Sharpton wrote: Hi Thomas, > http://github.com/bioperl/bioperl-hmmer3 > > Please use this updated code before making any significant changes - I > think I may have already fixed the bug you brought up earlier (but > maybe not?). Do let me know if you have any problems getting ahold of > this data or if you find any bugs in the code I'd deposited. Still > getting my head wrapped around github. I've seen the repo, and forked from it already to push my changes. Some of the folks from IRC gave me write access and Chris Fields actually pushed my changes. 
Most notable about the changes is probably a bit hidden by the noise, but I've changed the Hit->raw_score to contain the overall score, not the "best domain" score. > Trying to integrate hmmer3 into the old hmmer searchIO module was the > original idea. But after talking to some of the BioPerl gurus and > considering the inherent differences between hmmer3 and hmmer2 (at > least during beta, though there are still some major output report > differences in the live release), we decided as separate module would > be ideal. Some of the folks on IRC suggested that we might want to integrate the hmmer.pm parser as well, modularizing this a bit and loading the correct parser depending on the requested format. > This is an obvious statement, but I feel it's important to be clear on > these matters - you should feel free to make any and all contributions > to the development of this module as you see fit. BioPerl has been > wonderful to me and I started this module to give a little back, but > this remains community generated software. I'm planning on adding even more tests, but the basic features for hmmscan parsing seem to be there. I'm currently running an extensive test run on real genome data, hopefully I can see the results of that in a couple of days. Cheers, and thanks for the help, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From cjfields at illinois.edu Tue May 25 21:55:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 May 2010 16:55:53 -0500 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: I agree, but we spotted this from IRC, then added the comments on that merge. Dave also spotted my original code comments (which appeared in the fork queue, and which echo the very same concerns you have) after the commit as well, and managed to revert it. So, with forks where it appears further discussion is warranted (like this), we should bring it to the main list (and IRC, if anyone happens to be there) for discussion. Sounds good to me. For those on list, here are Adam's and my comments on this (linked here: http://github.com/adsj/bioperl-live/commit/24ec961b217084e248f4fdbd174aadace1a27ac4#comments): adsj: "Hi Chris, thanks for the comment. The reason is this: I have a class, MyApp::Seq, which ISA Bio::Seq::RichSeq and adds some extra methods I use in the application. When I call ->seq() on a feature from one of my MyApp::Seq objects, I want to get a MyApp::Seq object back (because of the extra methods). Am I making sense? I have been running with this patch since at least 1.5.2, so it has been a while since I dug into it. Maybe there is a cleaner solution.
I am not sure what your comment about changing the API means - I think it is quite reasonable/natural that MyApp::Seq->get_Features"->seq" returns MyApp::Seq objects?" My response: "Calling seq() on a feature should return a truncation of whatever your Bio::SeqFeatureI does (it normally calls trunc(start, end) on it's attached sequence). For Bio::Seq it's normally returning a simple Bio::PrimarySeq, not a Bio::Seq, b/c that is what is attached to the Feature. This is why we don't need GC. There are no circular refs: Bio::Seq has-a PrimarySeq and has-a Features (via FeatureHolderI), each Feature has the same PrimarySeq as the parent Bio::Seq. It's hard to know if there is a workaround w/o knowing what you are asking for (e.g. what MyApp::Seq does), but you can certainly override the default methods to DTRT for your specific case. For instance, redefine add_SeqFeature() for your class to attach self as you have above for Bio::Seq. In this case, we should patch SeqFeature::Generic to use weaken() as you show above just in case this is needed by others, but maybe in the context of (pseudocode) 'weaken if $seq to be attached is-a Bio::SeqI', and not hammered down to check the very specific 'Bio::PrimarySeq'. Anyway, this is what I mean by changing the default API, which is what the above Bio::Seq change does. This would change the context of what is currently being returned (self, instead of a simpler contained Bio::PrimarySeqI). Also, anything gained by abstracting the raw seq handling of Feature data by linking to PrimarySeq is lost when you link to the parent, thus always requiring GC and weaken() (which is notoriously flaky dep. on context)." chris On May 25, 2010, at 4:10 PM, Hilmar Lapp wrote: > I'm a little concerned that this discussion is disconnected from the list and so misses a lot of possible input. Are we moving our development discussion to IRC or github commit comments? 
> > Regarding $feature->seq(), the API documentation expressly states that the return type is Bio::PrimarySeqI, as it does for $feature->entire_seq(). > > The original rationale for that was to avoid circular references. Bio::SeqI objects contain references to attached features, which in turn contain a reference to the seq object they are attached to. A Bio::SeqI object holds the basic sequence properties (everything except annotation and feature objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a reference to, not the containing Bio::SeqI object. > > It's possible that S::U::weaken() can solve the circular reference problem, but this fact should be tested. I.e., attach a feature with a SeqI-reference to a SeqI, dispose the SeqI, and then test that the feature has lost the reference to the SeqI too. > > This still leaves the issue though that then you have a SeqFeatureI object with a dangling reference to a sequence object. If you have those SeqFeatureI objects stored in a feature store, this may wreak havoc. I'd like to see convincing arguments that it doesn't. > > Bottom line - just forking on git and committing a change isn't a substitute for bringing up an issue and possible solutions on the list, and the vetting of pull requests can fall upon only one or two core developers. Two eyeballs often spot a lot less than a hundred. > > -hilmar > > On May 25, 2010, at 2:02 PM, GitHub wrote: > >> Ah, but my link's old, forget it. 
This one is better: http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html >> >> From: cjfields >> View this commit online: http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From thomas.sharpton at gmail.com Tue May 25 22:29:38 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 25 May 2010 15:29:38 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: <1274824229.2271.60.camel@gonzo.home.kblin.org> References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: Thanks for the contributions, Kai. > I've seen the repo, and forked from it already to push my changes. > Some > of the folks from IRC gave me write access and Chris Fields actually > pushed my changes. Just saw this. Thanks for doing that, Chris. > Most notable about the changes is probably a bit hidden by the noise, > but I've changed the Hit->raw_score to contain the overall score, not > the "best domain" score. So this brings up an interesting point. At some point, we'll have to build out a few additional SearchIO methods to incorporate some of the additional information encoded in the HMMER v3 reports. Sean talks a bit in the user manual about the importance of looking at both the full sequence and the best domain (see page 18 in the manual linked to on this page http://hmmer.janelia.org/#documentation). 
For example, he mentions that one should consider the e-value of both the full sequence and best domain to ascertain if the query is homologous to a profile being considered via hmmsearch. He also mentions that looking at the full sequence report values without consideration of the best domain report values can be misleading. I'm not saying that your approach regarding Hit->raw_score is wrong - proper interpretation of the results is up to the end user and there are benefits to looking at the full sequence (again, communicated on page 18) - but we might consider how to best encode the SearchIO methods to mitigate end user confusion and mistakes. >> Trying to integrate hmmer3 into the old hmmer searchIO module was the >> original idea. But after talking to some of the BioPerl gurus and >> considering the inherent differences between hmmer3 and hmmer2 (at >> least during beta, though there are still some major output report >> differences in the live release), we decided as separate module would >> be ideal. > > Some of the folks on IRC suggested that we might want to integrate the > hmmer.pm parser as well, modularizing this a bit and loading the > correct > parser depending on the requested format. This might make sense, given that HMMER v3 is now live and seems to be adopted by researchers at an increasing rate. Since I used hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult to do, either. I think a thorough conversation on this point is warranted as others I've talked to have preferred the modules to be separate. I'd be interested to hear what other have to say on this point. >> This is an obvious statement, but I feel it's important to be clear >> on >> these matters - you should feel free to make any and all >> contributions >> to the development of this module as you see fit. BioPerl has been >> wonderful to me and I started this module to give a little back, but >> this remains community generated software. 
> > I'm planning on adding even more tests, but the basic features for > hmmscan parsing seem to be there. I'm currently running an extensive > test run on real genome data, hopefully I can see the results of > that in > a couple of days. Awesome! > Cheers, and thanks for the help, Likewise. T From kannabiran.nandakumar at gmail.com Tue May 25 22:30:18 2010 From: kannabiran.nandakumar at gmail.com (Kanna) Date: Tue, 25 May 2010 15:30:18 -0700 (PDT) Subject: [Bioperl-l] new to this group Message-ID: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Hi guys, I am new to this group. I work in bioinformatics and would like to contribute to the BioPerl project. I am interested in the OBO file parsing module to start with. I visited the project priority list and the page seems to have been modified around 6 months ago. If it is already completed could anyone suggest modules I can contribute to? Thanks, Kanna From David.Messina at sbc.su.se Tue May 25 22:41:27 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 00:41:27 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: On May 25, 2010, at 11:55 PM, Chris Fields wrote: > Sounds good to me. Me too, and just to clarify for everyone following along, I erroneously committed the code in question to bioperl-live master (head), reverted that commit, and moved it to a branch (http://github.com/bioperl/bioperl-live/commits/topic/adsj-seqobj-return). Dave From maj at fortinbras.us Wed May 26 01:37:38 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 25 May 2010 21:37:38 -0400 Subject: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: <525D25AC2CDF42E99C1F4072B02D0C1B@NewLife> I +1 Hilmar, but note that already git is doing what it is designed to do: devolve development. My $0.02 is: that is how BioPerl will keep from becoming a dinosaur. I believe that we as a community, judging from the track of the last year or so, are committed to this evolution by devolution, and the move to git is part of that overall plan. The increase in IRC chatter, led by deafferet and rbuels, prefigured this and it was generally considered a Good Thing. So, I would propose that people (devs and users) make their views known (on list and elsewhere) about how best to communicate and have dev-oriented conversations: it may be that a listserv alone is not nimble enough. MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "BioPerl List" Sent: Tuesday, May 25, 2010 5:10 PM Subject: Re: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) > I'm a little concerned that this discussion is disconnected from the list and > so misses a lot of possible input. Are we moving our development discussion > to IRC or github commit comments? > > Regarding $feature->seq(), the API documentation expressly states that the > return type is Bio::PrimarySeqI, as it does for $feature- > >entire_seq(). > > The original rationale for that was to avoid circular references. Bio::SeqI > objects contain references to attached features, which in turn contain a > reference to the seq object they are attached to. 
A Bio::SeqI object holds > the basic sequence properties (everything except annotation and feature > objects) in a Bio::PrimarySeq delegate, which is what a feature keeps a > reference to, not the containing Bio::SeqI object. > > It's possible that S::U::weaken() can solve the circular reference problem, > but this fact should be tested. I.e., attach a feature with a SeqI-reference > to a SeqI, dispose the SeqI, and then test that the feature has lost the > reference to the SeqI too. > > This still leaves the issue though that then you have a SeqFeatureI object > with a dangling reference to a sequence object. If you have those SeqFeatureI > objects stored in a feature store, this may wreak havoc. I'd like to see > convincing arguments that it doesn't. > > Bottom line - just forking on git and committing a change isn't a substitute > for bringing up an issue and possible solutions on the list, and the vetting > of pull requests can fall upon only one or two core developers. Two eyeballs > often spot a lot less than a hundred. > > -hilmar > > On May 25, 2010, at 2:02 PM, GitHub wrote: > >> Ah, but my link's old, forget it. 
This one is better: >> http://book.git-scm.com/4_undoing_in_git_-_reset,_checkout_and_revert.html >> >> From: cjfields >> View this commit online: >> http://github.com/bioperl/bioperl-live/commit/fcd90e0f2fa94b61ff8351157129678417c32991 > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== From asjo at koldfront.dk Wed May 26 05:41:52 2010 From: asjo at koldfront.dk (Adam Sjøgren) Date: Wed, 26 May 2010 07:41:52 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> Message-ID: <87zkznb4nz.fsf@topper.koldfront.dk> On Tue, 25 May 2010 15:10:42 -0600, Hilmar wrote: > Bottom line - just forking on git and committing a change isn't a > substitute for bringing up an issue and possible solutions on the > list, and the vetting of pull requests can fall upon only one or two > core developers. Two eyeballs often spot a lot less than a hundred. Just to clarify: I specifically _didn't_ make a Pull request yet. I simply created the fork to store the patch in a visible way - my intention was then to clean the patch up and make it ready for comments/discussion (I just haven't had time to do so yet). I am new to github, but as I understood the interface there, anyone is free (encouraged?) to "fork" their own clone to work in, as a kind of "public" personal workspace, and when you feel that your clone is ready to be merged, then - only then - you do a "Pull request".
If that isn't the way github is supposed to be used, or that isn't the way BioPerl wants to use it, let me know and I'll adjust. I appreciate the comments so far, and will get back to this as soon as I can. Thanks, Adam -- "Sunday morning when the rain begins to fall Adam Sjøgren I believe I have seen the end of it all" asjo at koldfront.dk From David.Messina at sbc.su.se Wed May 26 09:24:11 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 11:24:11 +0200 Subject: [Bioperl-l] Bio::Species irritated with "unclassified sequences" In-Reply-To: <4BF59B2F.9000300@bms.com> References: <4BF59B2F.9000300@bms.com> Message-ID: <50665C57-007D-49CC-86A7-4595D176EA73@sbc.su.se> Hi Charles, Thanks for your report. I believe your interpretation of Bio::Species::classification is correct. It looks like this is going to require a little more investigation. Could you please submit this as a bug report along with a little test case? http://www.bioperl.org/wiki/Bugs Dave On May 20, 2010, at 22:27, Charles Tilford wrote: > Bio::Species::classification() is irritated with me when I provide it with a @class_array that is composed of one node, particularly: > > $obj->classification("unclassified sequences") > > AFAICT this is a valid, single node taxa "tree": > > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=12908 > > Subroutine classification is expecting at least two class members, the problem with the above call crops up as: > > Use of uninitialized value $vals[1] in quotemeta at /stf/biocgi/tilfordc/patch_lib/Bio/Species.pm line 179 > ( $Id: Species.pm 16700 2010-01-15 19:50:11Z dave_messina $) > > > ... and the relevant code is: > > sub classification { > my ($self, @vals) = @_; > > if (@vals) { > if (ref($vals[0]) eq 'ARRAY') { > @vals = @{$vals[0]}; > } > > # make sure the lineage contains us as first or second element > # (lineage may have subspecies, species, genus ...)
> my $name = $self->node_name; > my ($genus, $species) = (quotemeta($vals[1]), quotemeta($vals[0])); > > > That is, it's expecting at least (species, genus) in the array. Am I misusing classification(), or Bio::Species in general? I know it's named "Species", but I've been using it as a generic tree object for arbitrary taxonomy nodes, not just species and subspecies. This block a little lower down: > > unless ($self->rank) { > # and that we are rank species > $self->rank('species'); > } > > > ... implies that the module can be used for taxa ranks other than species. However, doing so would not prevent the module being aggravated over a null $vals[1]. > > The use case here is building Bio::Seq::RichSeq objects pulled from a (very large) sequence database, and then dumped / displayed with SeqIO. Most are well behaved, but there's a non-trivial number of 'artificial' constructs that don't root to an organism. > > -CAT From cjfields at illinois.edu Wed May 26 11:53:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 06:53:50 -0500 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <87zkznb4nz.fsf@topper.koldfront.dk> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> Message-ID: On May 26, 2010, at 12:41 AM, Adam Sjøgren wrote: > On Tue, 25 May 2010 15:10:42 -0600, Hilmar wrote: > >> Bottom line - just forking on git and committing a change isn't a >> substitute for bringing up an issue and possible solutions on the >> list, and the vetting of pull requests can fall upon only one or two >> core developers. Two eyeballs often spot a lot less than a hundred.
> > Just to clarify: I specifically _didn't_ make a Pull request yet. > > I simply created the fork to store the patch in a visible way - my > intention was then to clean the patch up and make it ready for > comments/discussion (I just haven't had time to do so yet). > > I am new to github, but as I understood the interface there, anyone is > free (encouraged?) to "fork" their own clone to work in, as a kind of > "public" personal workspace, and when you feel that your clone is ready > to be merged, then - only then - you do a "Pull request". That's odd; I recall receiving a pull request from your fork at some point, but maybe I simply looked into the fork queue instead (which I thought was derived from pull requests, but maybe not). > If that isn't the way github is supposed to be used, or that isn't the > way BioPerl wants to use it, let me know and I'll adjust. > > I appreciate the comments so far, and will get back to this as soon as I > can. > > > Thanks, > > Adam No problem Adam, we're going through the learning curve on this end as well re: this specific github feature. I think how you are going about this is fine; we'll need to come up with some documentation as to how our collabs pull in forked code. chris From hlapp at drycafe.net Wed May 26 13:27:55 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 26 May 2010 07:27:55 -0600 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <87zkznb4nz.fsf@topper.koldfront.dk> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> Message-ID: <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> On May 25, 2010, at 11:41 PM, Adam Sjøgren wrote: > as I understood the interface there, anyone is free (encouraged?)
to > "fork" their own clone to work in, as a kind of "public" personal > workspace, and when you feel that your clone is ready to be merged, > then - only then - you do a "Pull request". That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) And yes, encouraged to fork indeed. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Wed May 26 14:03:14 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 16:03:14 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> Message-ID: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> On May 26, 2010, at 15:27, Hilmar Lapp wrote: > That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) That would be me. :) His commits were sitting in the fork queue, which I mistakenly understood to mean a pull request had been made. Turns out that's not the case (See http://github.com/blog/270-the-fork-queue). 
Dave From David.Messina at sbc.su.se Wed May 26 14:52:05 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 16:52:05 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: > So this brings up an interesting point. At some point, we'll have to build out a few additional SearchIO methods to incorporate some of the additional information encoded in the HMMER v3 reports. Would the new methods need to be added to SearchIO if they're specific to H3? (as opposed to just being in the H3 sub-class) > Sean talks a bit in the user manual about the importance of looking at both the full sequence and the best domain (see page 18 in the manual linked to on this page http://hmmer.janelia.org/#documentation). For example, he mentions that one should consider the e-value of both the full sequence and best domain to ascertain if the query is homologous to a profile being considered via hmmsearch. > > He also mentions that looking at the full sequence report values without consideration of the best domain report values can be misleading. I'm not saying that your approach regarding Hit->raw_score is wrong - proper interpretation of the results is up to the end user and there are benefits to looking at the full sequence (again, communicated on page 18) - but we might consider how to best encode the SearchIO methods to mitigate end user confusion and mistakes. I think this is a great idea. Of course it's always best for end-users to RTFM and understand the tools they're using, but it's clearly beneficial to make it easier to do the right thing. Having not considered it too much, I'm not sure how to accomplish this without breaking the SearchIO idiom. But presumably a way could be found. 
>> Some of the folks on IRC suggested that we might want to integrate the >> hmmer.pm parser as well, modularizing this a bit and loading the correct >> parser depending on the requested format. > This might make sense, given that HMMER v3 is now live and seems to be adopted by researchers at an increasing rate. Since I used hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult to do, either. I think a thorough conversation on this point is warranted as others I've talked to have preferred the modules to be separate. > > I'd be interested to hear what other have to say on this point. I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3. But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption. Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so) ? Dave From thomas.sharpton at gmail.com Wed May 26 15:25:24 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 26 May 2010 08:25:24 -0700 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: Thanks for the feedback, Dave. >> So this brings up an interesting point. At some point, we'll have >> to build out a few additional SearchIO methods to incorporate some >> of the additional information encoded in the HMMER v3 reports. > > Would the new methods need to be added to SearchIO if they're > specific to H3? 
(as opposed to just being in the H3 sub-class) Sorry for being unclear - the methods in question would be, at least in my mind, specific to the H3 sub-class. > >> Sean talks a bit in the user manual about the importance of looking >> at both the full sequence and the best domain (see page 18 in the >> manual linked to on this page http://hmmer.janelia.org/#documentation) >> . For example, he mentions that one should consider the e-value of >> both the full sequence and best domain to ascertain if the query is >> homologous to a profile being considered via hmmsearch. >> >> He also mentions that looking at the full sequence report values >> without consideration of the best domain report values can be >> misleading. I'm not saying that your approach regarding Hit- >> >raw_score is wrong - proper interpretation of the results is up to >> the end user and there are benefits to looking at the full sequence >> (again, communicated on page 18) - but we might consider how to >> best encode the SearchIO methods to mitigate end user confusion and >> mistakes. > > I think this is a great idea. > > Of course it's always best for end-users to RTFM and understand the > tools they're using, but it's clearly beneficial to make it easier > to do the right thing. > > Having not considered it too much, I'm not sure how to accomplish > this without breaking the SearchIO idiom. But presumably a way could > be found. > I'll see if I can't hit the drawing board and come up with a naming scheme for additional H3 methods that retrieve some of the extra data encoded in the new reports. It *probably* makes most sense, at least from the standpoint of the user's perspective, to adopt the full- length report values as the standard hit->significance and hit- >raw_score while having something like hit->best_significance and hit- >best_score as H3 methods that return the best-domain report values. Again, this could use some thought/discussion. 
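To make the naming idea above concrete, here is a rough sketch of what such accessors might look like. Everything in it is an assumption for illustration: the package name My::HMMER3::Hit, the hash keys, and the numbers are made up, and this is not the bioperl-hmmer3 API. It only shows the proposed split between the full-sequence values (under the usual hit names) and the best-domain values (under best_score / best_significance).

```perl
use strict;
use warnings;

# Hypothetical stand-in for an H3-specific hit class.
# NOT the actual Bio::SearchIO / bioperl-hmmer3 implementation.
package My::HMMER3::Hit;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Full-sequence report values under the standard hit method names.
sub raw_score    { $_[0]->{full_score} }
sub significance { $_[0]->{full_evalue} }

# Best-domain report values under the proposed H3-only method names.
sub best_score        { $_[0]->{best_dom_score} }
sub best_significance { $_[0]->{best_dom_evalue} }

package main;

# Placeholder numbers, not taken from any real HMMER report.
my $hit = My::HMMER3::Hit->new(
    full_score      => 100,
    full_evalue     => 1e-30,
    best_dom_score  => 40,
    best_dom_evalue => 1e-10,
);

printf "full sequence: score=%s evalue=%s\n",
    $hit->raw_score, $hit->significance;
printf "best domain:   score=%s evalue=%s\n",
    $hit->best_score, $hit->best_significance;
```

With this layout, existing SearchIO code that only calls raw_score and significance keeps working, while users who follow the manual's advice can compare both pairs of values.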
> >>> Some of the folks on IRC suggested that we might want to integrate >>> the >>> hmmer.pm parser as well, modularizing this a bit and loading the >>> correct >>> parser depending on the requested format. > >> This might make sense, given that HMMER v3 is now live and seems to >> be adopted by researchers at an increasing rate. Since I used >> hmmer.pm as a template for hmmer3.pm, it shouldn't be too difficult >> to do, either. I think a thorough conversation on this point is >> warranted as others I've talked to have preferred the modules to be >> separate. >> >> I'd be interested to hear what others have to say on this point. > > I did not follow the IRC discussion, so I confess I'm not totally > clear on what "integrate the hmmer.pm parser" means. I'm taking it > to mean combining the code that parses HMMER2 with the code that > parses HMMER3. > But then "modularizing this a bit and loading the correct parser > depending on the requested format" seems to contradict that > assumption. > > Perhaps you (or someone) could clarify a bit what the HMMER2 - > HMMER3 integration would look like (and the goal of doing so) ? > I was not a part of that conversation either and I'm also operating under a similar assumption about what "integrating the hmmer.pm parser" means. I too am confused about the statement regarding modularization; I assume Kai meant that next_result would leverage the HMMER version number (which it already grabs) to guide the appropriate parsing of the datafile. Not thinking about this too carefully, it might be as simple as: next_result { version = get_hmmer_version; if version == 2, parse V2 report file; if version == 3, parse V3 report file } To make the code a bit more manageable, the various version parsers could be split out into independent subroutines. Kai, is this along the lines of what you were thinking?
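As a runnable take on the pseudocode above, the dispatch could be sketched like this. It is a sketch under stated assumptions only: the sub names, the dispatch table, the stub parsers, and the header strings are all hypothetical, and a real next_result in Bio::SearchIO would read from a filehandle and build result objects rather than return strings.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: major HMMER version -> parser subroutine.
my %parser_for = (
    2 => \&parse_v2_report,
    3 => \&parse_v3_report,
);

# Pull the major version out of a report header line.
# Assumes the header contains text of the form "HMMER <major>.<minor>".
sub hmmer_major_version {
    my ($header) = @_;
    return $header =~ /HMMER\s+(\d+)\./ ? $1 : undef;
}

# Stub parsers; real ones would walk the report and emit SearchIO events.
sub parse_v2_report { return "parsed as HMMER2" }
sub parse_v3_report { return "parsed as HMMER3" }

# Pick the parser that matches the detected version.
sub next_result {
    my ($header) = @_;

    my $version = hmmer_major_version($header);
    die "Could not determine HMMER version\n" unless defined $version;

    my $parser = $parser_for{$version}
        or die "Unsupported HMMER version: $version\n";

    return $parser->($header);
}

# Illustrative header strings only.
print next_result("HMMER 3.0"),   "\n";
print next_result("HMMER 2.3.2"), "\n";
```

A lookup table keeps the version check in one place, so supporting a later report format would mean adding one entry and one parser subroutine.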
If this is correct (that is, merging the H2 and H3 parsers into a single hmmer.pm module), I see one primary benefit - the end user need not specify which HMMER module they want to use, they just use Bio::SearchIO::hmmer - and one secondary benefit - there's enough similarity between H2 and H3 reports that some code from the H2 parser redundantly appears in the H3 parser. There are certainly other benefits that I'm overlooking. The only real downside I see at the moment is that the hmmer.pm parser becomes a bit more complicated and bloated. But I suspect this can be remedied with careful partitioning of the code into appropriate subroutines and thorough documentation. I am a bit concerned about how the aforementioned H3-specific methods are incorporated, but that should be manageable. I wonder if anyone involved in the IRC discussion cares to weigh in? Regardless, I'd advocate getting the H3 version fully fleshed out to deal with the issues brought up in the first half of this message prior to an attempt to merge the two modules, as the merging process may be affected by the structure of the H3 parser. Best, Tom From cjfields at illinois.edu Wed May 26 16:13:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 11:13:59 -0500 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> Message-ID: On May 26, 2010, at 9:03 AM, Dave Messina wrote: > > On May 26, 2010, at 15:27, Hilmar Lapp wrote: > >> That would be my understanding too. Maybe some overzealous Bioperl gitizens at work who weren't going to wait for this? ;) > > > That would be me.
:) > > His commits were sitting in the fork queue, which I mistakenly understood to mean a pull request had been made. Turns out that's not the case (See http://github.com/blog/270-the-fork-queue). > > > Dave We can clarify that in the docs on the bioperl site, maybe in a github-specific section. chris From cjfields at illinois.edu Wed May 26 16:17:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 11:17:50 -0500 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: <3826604E-CD90-42A5-A0B2-004D9922B6AA@illinois.edu> On May 26, 2010, at 10:25 AM, Thomas Sharpton wrote: >> ... >> I did not follow the IRC discussion, so I confess I'm not totally clear on what "integrate the hmmer.pm parser" means. I'm taking it to mean combining the code that parses HMMER2 with the code that parses HMMER3.= > >> But then "modularizing this a bit and loading the correct parser depending on the requested format" seems to contradict that assumption. >> >> Perhaps you (or someone) could clarify a bit what the HMMER2 - HMMER3 integration would look like (and the goal of doing so) ? >> > > I was not a part of that conversation either and I'm also operating under a similar assumption about what "integrating the hmmer.pm parser" means. I too am confused about the statement regarding modularization; I assume Kai meant that next_result would leverage the HMMER version number (which it already grabs) to guide the appropriate parsing of the datafile. 
Not thinking about this too carefully, it might be a simple as: > > next_result{ > version = get_hmmer_version > if version == 2 > parse V2 report file > if version == 3 > parse V3 report file > } > > to make the code a bit more manageable, the various version parsers could be appropriated to independent subroutines. > > Kai, is this along the lines of what you were thinking? > > If this is correct (that is, merging the H2 and H3 parsers into a single hmmer.pm module), I see one primary benefit - the end user need not specify which HMMER module they want to implement, just use Bio::SearchIO::hmmer - and one secondary benefit - there's enough similarity between H2 and H3 reports that some from the H2 parser redundantly appears in the H3 parser. There are certainly other benefits that I'm overlooking. > > The only real downside I see at the moment is that the hmmer.pm parser becomes a bit more complicated and bloated. But I suspect this can be remedied with careful partitioning of the code into appropriate subroutines and thorough documentation. I am a bit concerned about how the aforementioned H3 specific methods are incorporated, but that should be manageable. > > I wonder if anyone involved in the IRC discussion cares to weigh in? > > Regardless, I'd advocate getting the H3 version fully flushed out to deal with the issues brought up in the first half of this message prior to an attempt to merge the two modules, as the merging process may be affected by the structure of the H3 parser. > > Best, > Tom That's essentially the idea, though it can be cleaner than that if we're expecting the entire stream of reports will be of the same version (set the proper next_result method at instantiation). SearchIO::infernal does something like this. Or it can call out to a handler, like SearchIO::blastxml. YMMV. chris From maj at fortinbras.us Wed May 26 17:43:37 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 26 May 2010 13:43:37 -0400 Subject: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) In-Reply-To: <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail><9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net><87zkznb4nz.fsf@topper.koldfront.dk><1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> Message-ID: <85C731A2326D45FB903FB1B0D5C5DEBF@NewLife> No zeal is overweening that is on the side of the Right. ----- Original Message ----- From: "Dave Messina" To: "Hilmar Lapp" Cc: "Adam Sjøgren" ; Sent: Wednesday, May 26, 2010 10:03 AM Subject: Re: [Bioperl-l] return type of $feature->seq() (comments on acommit [bioperl/bioperl-live fcd90e0]) > > On May 26, 2010, at 15:27, Hilmar Lapp wrote: > >> That would be my understanding too. Maybe some overzealous Bioperl gitizens >> at work who weren't going to wait for this? ;) > > > That would be me. :) > > His commits were sitting in the fork queue, which I mistakenly understood to > mean a pull request had been made. Turns out that's not the case (See > http://github.com/blog/270-the-fork-queue). > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From David.Messina at sbc.su.se Wed May 26 19:03:21 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 21:03:21 +0200 Subject: [Bioperl-l] new to this group In-Reply-To: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: Hi Kanna, Welcome! We're always happy to have more people jump in the deep end of the pool and help out.
From my reading of the project priority page, the OBO file parsing stuff has been done: > (This appears to be basically solved with the new OBOEngine, Sohel will need to comment if it is indeed finished). --jason stajich 20:10, 19 June 2006 (EDT) ( see http://www.bioperl.org/wiki/Project_priority_list#Ontology_file_parsing ) Can anyone (Hilmar?) who knows where we're at with this verify that our OBO parser is in good shape? I did notice this open bug, Kanna: bp_load_ontology ISBN title parsing error in OBO format http://bugzilla.open-bio.org/show_bug.cgi?id=2730 Is that something you might be interested in? > I visited the project priority list and the page seems to have been modified around 6 months ago. Agreed, it's probably time for someone to go through and update it. I'll post to the list separately about this. > If it is already completed could anyone suggest modules I can contribute to? But even though the project priority list is outdated, the open bugs list is not: http://bugzilla.open-bio.org/buglist.cgi?product=Bioperl&bug_status=NEW I would recommend you look for something relatively small to start with and submit a patch for that. And then as you go along we'll get a better idea of how to direct you as you get a better idea of what needs to be done. Dave From David.Messina at sbc.su.se Wed May 26 19:22:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 21:22:40 +0200 Subject: [Bioperl-l] project priority list Message-ID: <0DC6E827-8855-4463-8C58-79CC26BDF42D@sbc.su.se> So, as pointed out by Kanna in another thread, our Project Priority list is getting a little stale. http://www.bioperl.org/wiki/Project_priority_list There are a lot of things on there that have been crossed off for years now. I propose that we do some housecleaning, including deleting long-finished projects from the list. (They'll still live on in the wiki history of the page.)
Unless someone objects, I'll start poking at it a bit, but if other core devs with relevant knowledge of various projects could take a moment to peruse and edit too, that would be great. Dave From jay at jays.net Wed May 26 19:27:01 2010 From: jay at jays.net (Jay Hannah) Date: Wed, 26 May 2010 14:27:01 -0500 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: <1D273263-F9B4-4612-961B-E2B0F480FBC3@jays.net> On May 26, 2010, at 2:03 PM, Dave Messina wrote: > I would recommend you look for something relatively small to start with and submit a patch for that. Ideally "submit a patch" means create a github.com account, click "fork" on the bioperl-live repo, commit your changes into your fork, then send us a "pull request". :) Jay Hannah http://biodoc.ist.unomaha.edu/wiki/User:Jhannah From scott at scottcain.net Wed May 26 19:36:16 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 26 May 2010 15:36:16 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git Message-ID: Hi all, For GBrowse on the 1.X branch there is a network install script that people can download and execute and it will install all of the prerequisites and then install GBrowse. For this script, we also support a -d(eveloper) option, to get GBrowse and BioPerl from their repositories. Now that BioPerl has moved to git, I have a question: does anybody know if there is a way (preferably via url) to get bioperl from git in a non-interactive way? The read-only url on the bioperl-live git page, http://github.com/bioperl/bioperl-live.git, leads to a 404 error, and even if it didn't, I have a feeling that it would take a click or two to get to downloading source. Does anybody with more git-fu than me (which isn't a hard thing to have, since I don't have much) have any suggestions? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From David.Messina at sbc.su.se Wed May 26 19:41:10 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 21:41:10 +0200 Subject: [Bioperl-l] return type of $feature->seq() (comments on a commit [bioperl/bioperl-live fcd90e0]) In-Reply-To: References: <4bfc2cd094654_19c63fd441d592ec4b0@fe3.rs.github.com.tmail> <9B378E57-CB44-4C3B-88C0-F65B78B83F28@gmx.net> <87zkznb4nz.fsf@topper.koldfront.dk> <1883ACBA-085B-47F9-A1AF-7BBF2274EA40@drycafe.net> <8C56FFB1-1F29-4C71-9385-5298500CC435@sbc.su.se> Message-ID: <1F539D4E-D352-4F93-AF1E-E9324B970D34@sbc.su.se> > We can clarify that in the docs on the bioperl site, maybe in a github-specific section. I've stubbed it in on Using Git http://www.bioperl.org/wiki/Using_Git Please modify or expand as you see fit. Dave From scott at scottcain.net Wed May 26 19:57:21 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 26 May 2010 15:57:21 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: Also on the bioperl git page is a "download master" link, which pops up a cute javascript window offering me a choice of zip or tar files. If I copy the url of the tar file, I get a page that says: You are being redirected. where presumably, the digits after "bioperl-release" will change on a regular basis (right?), so that doesn't help much either (yes, I know I could parse the redirect message and get that url, but really, is there such a thing as a HEAD url?) Thanks, Scott On Wed, May 26, 2010 at 3:36 PM, Scott Cain wrote: > Hi all, > > For GBrowse on the 1.X branch there is a network install script that > people can download and execute and it will install all of the > prerequisites and then install GBrowse. For this script, we also > support a -d(eveloper) option, to get GBrowse and BioPerl from their > repositories.
Now that BioPerl has moved to git, I have a question: > does anybody know if there is a way (preferably via url) to get > bioperl from git in a non-interactive way? > > The read-only url on the bioperl-live git page, > http://github.com/bioperl/bioperl-live.git, leads to a 404 error, and > even if it didn't, I have a feeling that it would take a click or two > to get to downloading source. Does anybody with more git-fu than me > (which isn't a hard thing to have, since I don't have much) have any > suggestions? > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From kai.blin at biotech.uni-tuebingen.de Wed May 26 20:07:02 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Wed, 26 May 2010 22:07:02 +0200 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: <1274904422.3019.2.camel@gonzo.home.kblin.org> On Wed, 2010-05-26 at 15:36 -0400, Scott Cain wrote: Hi Scott, > For GBrowse on the 1.X branch there is a network install script that > people can download and execute and it will install all of the > prerequisites and then install GBrowse. For this script, we also > support a -d(eveloper) option, to get GBrowse and BioPerl from their > repositories. Now that BioPerl has moved to git, I have a question: > does anybody know if there is a way (preferably via url) to get > bioperl from git in a non-interactive way?
A quick look at the "BioPerl moved to git" announcement (http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/) turns up the following link: http://github.com/bioperl/bioperl-live/archives/master This page gives links to a zip and a tar version of BioPerl's master repository, which seems to be what you want. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From David.Messina at sbc.su.se Wed May 26 20:09:22 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 May 2010 22:09:22 +0200 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: Message-ID: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Hi Scott, I think the URLs you want are these snapshots of the current repository: http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots If you want instead to grab a static version of a repository, say a tagged revision, you can do it like this: http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 (where "for_gmod_0_003" is the tag). By the way, I am getting these URLs on GitHub by: 1. going to the GitHub page for the relevant repository e.g. http://github.com/bioperl/bioperl-live 2. navigating to the tag or branch of interest using the "Switch Branches" or "Switch Tags" pulldowns 3. clicking on the Download Source button 4.
right-clicking on the big TAR icon to copy the link underlying it Dave From rmb32 at cornell.edu Wed May 26 20:48:13 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 26 May 2010 13:48:13 -0700 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: <4BFD890D.4080205@cornell.edu> Sigh .... once we get our house in order to the point where it's easy to and quick to make releases with bugfixes, you'll be able to just get the most recent copies of the parts you need from CPAN. That'll be the day. Rob From hlapp at drycafe.net Wed May 26 22:05:36 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 26 May 2010 16:05:36 -0600 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: On May 26, 2010, at 1:03 PM, Dave Messina wrote: > Can anyone (Hilmar?) who knows where we're at with this verify that > our OBO parser is in good shape? The obo parser should be working. It's not wrapping the go-perl parser though. I should revisit the code I've written for that, I know ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed May 26 23:27:27 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 May 2010 18:27:27 -0500 Subject: [Bioperl-l] new to this group In-Reply-To: References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> Message-ID: <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> On May 26, 2010, at 5:05 PM, Hilmar Lapp wrote: > > On May 26, 2010, at 1:03 PM, Dave Messina wrote: > >> Can anyone (Hilmar?) who knows where we're at with this verify that our OBO parser is in good shape? > > > The obo parser should be working. 
It's not wrapping the go-perl parser though. I should revisit the code I've written for that, I know ... > > -hilmar So, that might be an area for someone to work on? chris From hlapp at drycafe.net Thu May 27 13:30:05 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 27 May 2010 07:30:05 -0600 Subject: [Bioperl-l] new to this group In-Reply-To: <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> References: <2514e8ac-318a-49bd-bd50-c610e94b0635@i31g2000vbt.googlegroups.com> <578E55E2-FFC8-45CA-88AA-591E389D2A44@illinois.edu> Message-ID: <292C7384-2EF0-45F7-85F9-BB173FE2B6E5@drycafe.net> On May 26, 2010, at 5:27 PM, Chris Fields wrote: >> The obo parser should be working. It's not wrapping the go-perl >> parser though. I should revisit the code I've written for that, I >> know ... >> > > So, that might be an area for someone to work on? Certainly if you want to start from scratch. The code I've written isn't committed (yes, shame on me). That said, I suppose I could now easily commit it to a branch and not cause any harm, right :-) It's not a very good target for a newcomer at all, though. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From kai.blin at biotech.uni-tuebingen.de Thu May 27 14:50:40 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 27 May 2010 16:50:40 +0200 Subject: [Bioperl-l] hmmer3/hmmscan parser In-Reply-To: References: <1274478997.1997.4.camel@gonzo.home.kblin.org> <2500BF18-ED75-44FE-B5BA-F0E0C2B6D1DB@illinois.edu> <1274767107.2271.11.camel@gonzo.home.kblin.org> <213C059E-3631-4E4D-8DB8-47B082ECC870@gmail.com> <1274824229.2271.60.camel@gonzo.home.kblin.org> Message-ID: <1274971840.9545.316.camel@mikropc7.biotech.uni-tuebingen.de> On Wed, 2010-05-26 at 08:25 -0700, Thomas Sharpton wrote: > > Having not considered it too much, I'm not sure how to accomplish > > this without breaking the SearchIO idiom. But presumably a way could > > be found. > > > > I'll see if I can't hit the drawing board and come up with a naming > scheme for additional H3 methods that retrieve some of the extra data > encoded in the new reports. It *probably* makes most sense, at least > from the standpoint of the user's perspective, to adopt the full- > length report values as the standard hit->significance and hit- > >raw_score while having something like hit->best_significance and hit- > >best_score as H3 methods that return the best-domain report values. > Again, this could use some thought/discussion. My reasoning for the change was that you can get at the best sequence score by (at worst) iterating over the top sequences. Without the change there was no way to get at the overall profile score, so that data was lost. Arguably this is just one way to try and make the data from the HMMer results accessible via the SearchIO interface. > I was not a part of that conversation either and I'm also operating > under a similar assumption about what "integrating the hmmer.pm > parser" means. 
I too am confused about the statement regarding > modularization; I assume Kai meant that next_result would leverage the > HMMER version number (which it already grabs) to guide the appropriate > parsing of the datafile. Not thinking about this too carefully, it > might be a simple as: > > next_result{ > version = get_hmmer_version > if version == 2 > parse V2 report file > if version == 3 > parse V3 report file > } > > to make the code a bit more manageable, the various version parsers > could be appropriated to independent subroutines. > > Kai, is this along the lines of what you were thinking? Yes, this is more or less what I meant. But I agree that we first want to get the hmmer3 parser sorted out and working nicely. More test cases for the parser would be nice; I just got sidetracked by another bug affecting my code. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Interfakultäres Institut für Mikrobiologie und Infektionsmedizin Abteilung Mikrobiologie/Biotechnologie Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From scott at scottcain.net Thu May 27 15:29:42 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 27 May 2010 11:29:42 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: Hi All, Thanks for pointing out the links. It's weird: using curl on those urls retrieves a "redirect" page, whereas LWP::Simple::mirror gets the tarball. Anyway, the script works again :-) Scott On Wed, May 26, 2010 at 4:09 PM, Dave Messina wrote: > Hi Scott, > > I think the URLs you want are these > > http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots > > snapshots of the current repository.
> > > If you want instead to grab a static version of a repository, say a tagged revision, you can do like this: > > http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 > > (where "for_gmod_0_003" is the tag). > > > By the way, I am getting these URLs on GitHub by: > > 1. going to the GitHub page for the relevant repository > > e.g. http://github.com/bioperl/bioperl-live > > 2. navigating to the tag or branch of interest using the "Switch Branches" or "Switch Tags" pulldowns > > 3. clicking on the Download Source button > > 4. right-clicking on the big TAR icon to copy the link underlying it > > > > Dave > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From bosborne11 at verizon.net Thu May 27 15:40:37 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 27 May 2010 11:40:37 -0400 Subject: [Bioperl-l] Re-point "Bioperl scripts"? In-Reply-To: <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Message-ID: Chris, Removed all erroneous references to Subversion except for these pages, which require detailed editing and/or a familiarity with Git: http://www.bioperl.org/wiki/Emacs_bioperl-mode http://www.bioperl.org/wiki/HOWTO:Wrappers http://www.bioperl.org/wiki/Making_a_BioPerl_release http://www.bioperl.org/w/index.php/HOWTO:BlastPlus One issue now is the references to pedigree, microarray, GUI, pipeline, and ext, which only exist in SVN. Also GUI, pipeline, and microarray are unsupported, and have been unsupported for many years.
Yet they are still listed in pages like: http://www.bioperl.org/wiki/Getting_BioPerl They shouldn't be listed alongside bioperl-live or -run, or they should not be listed at all. Should they be removed? or put into their own "unsupported" section? Brian O. On May 20, 2010, at 11:37 AM, Chris Fields wrote: > Yes, if you have time. I have started along that path already, but I'm sure there are lingering spots where links point to the wrong place, or subversion/svn is mentioned. > > chris > > On May 20, 2010, at 10:34 AM, Brian Osborne wrote: > >> Chris, >> >> Done, easy. Should I remove all references to SVN from the Wiki? >> >> Brian O. >> >> On May 18, 2010, at 2:04 PM, Chris Fields wrote: >> >>> Yes. >>> >>> chris >>> >>> On May 18, 2010, at 11:06 AM, Brian Osborne wrote: >>> >>>> bioperl-l, >>>> >>>> Just noticed that the links on http://www.bioperl.org/wiki/Bioperl_scripts point to http://code.open-bio.org/svnweb/. >>>> >>>> We want these to point to github, yes? I'll fix it if the answer is 'yes'. >>>> >>>> Brian O. >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From cjfields at illinois.edu Thu May 27 15:58:06 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 27 May 2010 10:58:06 -0500 Subject: [Bioperl-l] Re-point "Bioperl scripts"? 
In-Reply-To: References: <503F990B-FD09-4FE5-8D39-9CE68C97C5A2@verizon.net> <8822DF59-46E8-453C-9ABB-D9D845640ADB@illinois.edu> <04E1221B-CE9E-41C5-BAC8-5992260EBB6B@verizon.net> <47B54CAA-6418-4EBA-B58B-ED413EBDE708@illinois.edu> Message-ID: On May 27, 2010, at 10:40 AM, Brian Osborne wrote: > Chris, > > Removed all erroneous references to Subversion except for these pages, which require detailed editing and/or a familiarity with Git: > > http://www.bioperl.org/wiki/Emacs_bioperl-mode > > http://www.bioperl.org/wiki/HOWTO:Wrappers > > http://www.bioperl.org/wiki/Making_a_BioPerl_release > > http://www.bioperl.org/w/index.php/HOWTO:BlastPlus Okay, looks good so far. I know the emacs mode stuff will be handled by Mark (I'm assuming the others will follow suit). I'll have to go in and clean up the 'making a release' page myself to update it. > One issue now is the references to pedigree, microarray, GUI, pipeline, and ext, which only exist in SVN. By 'only existing in svn', do you mean they are only found there? I moved everything over for archiving: http://github.com/bioperl/bioperl-gui http://github.com/bioperl/bioperl-microarray http://github.com/bioperl/bioperl-pedigree http://github.com/bioperl/bioperl-pipeline > Also GUI, pipeline, and microarray are unsupported, and have been unsupported for many years. Yet they are still listed in pages like: > > http://www.bioperl.org/wiki/Getting_BioPerl > > They shouldn't be listed alongside bioperl-live or -run, or they should not be listed at all. > > Should they be removed? or put into their own "unsupported" section? I think to an 'unsupported' or 'unmaintained' section; could add the corba and pise ones as well (just noticed that the pise repo was missing from github, so just added it for archiving). > Brian O. Thanks brian! 
chris From sdavis2 at mail.nih.gov Thu May 27 16:04:04 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 27 May 2010 12:04:04 -0400 Subject: [Bioperl-l] Automatic download of bioperl from git In-Reply-To: References: <18356B85-CE41-4FD8-A2AB-72ECA022B7FD@sbc.su.se> Message-ID: On Thu, May 27, 2010 at 11:29 AM, Scott Cain wrote: > Hi All, > > Thanks for pointing out the links. It's weird: using curl on those > urls retrieves a "redirect" page, whereas LWP::Simple::mirror gets the > tarball. Anyway, the script works again :-) > > Hi, Scott. For curl, try: curl -L .... The -L follows redirects. Sean > > On Wed, May 26, 2010 at 4:09 PM, Dave Messina > wrote: > > Hi Scott, > > > > I think the URLs you want are these > > > > http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots > > > > snapshots of the current repository. > > > > > > If you want instead to grab a static version of a repository, say a > tagged revision, you can do like this: > > > > http://github.com/bioperl/bioperl-live/tarball/for_gmod_0_003 > > > > (where "for_gmod_0_003" is the tag). > > > > > > By the way, I am getting these URLs on GitHub by: > > > > 1. going to the GitHub page for the relevant repository > > > > e.g. http://github.com/bioperl/bioperl-live > > > > 2. navigating to the tag or branch of interest using the "Switch > Branches" or "Switch Tags" pulldowns > > > > 3. clicking on the Download Source button > > > > 4. right-clicking on the big TAR icon to copy the link underlying it > > > > > > > > Dave > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From remi.planel at free.fr Fri May 28 10:29:50 2010 From: remi.planel at free.fr (Remi) Date: Fri, 28 May 2010 12:29:50 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult Message-ID: <4BFF9B1E.10500@free.fr> Hi all, I would like to get a clone of a Bio::Search::Result::GenericResult object and I'm not sure of what I'm doing ... I've tried something like:

my $searchIn = Bio::SearchIO->new(
    -file   => 'result.bls',
    -format => 'blastxml',
);
my $result = $searchIn->next_result;
my $result_copy = $result->new($result);

It seems to work but I'm not sure I understand how. So I would like to know if I'll get in trouble using this code and if all the fields are copied one by one. Thank you, Rémi From David.Messina at sbc.su.se Fri May 28 11:32:40 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 28 May 2010 13:32:40 +0200 Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult In-Reply-To: <4BFF9B1E.10500@free.fr> References: <4BFF9B1E.10500@free.fr> Message-ID: Hi Rémi, As far as I know, cloning objects is not natively supported in BioPerl (or Perl itself, for that matter). So I don't think the code you showed will work. However, there are modules such as Clone::More and Clone::Fast that can do it. http://search.cpan.org/~wazzuteke/Clone-More-0.90.2/lib/Clone/More.pm http://search.cpan.org/~wazzuteke/Clone-Fast-0.93/lib/Clone/Fast.pm Out of curiosity, what are you trying to do with the cloned objects? Someone might be able to suggest another way to accomplish the same goal.
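For illustration only, here is what a deep copy looks like using dclone from Storable, which ships with Perl. Note that this swaps in a different module from the Clone::More and Clone::Fast modules named above. The My::Result object below is a made-up stand-in; whether dclone copes with a real BioPerl result object is an untested assumption, since Storable refuses code references by default and cannot copy filehandles.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical stand-in for a parsed result: a blessed hash with nested hits.
my $result = bless {
    query => 'seq1',
    hits  => [
        { name => 'hitA', score => 120 },
        { name => 'hitB', score => 45 },
    ],
}, 'My::Result';

# Deep copy: nested arrays and hashes are duplicated, not shared.
my $copy = dclone($result);

# Filter the copy only; the original keeps both hits.
@{ $copy->{hits} } = grep { $_->{score} >= 100 } @{ $copy->{hits} };

printf "original hits: %d, filtered copy hits: %d\n",
    scalar @{ $result->{hits} }, scalar @{ $copy->{hits} };
```

The point of the sketch is the difference from a shallow copy: because the nested hit list is duplicated, trimming the copy leaves the original untouched.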
Dave

From remi.planel at free.fr Fri May 28 12:17:01 2010
From: remi.planel at free.fr (Remi)
Date: Fri, 28 May 2010 14:17:01 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To:
References: <4BFF9B1E.10500@free.fr>
Message-ID: <4BFFB43D.50409@free.fr>

You're right, it's not working; some fields are missing.

Actually, I'm writing a script that filters a Result object based on some criteria, and I want the script to be somewhat interactive, like:

- Display Result object as HTML
- Ask for filter criteria
- Filter Result object
- Display filtered Result object as HTML
- ... etc.

And I would like to make a copy of the Result object before each filtering step in order to be able to redo it.

I'll have a look at the modules you've mentioned, thanks.

From cjfields at illinois.edu Fri May 28 13:25:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 28 May 2010 08:25:54 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4BFFB43D.50409@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr>
Message-ID: <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu>

Remi,

Using the constructor that way is not supported. But it's completely unnecessary.

Are you using Bio::SearchIO::Writer::HTMLResultWriter? It filters results/hits/HSPs as it writes the HTML, no need to clone. That in combination with GenericResult::rewind() should work.
You can use that module, or inherit and override whatever methods are necessary. Or just use it as a reference on how to do what you need.

Something like the following should work (of course completely untested :)

    use Bio::SearchIO;
    use Bio::SearchIO::Writer::HTMLResultWriter;

    # $in is a Bio::SearchIO input stream opened on the report
    my $result = $in->next_result;

    # filter on HSP
    write_html('result1.html', $result, { 'HSP' => \&hsp_filter });

    # rewind the result to go back to the beginning
    $result->rewind;

    # second report output: filter on hit and HSP
    write_html('result2.html', $result, { 'HIT' => \&hit_filter,
                                          'HSP' => \&hsp_filter });

    # rewind the result to go back to the beginning
    $result->rewind;

    # and so on....

    sub write_html {
        my ($file, $result, $filters) = @_;
        # note that $filters is a hash ref
        my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new
            (-filters => $filters);

        # the leading '>' opens $file for writing
        my $out = Bio::SearchIO->new(-writer => $writer, -file => ">$file");
        $out->write_result($result);
    }

    sub hsp_filter {
        my $hsp = shift;
        return 1 if $hsp->length('total') > 100;
    }

    sub hit_filter {
        my $hit = shift;
        return 1 if $hit->significance < 1e-5;
    }

chris

From cjfields at illinois.edu Fri May 28 14:34:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 28 May 2010 09:34:13 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4BFFD3D5.2000409@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr>
Message-ID:

Let us know how it goes, and if you run into any bugs.

chris

On May 28, 2010, at 9:31 AM, Remi wrote:

> Thank you very much !!!!
> I'm gonna try it right away

From remi.planel at free.fr Fri May 28 14:31:49 2010
From: remi.planel at free.fr (Remi)
Date: Fri, 28 May 2010 16:31:49 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu>
Message-ID: <4BFFD3D5.2000409@free.fr>

An HTML attachment was scrubbed...
URL:

From fij at elte.hu Sun May 30 09:32:58 2010
From: fij at elte.hu (Farkas, Illes)
Date: Sun, 30 May 2010 11:32:58 +0200
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
Message-ID:

Hi,

I've run across a relatively simple, but specific task. I would like to put interaction (, , ) data from many sources (databases) into a single list containing the following in each record: , , , . (I am aware that there will be some loss during the ID conversion.)

I have found so far the following possibilities:

(1) BioMart perl API. Seems to be much smarter (and more complex) than what I would need. Also, I would need to parse input and output just as much as with newly written subroutines/modules.

(2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and KEGG IDs, but I could not find them on the "From" list.

(3) Synergizer. I cannot run it in remote batch mode. From what I would need I could not find BioGrid, ENSP and KEGG identifiers.
(4) Writing it all with ID mapping files downloaded from each database and contributing it to BioPerl. How can I contribute? How do I find the best place within BioPerl to add a particular module? Whom do I need to ask for approval?

Thanks in advance for any comments.
Illes

-- 
http://hal.elte.hu/fij

From maj at fortinbras.us Sun May 30 13:42:50 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 30 May 2010 09:42:50 -0400
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To:
References:
Message-ID: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>

Illes-- no approval necessary (or, if you like, I approve). What you can do is describe what you want to do as an enhancement request at http://bugzilla.bioperl.org, and then attach your new code to that request. We can review it from there.

cheers MAJ

From cjfields at illinois.edu Sun May 30 15:00:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 30 May 2010 10:00:09 -0500
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
References: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
Message-ID:

Another couple of options:

1) for code changes, fork the code on GitHub, add your code there, then make a pull request
2) for adding code, create a repo on github with the code,

chris

From cjfields at illinois.edu Sun May 30 15:05:37 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 30 May 2010 10:05:37 -0500
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To:
References:
Message-ID: <84D300DB-C22D-494E-ABAF-EBC10FEE0E7C@illinois.edu>

On May 30, 2010, at 4:32 AM, Farkas, Illes wrote:

> Hi,
>
> I've run across a relatively simple, but specific task. I would like to put
> interaction (, , ) data from many sources
> (databases) into a single list containing the following in each record:
> , , ,
> . (I am aware that there will be some loss during the ID
> conversion.)
>
> I have found so far the following possibilities:
>
> (1) BioMart perl API. Seems to be much smarter (and more complex) than what
> I would need. Also, I would need to parse input and output just as much as
> with newly written subroutines/modules.

Or, I wonder whether you could create a set of BioPerl<->BioMart bridge modules.

> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and
> KEGG IDs, but I could not find them on the "From" list.

I added an id_mapper to Bio::DB::SwissProt that calls to this.
It hasn't been broadly tested yet, but you are welcome to add more to it. It might also be useful to have a DB wrapper around a locally-built ID mapping database, which would give you more flexibility than the web interface.

> (3) Synergizer. I cannot run it in remote batch mode. From what I would need
> I could not find BioGrid, ENSP and KEGG identifiers.
>
> (4) Writing it all with ID mapping files downloaded from each database and
> contributing it to BioPerl. How can I contribute? How do I find the best
> place within BioPerl to add a particular module? Whom do I need to ask for
> approval?
>
> Thanks in advance for any comments.
> Illes

A generalized ID mapping interface would be nice. You could also incorporate some of NCBI's eutils stuff along these lines, or their gi2acc mappings.

chris

From maj at fortinbras.us Sun May 30 23:59:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 30 May 2010 19:59:38 -0400
Subject: [Bioperl-l] ID mapping (or: contributing to BioPerl)
In-Reply-To:
References: <88DFA90AA8A448E1969C77F73D8B5ECB@NewLife>
Message-ID: <6553B9DFF86F472B8B2D0D8A72171056@NewLife>

Yes, that's definitely the Way to Do It post-git-
MAJ

----- Original Message -----
From: "Chris Fields"
To: "Mark A. Jensen"
Cc: "Farkas, Illes" ;
Sent: Sunday, May 30, 2010 11:00 AM
Subject: Re: [Bioperl-l] ID mapping (or: contributing to BioPerl)

Another couple of options:

1) for code changes, fork the code on GitHub, add your code there, then make a pull request
2) for adding code, create a repo on github with the code,

chris

From cjfields at illinois.edu Mon May 31 13:23:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 31 May 2010 08:23:13 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4C037F22.3090209@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr>
Message-ID: <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu>

That sounds like a bug. Does filtering at the hit level work around this?

    sub hit_filter {
        my $hit = shift;
        # filter hsps here; keep the hit only if some HSP passes
        my @passing_hsps = grep { hsp_filter($_) } $hit->hsps;
        return scalar @passing_hsps;
    }

    sub hsp_filter {
        # original filter
    }

chris

On May 31, 2010, at 4:19 AM, Remi wrote:

> Hi,
>
> Everything is working well but there is still one point that is giving me some trouble.
> When I filter the HSPs and all the HSPs of a given hit are removed, the description line of the hit is still present in the HTML file.
> Is there a way to get rid of this description line?
> Is the only solution to inherit from Bio::SearchIO::Writer::HTMLResultWriter and override the "to_string" method?
>
> Thanks,
>
> Rémi

From remi.planel at free.fr Mon May 31 13:47:40 2010
From: remi.planel at free.fr (Remi)
Date: Mon, 31 May 2010 15:47:40 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr> <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu>
Message-ID: <4C03BDFC.5050109@free.fr>

Yes, at the hit level everything works fine.
Actually, at the HSP level, the alignment part is not written to the HTML file, but the description before the alignment and the description of the hit at the beginning of the file are written.

I had a quick look at the code and I'm not sure this is a bug.

Chris Fields wrote:
> That sounds like a bug.

From cjfields at illinois.edu Mon May 31 13:54:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 31 May 2010 08:54:22 -0500
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To: <4C03BDFC.5050109@free.fr>
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr> <4C037F22.3090209@free.fr> <706ECC08-E633-464C-9E6C-73ACB8244C18@illinois.edu> <4C03BDFC.5050109@free.fr>
Message-ID: <454FE98D-4EE5-4DFB-A877-6DE7822C4DA4@illinois.edu>

My concern is to ensure we aren't filtering twice as much (one pass at the hit level, one pass at the HSP level). It should be one pass.

chris

From remi.planel at free.fr Mon May 31 09:19:30 2010
From: remi.planel at free.fr (Remi)
Date: Mon, 31 May 2010 11:19:30 +0200
Subject: [Bioperl-l] Cloning Bio::Search::Result::GenericResult
In-Reply-To:
References: <4BFF9B1E.10500@free.fr> <4BFFB43D.50409@free.fr> <441CAD42-5B02-4606-BF41-A3DC2233F285@illinois.edu> <4BFFD3D5.2000409@free.fr>
Message-ID: <4C037F22.3090209@free.fr>

An HTML attachment was scrubbed...
URL:
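
The hit-level workaround discussed in this thread can be tried out without a BLAST report. The sketch below is only an illustration: Mock::Hit and Mock::HSP are hypothetical stand-ins that provide just the methods the filters call (hsps, significance, length), and the thresholds are the ones from the earlier example. It keeps a hit only when its significance passes and at least one of its HSPs would survive the HSP filter:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for BioPerl hit and HSP objects; they only
# implement the methods used by the two filters below.
{
    package Mock::HSP;
    sub new    { my ($class, %args) = @_; return bless { %args }, $class }
    sub length { my ($self) = @_; return $self->{length} }
}
{
    package Mock::Hit;
    sub new          { my ($class, %args) = @_; return bless { %args }, $class }
    sub hsps         { my ($self) = @_; return @{ $self->{hsps} } }
    sub significance { my ($self) = @_; return $self->{significance} }
}

# Same HSP criterion as in the thread: total length above 100.
sub hsp_filter {
    my $hsp = shift;
    return $hsp->length('total') > 100 ? 1 : 0;
}

# Keep a hit only if it is significant AND has at least one passing HSP,
# so a hit whose HSPs are all rejected is dropped as a whole.
sub hit_filter {
    my $hit = shift;
    return 0 unless $hit->significance() < 1e-5;
    my $passing = grep { hsp_filter($_) } $hit->hsps;
    return $passing ? 1 : 0;
}

my @hits = (
    Mock::Hit->new(name => 'keep',       significance => 1e-20,
                   hsps => [ Mock::HSP->new(length => 150) ]),
    Mock::Hit->new(name => 'short_hsps', significance => 1e-20,
                   hsps => [ Mock::HSP->new(length => 40) ]),
    Mock::Hit->new(name => 'weak',       significance => 0.5,
                   hsps => [ Mock::HSP->new(length => 300) ]),
);

my @kept = map { $_->{name} } grep { hit_filter($_) } @hits;
print "kept: @kept\n";
```

With the real writer, the same subs would be passed as { 'HIT' => \&hit_filter, 'HSP' => \&hsp_filter }. The HSP filter then runs once inside hit_filter and presumably again when the HSPs are written, which is the double pass mentioned above.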