From n.haigh at sheffield.ac.uk Fri Dec 1 02:47:03 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 07:47:03 +0000
Subject: [Bioperl-l] Upgrading my BioPerl RC via ppm?
In-Reply-To: <519167.29410.qm@web50804.mail.yahoo.com>
References: <519167.29410.qm@web50804.mail.yahoo.com>
Message-ID: <456FDDF7.1080403@sheffield.ac.uk>

Caitlin wrote:
> Hi all.
>
> I'm currently using BioPerl 1.5.2 RC2 but I've seen multiple references
> to 1.5.2 RC5. Can anyone tell me how to upgrade to the latest version?
> The ppm GUI (ActivePerl Build 819) doesn't include any BioPerl packages
> among those deemed upgradable.
>
> Thanks,
>
> ~Katie
>
>
>

Hi Katie,

Currently there is not an RC5 PPM package available - we are hoping to have the official 1.5.2 release out pretty soon and there will definitely be a PPM package for that!

Are you experiencing any problems with your current version of bioperl? If not, there is no need to worry, once we've released an updated PPM package your PPM GUI should then be able to see it as an upgrade - hopefully! :o)

Sendu, I know you were working on automatically generating PPM packages - what is the current situation with regards to this?

Nath

From bix at sendu.me.uk Fri Dec 1 04:00:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:00:18 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <456F27E9.70205@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine> <456F27E9.70205@york.ac.uk>
Message-ID: <456FEF22.4090004@sendu.me.uk>

Samantha Thompson wrote:

You missed a step...

> use strict;
> use Bio::Perl;
> use Bio::Seq;
> use Bio::SeqIO;
>
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
>
> #seq bit
>
> #$seq_obj = Bio::Seq->new(-format => 'fasta');
>
> my $seqio_obj = Bio::SeqIO->new(-file =>
> "/biol/people/mres/st537/MalEfasta.txt", -format => 'fasta');
>
> my $seq_obj = $seqio_obj->next_seq;
>
>
>
> #blast bit
>
> my $remote_blast = Bio::Tools::Run::RemoteBlast->new (
> -prog => 'blastp', -db => 'nr', -expect => '1e-15' );
>
> my $blast_report = $remote_blast->submit_blast($seq_obj);

Go back to the Bptutorial:
http://www.bioperl.org/wiki/Bptutorial.pl#Running_BLAST_.28using_RemoteBlast.pm.29

And you'll see that submit_blast doesn't return a SearchIO object. It only queues the search; the report comes back from retrieve_blast, which you call for each request ID once the search has finished. For a complete working example see the synopsis for RemoteBlast:
http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/RemoteBlast.html

> #new part for SearchIO...
>
> while( my $result = $blast_report->next_result ) {
>     while( my $hit = $result->next_hit ) {
>         while( my $hsp = $hit->next_hsp ) {
>             if( $hsp->length('total') > 100 ) {
>                 if ( $hsp->percent_identity >= 75 ) {
>                     print "Hit= ", $hit->name,
>                         ",Length=", $hsp->length('total'),
>                         ",Percent_id=", $hsp->percent_identity, "\n";
>                 }
>             }
>         }
>     }
> }

From bix at sendu.me.uk Fri Dec 1 04:03:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:03:13 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <456FEFD1.4070704@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
>
> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
>
> I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts.
> After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
>
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>
> I am subsequently greeted by many of the following errors:
>
> Could not store Q7N3Q6:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The supplied lineage does not start near 'Photorhabdus luminescens
> subsp. laumondii'

In your uniprot_sprot.dat file there'll be some kind of entry with that Photorhabdus species. Can you post that entry (sans sequence if it has one) so I can take a look at it? Maybe post a few that cause problems, and a few that don't.

From bix at sendu.me.uk Fri Dec 1 04:19:09 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:19:09 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819
In-Reply-To: <000301c714b4$7846e790$15327e82@pyrimidine>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
Message-ID: <456FF38D.3070508@sendu.me.uk>

Chris Fields wrote:
>> Nathan S. Haigh wrote:
>>> More updates:
>>>
>>> After the failed install I updated Module::Build, and re-ran the
>>> install, I get:
>>>
>>> -- snip --
>>> Creating new 'Build' script for 'bioperl' version '1.005002005'
>>> Warning: while trying to determine prerequisites for
>>> S/SE/SENDU/bioperl-1.5.2_005-RCb.tar.gz wi th the help of
>>> Module::Build the following error occurred: 'Failed to re-load
>>> 'ModuleBuildBiope
>>> rl': Can't locate ModuleBuildBioperl.pm in @INC (@INC contains:
>>> _build\lib C:\Perl\site\lib C:\
>>> Perl\lib C:\Documents and Settings\test) at (eval 105) line 1.
>>> '
>>>
>>> Falling back to META.yml for prerequisites 'YAML' not installed,
>>> cannot parse 'C:\Perl\cpan\build\bioperl-1.5.2_005-RC\META.yml'
>>> -- snip --
>> I had that problem fleetingly and it drove me crazy because
>> later I couldn't reproduce it.
>> Is it reproducible on your end?
>
> During Module::Build installation I see this:
>
> ...
> t\metadata........ok
> 8/43 skipped: YAML_support feature is not enabled

You were pointing out the YAML issue? I think I'm less concerned with that (solution: install YAML) and much more concerned with why it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The module in question is in the same dir as the Build script, so it should be found automatically.

The only thing I can think of is that CPAN doesn't manage to chdir to the directory. Hopefully I'll be able to reproduce this and then I can investigate further.

From n.haigh at sheffield.ac.uk Fri Dec 1 04:26:22 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 09:26:22 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk>
Message-ID: <456FF53E.90907@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>>
>> I know that setting up the PPM is a pain, but I have to say it is
>> much faster, and all required PPMs are available. Which makes me
>> curious: why bother with trying out a CPAN installation process at
>> this point, especially when you have to use PPM to install some of
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all
> platforms, not just Windows. So thanks for trying it out and reporting
> back. Secondly, the PPM method, like Bundle::BioPerl, is
> all-or-nothing. The CPAN installation method allows an interactive
> choice of which optional things to install.
>
> If what you say about DB_File is true, then that's a great shame!
>
>
> So I can do further trouble-shooting of my own, what is the sure-fire
> way to completely clean-out an ActivePerl install, including any
> modules you might have installed with PPMs or CPAN?
>
>

In addition, using CPAN allows you to run the test suite easily without the need to download it separately and run it after a PPM install.

I don't know of a way to clean out ActivePerl - I use VMWare Workstation and have a virtual machine with a fresh install of WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas?

Nath

From bix at sendu.me.uk Fri Dec 1 04:13:23 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:13:23 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819
In-Reply-To: <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
Message-ID: <456FF233.6040704@sendu.me.uk>

Chris Fields wrote:
>
> I know that setting up the PPM is a pain, but I have to say it is much
> faster, and all required PPMs are available. Which makes me curious:
> why bother with trying out a CPAN installation process at this point,
> especially when you have to use PPM to install some of the prereqs
> properly anyway?

Firstly, problems discovered and resulting fixes will help all platforms, not just Windows. So thanks for trying it out and reporting back. Secondly, the PPM method, like Bundle::BioPerl, is all-or-nothing. The CPAN installation method allows an interactive choice of which optional things to install.

If what you say about DB_File is true, then that's a great shame!

So I can do further trouble-shooting of my own, what is the sure-fire way to completely clean-out an ActivePerl install, including any modules you might have installed with PPMs or CPAN?

From cjfields at uiuc.edu Fri Dec 1 09:08:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:08:55 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk>
Message-ID: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>

On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I know that setting up the PPM is a pain, but I have to say it is
>> much faster, and all required PPMs are available. Which makes me
>> curious: why bother with trying out a CPAN installation process at
>> this point, especially when you have to use PPM to install some of
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all
> platforms, not just Windows. So thanks for trying it out and
> reporting back. Secondly, the PPM method, like Bundle::BioPerl, is
> all-or-nothing. The CPAN installation method allows an interactive
> choice of which optional things to install.

Yes, I understand that. My point is, you are generally forced to use PPM anyway due to several modules not installing properly (all the 'trouble' distributions, like DB_File, are available via PPM). I can see using CPAN as an alternative way of installing Bioperl for a distribution, or as the primary method via CVS or manually, but not for distributions. At least not until the kinks are worked out for Windows users.

What are the significant issues for a bioperl PPM installation, based on the last PPM Nathan set up?
If there is a redirection problem, could we just modify the installation docs to address that ('due to problem X, you must install the following modules prior to installing BioPerl 1.5.2...').

> If what you say about DB_File is true, then that's a great shame!

We need to go through the various prereqs to see which ones need PPM vs CPAN. In general, anything that requires C code compilation (and thus needs a recent VC++) will likely be an issue.

> So I can do further trouble-shooting of my own, what is the
> sure-fire way to completely clean-out an ActivePerl install, including
> any modules you might have installed with PPMs or CPAN?

Not sure, beyond uninstalling and cleaning out the Perl directory (I think you might be able to delete the site/ directory, but I haven't tried it). ActivePerl comes preloaded with a number of non-core modules which makes it tricky to uninstall them one-by-one.

chris

From cjfields at uiuc.edu Fri Dec 1 09:10:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:10:34 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819
In-Reply-To: <456FF38D.3070508@sendu.me.uk>
References: <000301c714b4$7846e790$15327e82@pyrimidine> <456FF38D.3070508@sendu.me.uk>
Message-ID: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>

On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:

> You were pointing out the YAML issue? I think I'm less concerned
> with that (solution: install YAML) and much more concerned with why
> it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The
> module in question is in the same dir as the Build script, so it
> should be found automatically.
>
> The only thing I can think of is that CPAN doesn't manage to chdir
> to the directory. Hopefully I'll be able to reproduce this and then
> I can investigate further.

My thought was the two were related in some way. I'm not sure, to tell the truth.

-chris

From bix at sendu.me.uk Fri Dec 1 09:17:41 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:17:41 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819
In-Reply-To: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
Message-ID: <45703985.5050203@sendu.me.uk>

Chris Fields wrote:
>
> On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>> I know that setting up the PPM is a pain, but I have to say it is
>>> much faster, and all required PPMs are available. Which makes me
>>> curious: why bother with trying out a CPAN installation process at
>>> this point, especially when you have to use PPM to install some of
>>> the prereqs properly anyway?
>>
>> Firstly, problems discovered and resulting fixes will help all
>> platforms, not just Windows. So thanks for trying it out and reporting
>> back. Secondly, the PPM method, like Bundle::BioPerl, is
>> all-or-nothing. The CPAN installation method allows an interactive
>> choice of which optional things to install.
>
> Yes, I understand that. My point is, you are generally forced to use
> PPM anyway due to several modules not installing properly (all the
> 'trouble' distributions, like DB_File, are available via PPM). I can
> see using CPAN as an alternative way of installing Bioperl for a
> distribution, or as the primary method via CVS or manually, but not for
> distributions. At least not until the kinks are worked out for Windows
> users.

CPAN isn't being suggested as the primary or preferred installation method for Windows. That will still be PPM. I'm mentioning CPAN / manual installation in the Windows INSTALL docs for the benefit of anyone who wants a simple install and test environment when checking out from CVS.

> What are the significant issues for a bioperl PPM installation

None that I'm aware of - I just need to find the time to start looking into generating an appropriate PPD. Hopefully Nathan's wiki page on the subject will be all I need.

From bix at sendu.me.uk Fri Dec 1 09:18:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:18:43 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819
In-Reply-To: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
References: <000301c714b4$7846e790$15327e82@pyrimidine> <456FF38D.3070508@sendu.me.uk> <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
Message-ID: <457039C3.30907@sendu.me.uk>

Chris Fields wrote:
>
> On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:
>
>> You were pointing out the YAML issue? I think I'm less concerned with
>> that (solution: install YAML) and much more concerned with why it
>> can't reload ModuleBuildBioperl (claiming it isn't in @INC). The
>> module in question is in the same dir as the Build script, so it
>> should be found automatically.
>>
>> The only thing I can think of is that CPAN doesn't manage to chdir to
>> the directory. Hopefully I'll be able to reproduce this and then I can
>> investigate further.
>
> My thought was the two were related in some way. I'm not sure to tell
> the truth.

They weren't; using YAML is the fall-back position in case of earlier failure. I've fixed it now in any case.

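For anyone following the @INC discussion above, here is a minimal sketch of the failure mode suspected in this thread: the error listed a relative entry (_build\lib) in @INC, and a relative entry is only resolved against the current working directory at the moment of the require. The module name HypotheticalHelper and the temporary directory layout are invented for illustration (they stand in for ModuleBuildBioperl and the unpacked distribution), and this is not the fix that was applied; the thread does not say what that was.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Path qw(mkpath);
use File::Spec;
use File::Temp qw(tempdir);

# Build a throwaway "distribution" directory holding _build/lib/HypotheticalHelper.pm.
# HypotheticalHelper is an invented stand-in for ModuleBuildBioperl.
my $start = getcwd();
my $dist  = tempdir( CLEANUP => 1 );
my $lib   = File::Spec->catdir( $dist, '_build', 'lib' );
mkpath($lib);

my $pm = File::Spec->catfile( $lib, 'HypotheticalHelper.pm' );
open my $fh, '>', $pm or die "Cannot write $pm: $!";
print {$fh} "package HypotheticalHelper;\nsub ok { 1 }\n1;\n";
close $fh or die "Cannot close $pm: $!";

# A relative @INC entry, like the '_build\lib' shown in the error message.
unshift @INC, File::Spec->catdir( '_build', 'lib' );

# Attempt the load while still in the original working directory.
my $found_before = eval { require HypotheticalHelper; 1 } ? 1 : 0;

# Attempt it again after changing into the distribution directory.
chdir $dist or die "Cannot chdir to $dist: $!";
my $found_after = eval { require HypotheticalHelper; 1 } ? 1 : 0;

print "found before chdir: $found_before\n";
print "found after chdir:  $found_after\n";

# Leave the temporary directory so it can be cleaned up on exit.
chdir $start or die "Cannot chdir back to $start: $!";
```

If the installer never changes into the build directory, the first attempt is the situation it would be in, which would be consistent with the "Can't locate ModuleBuildBioperl.pm in @INC" message quoted earlier.
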
From gwu at molbio.mgh.harvard.edu Fri Dec 1 10:19:42 2006
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Fri, 01 Dec 2006 10:19:42 -0500
Subject: [Bioperl-l] One more load_seqdatabase.pl question
In-Reply-To: <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
References: <4a9ad8800611270907x64a4a4c0jad92bff6641e300@mail.gmail.com> <53C6D534-6E36-4061-B955-E74537839265@gmx.net> <456CA667.6010609@molbio.mgh.harvard.edu> <456F5648.6070207@molbio.mgh.harvard.edu> <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
Message-ID: <4570480E.1020701@molbio.mgh.harvard.edu>

Thanks Hilmar. I did include the -lookup switch on the command line. The warning messages say that the code failed to "INSERT" instead of "UPDATE", which sounds like a match was not found. But I was just loading the same Genbank file for the second time. To test whether it actually updated the records, I made a minor modification to one of the COMMENT features. Unfortunately it was not updated. By the way, the test genbank file has four "COMMENT" features but they are different. Any idea what's happening there?

I wonder if it's a bad idea to "UPDATE" a sequence. Say I got a new sequence version with 5 features removed, 5 features modified and 5 features new. If only --lookup is included, according to the POD, the 5 new features will be inserted, the 5 modified features will be updated and the 5 removed features will be left in the database untouched. This renders the new sequence record a mixture of old and new versions. I do not see a reason anyone would like to have a sequence like this. Either include -remove to replace the old version if only one version is needed, or put the new version under a different namespace if multiple versions are needed.

Do I have the correct understanding of these issues? I deeply appreciate your help.

Gang

Hilmar Lapp wrote:
> Right.
> You need to tell it to look up sequences first if you know that
> you are loading sequences which may be in the database already (see
> the POD of load_seqdatabase.pl, switch --lookup; there are several
> other command line options that control what will happen if a sequence
> entry is already present in the database).
>
> The messages in your report are warnings, not errors. It looks like
> some of the comments are duplicated for a sequence; it doesn't look
> like reason for concern. It's not so good if you get errors thrown.
>
> -hilmar
>
> On Nov 30, 2006, at 5:08 PM, gang wu wrote:
>
>> Thanks Hilmar. Do you mean the NVL() clause will make
>> load_seqdatabase.pl not work when updating?
>>
>> I have a problem with updating. It seems load_seqdatabase.pl only tries to
>> insert instead of update. I used one of the test genbank files coming
>> with bioperl-db. Please take a look at the attached output.
>>
>> Thanks.
>>
>> Gang
>>
>> =========================================
>> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle
>> -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank
>> -namespace test
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb
>> Loading
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb
>> ...
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed,
>> values were ("This sequence was reannotated via the Ensembl system.
>> Please visit the Ensembl web site, http://www.ensembl.org/ for more
>> information.
","1") FKs (389109) >> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated >> (DBD ERROR: OCIStmtExecute) >> --------------------------------------------------- >> >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, >> values were ("The /gene indicates a unique id for a gene, /cds a >> unique id for a translation and a /exon a unique id for an exon. >> These ids are maintained wherever possible between versions. For more >> information on how to interpret the feature table, please visit >> http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109) >> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated >> (DBD ERROR: OCIStmtExecute) >> --------------------------------------------------- >> ... >> ... >> ========================================================== >> Hilmar Lapp wrote: >>> These are the protein translations stored in the feature table as >>> tags of features, right? You can change the type of the column >>> (although there may be some issues when you update the column >>> because the NVL() clause won't work if I recall that correctly), but >>> doing so will deprive you of any 'normal' searches against that >>> column. (You can still use functions >from the DBMS_LOB package, but >>> they will be much slower and are completely non-standard.) It is up >>> to you whether that is too big of a price to pay for having some >>> redundant protein translations (translating the feature's DNA >>> sequence should give you the same) in the database. I always trimmed >>> those feature tags off (using a custom SeqProcessor). An alternative >>> is to convert these feature tags into actual bioentries (i.e., >>> Bio::Seq objects; again, a custom SeqProcessor will allow you to do >>> that). -hilmar On Nov 28, 2006, at 4:13 PM, gang wu wrote: >>>> Hi everyone, I'm using load_seqdatabase.pl to upload some Genbank >>>> genome sequences to my Oracle BioSQL database. 
I saw some >>>> errors(See attached warning message) related to >>>> seqfeature_qualifier_value (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE >>>> column), which has Varchar2 data type of maximum 4000 bytes. Did >>>> anybody mention this issue before? Should I just modify the column >>>> to a type being able store more data such as LONG or CLOB? Thanks. >>>> Gang Log information: ============================================ >>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc >>>> -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace >>>> genbank /genomeseq/arabidopsis//NC_003070.gbk Loading >>>> /genomeseq/arabidopsis//NC_003070.gbk ... -------------------- >>>> WARNING --------------------- MSG: SimpleValueAdaptor::add_assoc: >>>> unexpected failure of statement execution: ORA-01461: can bind a >>>> LONG value only for insert into a LONG column (DBD ERROR: error >>>> possibly near <*> indicator at char 12 in 'INSERT INTO >>>> <*>seqfeature_qualifier_value (fea_oid, trm_oid, value, rank) >>>> VALUES (:p1, :p2, :p3, :p4)') name: INSERT ASSOC [2] >>>> Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue values: >>>> FK[Bio::SeqFeature::Generic]:14898, >>>> FK[Bio::Annotation::SimpleValue]:800, >>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSGFV >>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASADR >>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSSEI >>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNSFP >>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFERA >>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYTRY >>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRTEA >>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIVDI >>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENIDW >>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRFAL >>>> 
DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGSDN >>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPTRY >>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYLYT >>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLRGL >>>> VQATYQASA! >>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGGKV >>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDEDAY >>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSDFV >>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSDSE >>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGTLG >>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQEV >>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLGQL >>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVPTL >>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRTVT >>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEMRL >>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRILAV >>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGIDEY >>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDGVD >>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKLLR >>>> VKLDFNFM! 
>>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNLQS >>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAVSN >>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVYCL >>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSGWD >>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADDSE >>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEASV >>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLDGL >>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQSS >>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDEDDF >>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAKGL >>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKGVA >>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNGCL >>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSMRN >>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPKVE >>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSKPL >>>> WLSVGADAS! 
>>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRPVY >>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWITND >>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQES >>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNMTS >>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPFSV >>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSESW >>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKNIV >>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILYMS >>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDLRV >>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSMAM >>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGVDI >>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEGAK >>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSLLR >>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNIMG >>>> QRKFIPAK! 
>>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAIDQ
>>>> AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS",
>>>> rank:"1" --------------------------------------------------
>>>> =============================================
>>>> _______________________________________________ Bioperl-l mailing
>>>> list Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

From bosborne11 at verizon.net Fri Dec 1 09:55:18 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 01 Dec 2006 09:55:18 -0500
Subject: [Bioperl-l] An announcement
Message-ID:

bioperl-l,

I would like to call your attention to a job posting and in doing so I realize that I'm probably breaking a rule of this list. I apologize and acknowledge that I've transgressed. The reason I do this is because this is an interesting job that is relevant to a lot of what we do in this mailing list, and some of you might want to consider it. The posting is here:

http://www.nescent.org/main/employment.html#gmodhelpdesk

I encourage you to pass this on to anyone who you think might be interested.

Thanks again,

Brian O.

From cjfields at uiuc.edu Fri Dec 1 11:49:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 10:49:32 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819
In-Reply-To: <456FF53E.90907@sheffield.ac.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> <456FF53E.90907@sheffield.ac.uk>
Message-ID:

On Dec 1, 2006, at 3:26 AM, Nathan S. Haigh wrote:

...
> In addition, using CPAN allows you to run the test suite easily
> without the need to download it separately and run it after a PPM
> install.

A PPM, by design, is supposed to imply that the distribution passes tests for the specified platform, at that point in time, after all prereqs are installed and any additional postinstall operations (install C libraries, modify config files, etc) are complete. The ActiveState automated PPM building process dictates that; if it fails any test, it will not be made into a PPM. It's sort of a 'stamp of approval' that all tests pass, so you don't need to run them.

However, a test may fail (and a PPM may not get generated) for pretty superficial reasons, such as the makefile not specifying that a module is needed, server issues, etc, so the automated process isn't foolproof. That's why Kobes and the other repositories are available, where the PPM/PPD is manually generated and made to work specifically for Windows (or whatever other platform).

Saying that, it is completely up to the person packaging the distribution to follow those rules if one were to make a PPM manually. You don't even have to run tests prior to using 'nmake ppd'. We can currently state, though, that all tests pass when all prereqs are installed for this distribution. At least at this point in time!

> I don't know of a way to clean out ActivePerl - I use VMWare
> Workstation and have a virtual machine with a fresh install of
> WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas?

I haven't tried it that way. I have Parallels on Mac OS X (I run a SigmaPlot/Excel combo off it). My tests were using a native WinXP installation (i.e. not virtually) on my old Dell. It shouldn't make a difference; VMWare, Parallels, and the like should all run ActivePerl for WinXP since it's a virtual machine. Windows Vista, on the other hand...

I think with PPM4 you can install to a custom directory.
It may be possible to install all new modules to that directory, then you would at least have an idea of what was there (though I don't think you can delete it directly w/o screwing up the PPM database).

chris

From bix at sendu.me.uk Fri Dec 1 12:12:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 17:12:49 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <45706291.80201@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
>
> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
>
> I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts. After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
>
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>
> I am subsequently greeted by many of the following errors:
>
> Could not store Q7N3Q6:

I extracted just Q7N3Q6 from
ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
and was able to load it in using load_seqdatabase.pl under linux with no errors.

If you make a file with just that sequence do you still get the error? Is anyone else able to reproduce the problem?

From cjfields at uiuc.edu Fri Dec 1 12:57:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 11:57:18 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819
In-Reply-To: <45703985.5050203@sendu.me.uk>
Message-ID: <006301c71572$24be8830$15327e82@pyrimidine>

> Chris Fields wrote:
> PPM).
I can > > see using CPAN as an alternative way of installing Bioperl for a > > distribution, or as the primary method via CVS or manually, but not > > for distributions. At least not until the kinks are worked out for > > Windows users. > > CPAN isn't being suggested as the primary or preferred > installation method for Windows. That will still be PPM. I'm > mentioning CPAN / manual installation in the Windows INSTALL > docs for the benefit of anyone who wants a simple install and > test environment when checking out from CVS. That's fine by me. I think the focus is making sure the PPM works, but that shouldn't hold up the final 1.5.2 release. The PPM for previous releases was never released concurrently with the distribution (if at all); it generally followed by a few weeks to a few months past a final release. > > What are the significant issues for a bioperl PPM installation > > None that I'm aware of - I just need to find the time to > start looking into generating an appropriate PPD. Hopefully > Nathan's wiki page on the subject will be all I need. I'll try testing it out today and next week (the more people we have looking into the issue the better). I'm sure that Module::Build hasn't updated to using PPM4 XML formatting, but the tags are similar enough. I can always create a local PPM database using a similar directory structure to bioperl.org/DIST and test an installation from it. chris From n.haigh at sheffield.ac.uk Fri Dec 1 13:52:55 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 18:52:55 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine> References: <006301c71572$24be8830$15327e82@pyrimidine> Message-ID: <45707A07.7000106@sheffield.ac.uk> Chris Fields wrote: >> Chris Fields wrote: >> PPM). 
I can >> >>> see using CPAN as an alternative way of installing Bioperl for a >>> distribution, or as the primary method via CVS or manually, but not >>> for distributions. At least not until the kinks are worked out for >>> Windows users. >>> >> CPAN isn't being suggested as the primary or preferred >> installation method for Windows. That will still be PPM. I'm >> mentioning CPAN / manual installation in the Windows INSTALL >> docs for the benefit of anyone who wants a simple install and >> test environment when checking out from CVS. >> > > That's fine by me. I think the focus is making sure the PPM works, but that > shouldn't hold up the final 1.5.2 release. The PPM for previous releases > was never released concurrently with the distribution (if at all); it > generally followed by a few weeks to a few months past a final release. > > >>> What are the significant issues for a bioperl PPM installation >>> >> None that I'm aware of - I just need to find the time to >> start looking into generating an appropriate PPD. Hopefully >> Nathan's wiki page on the subject will be all I need. >> > > I'll try testing it out today and next week (the more people we have looking > into the issue the better). I'm sure that Module::Build hasn't updated to > using PPM4 XML formatting, but the tags are similar enough. I can always > create a local PPM database using a similar directory structure to > bioperl.org/DIST and test an installation from it. > > chris > To clarify a few things about PPM4 XML and to highlight the main differences: 1) The use of PROVIDE and REQUIRE tags 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules. 
3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma-separated tuples as in PPM3 XML
4) The VERSION in PROVIDE and REQUIRE is used internally to do version comparisons for upgrades, resolving prereqs, etc.
5) Module names should all contain '::', either natively as part of their namespace or, if a name doesn't contain one natively, appended to the end, e.g. "GD::"
6) The VERSION in the SOFTPKG key is for human readability only
7) The NAME in SOFTPKG is used to identify which packages are actually the same.

Nath

From bix at sendu.me.uk Fri Dec 1 13:52:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 18:52:44 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk>
Message-ID: <457079FC.7010209@sendu.me.uk>

Sendu Bala wrote:
> pelikan at cs.pitt.edu wrote:
[snip]
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

In fact, if I just try and load it again I reproduce the problem. The situation is similar to
http://bugzilla.bioperl.org/show_bug.cgi?id=2092

And I have a tentative fix that extends Brian's fix there. Committed to HEAD only atm.
I don't know anything about bioperl-db and don't have the faintest clue why this is happening, nor the time to figure it out. Can someone please have a proper look at this and decide if my fix is sane? All I can say is that the test suites for bioperl-live and bioperl-db continue to pass, but that isn't really saying much.

PS. having used load_seqdatabase.pl to load a sequence, how do I remove it afterwards?

From cjfields at uiuc.edu Fri Dec 1 14:00:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:00:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk>
Message-ID:

On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error.
>> This is my first time dealing with bioperl, so bear with me.
>>
>> I've successfully loaded the most recent taxonomy information using
>> the biosql-schema scripts. After this, I attempted to load the most
>> recent release of the uniprot flat file dataset with the following
>> command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux
> with no errors. If you make a file with just that sequence do you
> still get the error?
>
> Is anyone else able to reproduce the problem?
I can reproduce on both WinXP and Mac OS X using the latest bioperl-db/bioperl-live and a BioSQL database preloaded with taxonomy. Notably the bug doesn't show up with a database lacking taxonomy, where no lookup is used (I guess).

Here's some overly verbose debugging (apologies):

Loading saved.flat ...
attempting to load adaptor class for Bio::Seq::RichSeq
attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Tree::Tree
attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Root::Root
attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
attempting to load module Bio::DB::BioSQL::RootIAdaptor
attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Tree::TreeI
attempting to load module Bio::DB::BioSQL::TreeIAdaptor
attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Tree::NodeI
attempting to load module Bio::DB::BioSQL::NodeIAdaptor
attempting to load module Bio::DB::BioSQL::NodeAdaptor
attempting to load adaptor class for Bio::Tree::TreeFunctionsI
attempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor
attempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor
no adaptor found for class Bio::Tree::Tree
attempting to load adaptor class for Bio::DB::Taxonomy::list
attempting to load module Bio::DB::BioSQL::listAdaptor
attempting to load adaptor class for Bio::DB::Taxonomy
attempting to load module Bio::DB::BioSQL::TaxonomyAdaptor
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load adaptor class for Bio::Annotation::Collection
attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor
attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
attempting to load module Bio::DB::BioSQL::LocationIAdaptor
attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement:
    SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "Swiss-Prot" (namespace)
preparing INSERT statement:
    INSERT INTO biodatabase (name, authority) VALUES (?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "Swiss-Prot" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement:
    SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "141679" (ncbi_taxid)
prepare SELECT CLASSIFICATION:
    SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value BETWEEN node.left_value AND node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value
attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor
Could not store Q7N3Q6:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The supplied lineage does not start near 'Photorhabdus luminescens subsp. laumondii'
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
STACK: Bio::Species::classification /Users/cjfields/src/bioperl-live/Bio/Species.pm:166
STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:552
STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /Library/Perl/5.8.6/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK: load_seqdatabase.pl:620
-----------------------------------------------------------
at load_seqdatabase.pl line 633

chris

From cjfields at uiuc.edu Fri Dec 1 14:01:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:01:59 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819
In-Reply-To: <45707A07.7000106@sheffield.ac.uk>
References: <006301c71572$24be8830$15327e82@pyrimidine> <45707A07.7000106@sheffield.ac.uk>
Message-ID:

On Dec 1, 2006, at 12:52 PM, Nathan S.
Haigh wrote: > Chris Fields wrote: >>> Chris Fields wrote: >>> PPM). I can >>>> see using CPAN as an alternative way of installing Bioperl for a >>>> distribution, or as the primary method via CVS or manually, but >>>> not for distributions. At least not until the kinks are worked >>>> out for Windows users. >>>> >>> CPAN isn't being suggested as the primary or preferred >>> installation method for Windows. That will still be PPM. I'm >>> mentioning CPAN / manual installation in the Windows INSTALL docs >>> for the benefit of anyone who wants a simple install and test >>> environment when checking out from CVS. >>> >> >> That's fine by me. I think the focus is making sure the PPM >> works, but that >> shouldn't hold up the final 1.5.2 release. The PPM for previous >> releases >> was never released concurrently with the distribution (if at all); it >> generally followed by a few weeks to a few months past a final >> release. >> >> >>>> What are the significant issues for a bioperl PPM installation >>>> >>> None that I'm aware of - I just need to find the time to start >>> looking into generating an appropriate PPD. Hopefully Nathan's >>> wiki page on the subject will be all I need. >>> >> >> I'll try testing it out today and next week (the more people we >> have looking >> into the issue the better). I'm sure that Module::Build hasn't >> updated to >> using PPM4 XML formatting, but the tags are similar enough. I can >> always >> create a local PPM database using a similar directory structure to >> bioperl.org/DIST and test an installation from it. >> >> chris >> > > To clarify a few things about PPM4 XML and to highlight the main > differences: > > 1) The use of PROVIDE and REQUIRE tags > 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules. 
> 3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma > separated tuples like PPM3 XML > 4) the VERSION in PROVIDE and REQUIRE are used internally to do > version comparisons for upgrades and solving prereqs etc > 5) Module names should all contain '::' either natively according > their namespace, if it doesn't have one natively, then one is > appended to the end e.g. "GD::" > 6) the VERSION in the SOFTPKG key is for human readability only > 7) the NAME in SOFTPKG is used to identify which packages are > actually the same. > > Nath Okay. Maybe place this in the wiki (PPM page) for future reference? Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Fri Dec 1 14:05:38 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 19:05:38 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine> References: <006301c71572$24be8830$15327e82@pyrimidine> Message-ID: <45707D02.9070504@sheffield.ac.uk> Chris Fields wrote: >> Chris Fields wrote: >> PPM). I can >> >>> see using CPAN as an alternative way of installing Bioperl for a >>> distribution, or as the primary method via CVS or manually, but not >>> for distributions. At least not until the kinks are worked out for >>> Windows users. >>> >> CPAN isn't being suggested as the primary or preferred >> installation method for Windows. That will still be PPM. I'm >> mentioning CPAN / manual installation in the Windows INSTALL >> docs for the benefit of anyone who wants a simple install and >> test environment when checking out from CVS. >> > > That's fine by me. I think the focus is making sure the PPM works, but that > shouldn't hold up the final 1.5.2 release. 
The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed by a few weeks to a few months past a final release.
>
>

Forgot to say, one really annoying thing about PPM is that it seems to display all the versions of Bioperl defined in the XML file. In addition, I think a bug in PPM4 means that if a package is available in ActiveState's repo, PPM4 always wants to install it rather than a more recent version in a different repo (this includes upgrades). This results in the following annoying behaviour:

1) If the ActiveState and bioperl repos are both active, searching for bioperl lists several versions.
2) If you are using the PPM4 GUI and have installed a non-ActiveState version, it says you can upgrade to the version in ActiveState's repo (even if it's actually a downgrade).
3) Using ppm-shell, if you choose "install bioperl" or "upgrade bioperl" it will always install the version in the ActiveState repo.
4) I'm sure there are also some other annoyances.

In the end, it means the best way to install and upgrade bioperl is to search for bioperl packages and install the latest version by eye rather than relying on the "upgrade feature" (at least for the time being). You may also need to remove an old version of bioperl before installing a more recent version.

NOTE: using "upgrade" runs the risk of installing bioperl 1.2.3 from ActiveState and not the latest version in any other repo!

I'll update the wiki when I have time.

Nath

>>> What are the significant issues for a bioperl PPM installation
>>>
>> None that I'm aware of - I just need to find the time to
>> start looking into generating an appropriate PPD. Hopefully
>> Nathan's wiki page on the subject will be all I need.
>>
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better). I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.
I can always > create a local PPM database using a similar directory structure to > bioperl.org/DIST and test an installation from it. > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Dec 1 14:06:53 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:06:53 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <45706291.80201@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> Message-ID: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote: > pelikan at cs.pitt.edu wrote: >> Hello all, >> >> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, >> without Cygwin. The "make test"s have all completed without error. >> This >> is my first time dealing with bioperl, so bear with me. >> >> I've successfully loaded the most recent taxonomy information >> using the >> biosql-schema scripts.
After this, I attempted to load the most >> recent >> release of the uniprot flat file dataset with the following command: >> >> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root - >> dbpass >> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >> >> I am subsequently greeted by many of the following errors: >> >> Could not store Q7N3Q6: > > I extracted just Q7N3Q6 from > ftp://ftp.expasy.org/databases/uniprot/current_release/ > knowledgebase/complete/uniprot_sprot.dat.gz > and was able to load it in using load_seqdatabase.pl under linux > with no > errors. If you make a file with just that sequence do you still get > the > error? > > Is anyone else able to reproduce the problem? Okay, just updated to get your latest CVS fixes for bioperl-live and it passes now for both Mac OS X and WinXP. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Dec 1 14:09:15 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:09:15 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <457079FC.7010209@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk> Message-ID: On Dec 1, 2006, at 12:52 PM, Sendu Bala wrote: > > PS. having used load_seqdatabase.pl to load a sequence, how do I > remove > it afterwards? There's not much documentation on it, but it demonstrated several times in the test suite. Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Dec 1 14:39:17 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 19:39:17 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> Message-ID: <457084E5.2050300@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote: > >> pelikan at cs.pitt.edu wrote: >>> Hello all, >>> >>> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, >>> without Cygwin. The "make test"s have all completed without error. This >>> is my first time dealing with bioperl, so bear with me. >>> >>> I've successfully loaded the most recent taxonomy information >>> using the >>> biosql-schema scripts. After this, I attempted to load the most recent >>> release of the uniprot flat file dataset with the following command: >>> >>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass >>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >>> >>> I am subsequently greeted by many of the following errors: >>> >>> Could not store Q7N3Q6: >> >> I extracted just Q7N3Q6 from >> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz >> >> and was able to load it in using load_seqdatabase.pl under linux with no >> errors. If you make a file with just that sequence do you still get the >> error? >> >> Is anyone else able to reproduce the problem? > > Okay, just updated to get your latest CVS fixes for bioperl-live and it > passes now for both Mac OS X and WinXP. Can you confirm if it is actually working correctly though? Like, having stored a previously-problem sequence, can you get it back out from the database and is its ->species() correct? 
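
A minimal sketch of the round-trip check asked about above: pull the stored entry back out of BioSQL and print what its ->species() reports. It follows the pattern in the Bio::DB::BioDB synopsis (get_object_adaptor plus find_by_unique_key). The connection settings and the 'Swiss-Prot' namespace are assumptions to be replaced with whatever was used when loading; only the accession Q7N3Q6 and the taxon id 141679 come from this thread.

```perl
use strict;
use warnings;
use Bio::DB::BioDB;
use Bio::Seq;

# Placeholder connection settings (assumptions, not values from the thread).
my $db = Bio::DB::BioDB->new(
    -database => 'biosql',
    -driver   => 'mysql',
    -dbname   => 'bioseqdb',
    -host     => 'localhost',
    -user     => 'root',
    -pass     => 'secret',
);

# Template object carrying only the unique key: accession plus namespace.
# The namespace must match the one used by load_seqdatabase.pl (assumed here).
my $template = Bio::Seq->new(
    -accession_number => 'Q7N3Q6',
    -namespace        => 'Swiss-Prot',
);

# Look the entry up by its unique key; undef means nothing was stored.
my $adaptor = $db->get_object_adaptor('Bio::SeqI');
my $dbseq   = $adaptor->find_by_unique_key($template)
    or die "Q7N3Q6 not found in the database\n";

my $species = $dbseq->species
    or die "Q7N3Q6 was stored without a species\n";

# The debug log in this thread binds ncbi_taxid 141679 for this entry,
# so that is the id to compare against.
print "binomial:       ", $species->binomial, "\n";
print "NCBI taxon id:  ", $species->ncbi_taxid, "\n";
print "classification: ", join('; ', $species->classification), "\n";
```

If the lookup dies inside classification() with the "supplied lineage" exception, the problem is on the retrieval side as well as the store side.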
From cjfields at uiuc.edu Fri Dec 1 14:52:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:52:13 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <457084E5.2050300@sendu.me.uk> Message-ID: <000001c71582$329d4d50$15327e82@pyrimidine> > > > > Okay, just updated to get your latest CVS fixes for > bioperl-live and > > it passes now for both Mac OS X and WinXP. > > Can you confirm if it is actually working correctly though? > Like, having stored a previously-problem sequence, can you > get it back out from the database and is its ->species() correct? I would assume so, if we can trust the species tests. I will have to try it again over the weekend. I planned on loading a ton of protein sequences in anyway, most of which are bacterial; if anything breaks it will probably be with those. I think Jason and Hilmar were going to get together about the BioSQL paper at the hackathon. That may be a good place to bring some of the species issues, if they persist. 
chris

From hlapp at gmx.net Fri Dec 1 20:42:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Dec 2006 20:42:05 -0500
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <8414723F-BA02-4936-8F53-781276C3B526@gmx.net>

Either using SQL:

    -- theoretically you should convince yourself first that there
    -- is only one such record (the UK is over acc,version,namespace)
    SQL> DELETE FROM bioentry WHERE accession = 'Q7N3Q6';

or through bioperl-db (see the delete test for examples):

    my $db = Bio::DB::BioDB->new(....);
    my $seq = Bio::PrimarySeq->new(-accession_number=>'Q7N3Q6',
                                   -namespace=>'whatever you used when loading');
    my $adp = $db->get_persistence_adaptor($seq);
    my $pseq = $adp->find_by_unique_key($seq);
    $pseq->remove();
    $pseq->commit();

-hilmar

On Dec 1, 2006, at 1:52 PM, Sendu Bala wrote:

> PS. having used load_seqdatabase.pl to load a sequence, how do I
> remove
> it afterwards?

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From chhalling at verizon.net Sun Dec 3 20:56:51 2006
From: chhalling at verizon.net (Conrad Halling)
Date: Sun, 03 Dec 2006 20:56:51 -0500
Subject: [Bioperl-l] BioPerl Wiki is down
Message-ID: <45738063.1070504@verizon.net>

When I attempted to navigate to http://www.bioperl.org/, I got the following message:

A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:

(SQL query hidden)

from within function "MediaWikiBagOStuff::_doquery". MySQL returned error "1205: Lock wait timeout exceeded; try restarting transaction (localhost)".
--
Conrad Halling
chhalling at verizon.net

From rbirnie at totalise.co.uk Sun Dec 3 16:38:02 2006
From: rbirnie at totalise.co.uk (richard)
Date: Sun, 3 Dec 2006 21:38:02 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
Message-ID: <200612032138.02522.rbirnie@totalise.co.uk>

Hi all,

I'm having a little trouble getting Bio::Graphics to give me the correct output and I'm looking for some help. I am trying to extend from example 5 of the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. Eventually I intend the script to follow example 6 but I thought I'd try the simpler version first.

The basic aim of the script is that it takes as input a file containing a list of GenBank IDs plus some other info for alternative transcripts of a gene. This information is stored in a hash and the GenBank IDs are used to retrieve the appropriate entries from GenBank. I then want to use Bio::Graphics to generate a figure from the feature tables showing the CDSs from the alternative transcripts.

So far I have managed to retrieve the GenBank entries, extract the feature tables and store a reference to these in the hash mentioned above. I've also got Bio::Graphics to draw a basic image but some of the details aren't right and I don't understand why. I have attached the code I have so far, the input file and the output image to this mail. I didn't want to display it all in the main message but I'm not actually sure which bit is causing the problem. The code is very rough and in need of polishing but I need to get it to work correctly first.

These are the problems:

1) As I understand it this:

    my $wholeseq = Bio::SeqFeature::Generic->new (
        -start        => 1,
        -end          => $refseq->length,
        -display_name => $refseq->display_name
    );

should display the name of the gene (CD133/Prominin1) near the top of the image. It doesn't; am I misunderstanding or is there an error in the code?

2) In the quoted example the CDS is broken up into smaller regions which are then linked together in example 6. This isn't happening in my code and I think it should be; I get one solid block for the CDS. I don't understand why this is because I'm not clear which parts of the feature table are used to define where the CDS should be split. I think this is the relevant bit of code:

    foreach my $alt_trans (keys %main) {
        foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {
            my $feature = $main{$alt_trans}{'features'}{$tag};
            $panel->add_track($feature,
                -glyph       => 'generic',
                -bgcolor     => $colors[$idx++ % @colors],
                -fgcolor     => 'black',
                -font2color  => 'black',
                -key         => $alt_trans,
                -bump        => +1,
                -height      => 8,
                -label       => 1,
                -description => 1,
            ) if ($tag eq 'CDS');
        }
    }

Can anyone tell me what I am doing wrong?

RefSeq entry for the gene of interest is here:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386

If I understand correctly the example file used in the HOWTO is this gene:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=116805320

Final question: does bioperl come with example scripts and, if so, where would they normally be found on a Linux system?

If anyone is still reading this, thanks for your patience. Any clarification will be appreciated.

regards,
Richard

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CD133_graphic_code Type: application/x-perl Size: 2702 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment-0001.bin -------------- next part -------------- sequence_ID Exon_Boundary Assay_location Amplicon_length NM_006017 9 - 10 1118 106 AF027208.1 9 - 10 1118 106 AK027420.1 9 - 10 1312 106 AK027422.1 9 - 10 1334 106 BC012089.1 9 - 10 1289 106 AY449689.1 8 - 9 1054 106 AY449690.1 8 - 9 1054 106 AY449691.1 8 - 9 1054 106 AY449692.1 9 - 10 1081 106 AY449693.1 9 - 10 1081 106 AF507034.1 8 - 9 1091 106 AK075411.1 9 - 10 1289 106 AF117225.1 9 - 10 1334 106 AK226033.1 - 1312 106 DQ895452.1 - 1054 106 -------------- next part -------------- A non-text attachment was scrubbed... Name: CD133.png Type: image/png Size: 4322 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment-0001.png From cjfields at uiuc.edu Sun Dec 3 22:35:17 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 3 Dec 2006 21:35:17 -0600 Subject: [Bioperl-l] BioPerl Wiki is down In-Reply-To: <45738063.1070504@verizon.net> References: <45738063.1070504@verizon.net> Message-ID: <41422FC7-B579-4B45-B8CC-341B8F462BCB@uiuc.edu> On Dec 3, 2006, at 7:56 PM, Conrad Halling wrote: > When I attempted to navigate to http://www.bioperl.org/, I got the > following message: > > A database query syntax error has occurred. This may indicate a bug in > the software. The last attempted database query was: > > (SQL query hidden) > > from within function "MediaWikiBagOStuff::_doquery". MySQL returned > error "1205: Lock wait timeout exceeded; try restarting transaction > (localhost)". > > -- Conrad Halling > chhalling at verizon.net This has been an ongoing problem with the server; I have reported it previously to open-bio support. There have been a few attempts to fix it which seem to work short-term but something else must be wrong. Jason? Chris D? 
For my part, Googling found the following link, which indicates that this error may be due to heavy server load:

http://tibia.erig.net/TibiaWiki:Bug_reports

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Derek.Fairley at bll.n-i.nhs.uk Mon Dec 4 05:18:37 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Mon, 4 Dec 2006 10:18:37 -0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID:

Richard,

You can find instructions for installing the example scripts directory here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_SCRIPTS

or you can get individual scripts from here:

http://www.bioperl.org/wiki/Bioperl_scripts11

Derek.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of richard
Sent: 03 December 2006 21:38
To: Bioperl list
Subject: [Bioperl-l] confused by Bio::Graphics

Hi all,

I'm having a little trouble getting Bio::Graphics to give me the correct output and I'm looking for some help. I am trying to extend from example 5 of the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. Eventually I intend the script to follow example 6 but I thought I'd try the simpler version first.

The basic aim of the script is that it takes as input a file containing a list of GenBank IDs plus some other info for alternative transcripts of a gene. This information is stored in a hash and the GenBank IDs are used to retrieve the appropriate entries from GenBank. I then want to use Bio::Graphics to generate a figure from the feature tables showing the CDSs from the alternative transcripts.

So far I have managed to retrieve the GenBank entries extract the feature tables and store a reference to these in the hash mentioned above.
I've also got Bio::Graphics to draw a basic image but some of the details aren't right and I don't understand why. I have attached the code I have so far, the input file and the output image to this mail. I didn't want to display it all in the main message but I'm not actually sure which bit is causing the problem. The code is very rough and in need of polishing but I need to get it to work correctly first. These are the problems: 1) As I understand it this: my $wholeseq = Bio::SeqFeature::Generic->new ( -start => 1, -end => $refseq->length, -display_name =>$refseq->display_name ); should display the name of the gene (CD133/Prominin1) near the top of image. It doesn't, am I misunderstanding or is there an error in the code? 2) In the quoted example the CDS is broken up into smaller regions which are then linked together in example 6. This isn't happening in my code and I think it should be, I get one solid block for the CDS. I don't understand why this is because I'm not clear which parts of the feature table are used to define where the CDS should be split. I think this is the relevant bit of code: foreach my $alt_trans (keys %main) { foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) { my $feature = $main{$alt_trans}{'features'}{$tag}; $panel->add_track($feature, -glyph => 'generic', -bgcolor => $colors[$idx++ % @colors], -fgcolor => 'black', -font2color => 'black', -key => $alt_trans, -bump => +1, -height => 8, -label => 1, -description => 1, ) if ($tag eq 'CDS'); } } Can anyone tell me what I am doing wrong? RefSeq entry for the gene of interest is here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386 If I understand correctly the example file used in the HOWTO is this gene: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=1168053 20 Final question, does bioperl come with example scripts and is so where whould they normally be found on a Linux system? If anyone is still reading this thanks for your patience. 
Any clarification will be appreciated.

regards,
Richard

From rbirnie at totalise.co.uk Mon Dec 4 04:30:36 2006
From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk)
Date: 04 Dec 2006 09:30:36 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/551f1442/attachment.html

From bix at sendu.me.uk Mon Dec 4 09:37:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 14:37:16 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <45706671.9000201@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine> <456F27E9.70205@york.ac.uk> <456FEF22.4090004@sendu.me.uk> <45706671.9000201@york.ac.uk>
Message-ID: <4574329C.2030905@sendu.me.uk>

Samantha Thompson wrote:
> Hi,
> Thanks for all your help so far, I am still trying to understand a
> couple of things...

You should make sure your replies are sent to the list, as you're likely to get a faster response.

[where $blast_report is the value returned by
Bio::Tools::Run::RemoteBlast->submit_blast($seq_object)]

> when I run this line..
>
> $searchio = Bio::SearchIO->new(-format => 'blast',
>                                -file => $blast_report);
>
> between submitting the blast search and trying to process the searchio
> object like I was attempting before I get the following errors back:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not open 1: No such file or directory
[snip]
> Does this mean that my BLAST is failing when I submit it?

No, the -file option of SearchIO->new() takes, unsurprisingly, a filename. I'd tell you to pay careful attention to the docs, but sadly the RemoteBlast docs are currently wrong. submit_blast() claims to return 'Blast report object' (which in any case certainly wouldn't be a filename) when in fact it returns, as you discovered, a (for our purposes) meaningless number.
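
A rough sketch of the polling pattern that has to follow submit_blast, modelled on the RemoteBlast synopsis. It assumes a factory object that offers each_rid, retrieve_blast and remove_rid, and that retrieve_blast hands back a report object when the search is done, a negative number on failure, and 0 while the search is still running. The helper name collect_blast_reports is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Poll a RemoteBlast-style factory until every submitted request ID (RID)
# has either produced a report or failed. collect_blast_reports is a
# hypothetical helper; $factory is assumed to provide each_rid,
# retrieve_blast and remove_rid.
sub collect_blast_reports {
    my ($factory, $wait_seconds) = @_;
    $wait_seconds = 5 unless defined $wait_seconds;

    my @reports;
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( ref $rc ) {
                # A reference means the report is ready to be parsed.
                push @reports, $rc;
                $factory->remove_rid($rid);
            }
            elsif ( $rc < 0 ) {
                # A negative status means the request failed; stop polling it.
                $factory->remove_rid($rid);
            }
            else {
                # The search is still running; pause before asking again.
                sleep $wait_seconds if $wait_seconds;
            }
        }
    }
    return @reports;
}

# Intended use (not run here, needs BioPerl and network access):
#   $remote_blast->submit_blast($seq_obj);
#   for my $searchio ( collect_blast_reports($remote_blast) ) {
#       while ( my $result = $searchio->next_result ) { ... }
#   }
```

The point is that the value to loop over with next_result comes from retrieve_blast, not from submit_blast.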
As I suggested before, you need to look at the synopsis for Bio::Tools::Run::RemoteBlast instead. (having called submit_blast you must do the each_rid loop) Does anyone care to go through the POD for RemoteBlast and update it to an accurate state? From bix at sendu.me.uk Mon Dec 4 09:40:27 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Dec 2006 14:40:27 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: References: Message-ID: <4574335B.805@sendu.me.uk> rbirnie at totalise.co.uk wrote: > Hi all, > > I've just seen my previous mail come through on the digest and I noticed > that the code I attached has been scrubbed which means that the message > won't make much sense. If I've contravened list rules by posting > attachments then apologies, I did look for a posting guide but couldn't > see one on the wiki. I deliberatley didn't put the whole code in the > main message because it's quite long. I'm not sure which part is wrong > so I don't know which part to post I'm just not seeing the output I > would expect from the example. What is the best thing for me to do? I saw a few attachments on your post (including your code example), so I think what you did was fine. From cjfields at uiuc.edu Mon Dec 4 10:40:20 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 09:40:20 -0600 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <4574335B.805@sendu.me.uk> Message-ID: <002001c717ba$823c1500$15327e82@pyrimidine> > rbirnie at totalise.co.uk wrote: > > Hi all, > > > > I've just seen my previous mail come through on the digest and I > > noticed that the code I attached has been scrubbed which means that > > the message won't make much sense. If I've contravened list > rules by > > posting attachments then apologies, I did look for a > posting guide but > > couldn't see one on the wiki. I deliberatley didn't put the > whole code > > in the main message because it's quite long. 
I'm not sure > which part > > is wrong so I don't know which part to post I'm just not seeing the > > output I would expect from the example. What is the best > thing for me to do? > > I saw a few attachments on your post (including your code > example), so I think what you did was fine. Same here. I received a PNG file and two text files (a script and a data file). chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From rbirnie at totalise.co.uk Mon Dec 4 11:06:51 2006 From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk) Date: 04 Dec 2006 16:06:51 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <002001c717ba$823c1500$15327e82@pyrimidine> References: <002001c717ba$823c1500$15327e82@pyrimidine> Message-ID: An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/22c3c5e0/attachment.html From dmessina at wustl.edu Mon Dec 4 11:46:16 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 4 Dec 2006 10:46:16 -0600 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk> References: <200612032138.02522.rbirnie@totalise.co.uk> Message-ID: Hi Richard, > [richard] > > These are the problems: > 1) As I understand it this: > > my $wholeseq = Bio::SeqFeature::Generic->new ( > -start => 1, > -end => $refseq->length, > -display_name =>$refseq->display_name > ); > > should display the name of the gene (CD133/Prominin1) near the top > of image. > It doesn't, am I misunderstanding or is there an error in the code? The contents of a sequence object's display_name varies depending on the type of sequence record; for a sequence object created from a Genbank record, it's the value of the LOCUS field on the first line of the record. If you want the gene name, you'll have to dig it out of the feature table. 
If you look at the Genbank record for your first sequence, you'll see that under both the gene and CDS primary features, the HUGO gene abbreviation is stored under the "gene" secondary tag, and various synonyms are under the "note" and "product" secondary tags.

LOCUS       NM_006017               3794 bp    mRNA    linear   PRI 17-NOV-2006
DEFINITION  Homo sapiens prominin 1 (PROM1), mRNA.
ACCESSION   NM_006017
VERSION     NM_006017.1  GI:5174386
[...skipping irrelevant part of the Genbank record...]
FEATURES             Location/Qualifiers
     source          1..3794
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="4"
                     /map="4p15.32"
     gene            1..3794
                     /gene="PROM1"
                     /note="prominin 1; synonyms: AC133, CD133, PROML1, MSTP061"
                     /db_xref="GeneID:8842"
                     /db_xref="HGNC:9454"
                     /db_xref="HPRD:HPRD_05079"
                     /db_xref="MIM:604365"
     CDS             38..2635
                     /gene="PROM1"
                     /go_component="integral to plasma membrane [pmid 9389720]; membrane"
                     /go_process="response to stimulus; visual perception"
                     /note="hProminin; prominin (mouse)-like 1; hematopoietic stem cell antigen"
                     /codon_start=1
                     /product="prominin 1"
                     /protein_id="NP_006008.1"
                     /db_xref="GI:5174387"
                     /db_xref="GeneID:8842"
                     /db_xref="HGNC:9454"
                     /db_xref="HPRD:HPRD_05079"
                     /db_xref="MIM:604365"
[....more...]

In your script, you grab the primary features between lines 34-60. You can grab the secondary feature you want with something like:

[cribbed from the Feature-Annotation HOWTO]

    for my $feat_object ($seq_object->get_SeqFeatures) {
        push @ids, $feat_object->get_tag_values("gene")
            if ($feat_object->has_tag("gene"));
    }

> 2) In the quoted example the CDS is broken up into smaller regions which are
> then linked together in example 6. This isn't happening in my code and I
> think it should be, I get one solid block for the CDS. I don't understand why
> this is because I'm not clear which parts of the feature table are used to
> define where the CDS should be split.
> I think this is the relevant bit of code:
>
> foreach my $alt_trans (keys %main) {
>     foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {
>
>         my $feature = $main{$alt_trans}{'features'}{$tag};
>
>         $panel->add_track($feature,
>             -glyph => 'generic',
>             -bgcolor => $colors[$idx++ % @colors],
>             -fgcolor => 'black',
>             -font2color => 'black',
>             -key => $alt_trans,
>             -bump => +1,
>             -height => 8,
>             -label => 1,
>             -description => 1,
>         ) if ($tag eq 'CDS');
>
>     }
> }

The problem here is that RefSeq mRNA records don't contain intron-exon boundary information. I think you'll have to get that from an assembly record.

From the Entrez gene page for PROM1, I obtained a Genbank record for the PROM1 genomic locus:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000004.10&from=15578955&to=15686664&strand=2&dopt=gb

Saving that as 'PROM1.gb' (the suffix is important), and running the bp_embl2picture.pl script on it, I got an image similar to Figure 6 (attached).

Hope this helps,
Dave

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/4add2cbc/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PROM1.png
Type: image/png
Size: 8646 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/4add2cbc/attachment.png

From bix at sendu.me.uk Mon Dec 4 14:37:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 19:37:13 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <000001c717db$3ca7b910$15327e82@pyrimidine>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
Message-ID: <457478E9.3060405@sendu.me.uk>

Chris Fields wrote:
> Sendu,
>
> Are current plans to still try getting the final 1.5.2 release out
> before the hackathon next week?

Yes, I seriously hope so. I was kind of hoping to see test results from you and Nathan on the wiki though...
> There are a few commits I want to make, but I may wait until after > 1.5.2 is out before I add them. But don't let the release stop you. As long as you don't commit to the 1.5.2 branch it will be fine. From cjfields at uiuc.edu Mon Dec 4 14:34:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 13:34:34 -0600 Subject: [Bioperl-l] Timeline on the 1.5.2 release? Message-ID: <000001c717db$3ca7b910$15327e82@pyrimidine> Sendu, Are current plans to still try getting the final 1.5.2 release out before the hackathon next week? There are a few commits I want to make, but I may wait until after 1.5.2 is out before I add them. chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon Dec 4 15:23:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 14:23:45 -0600 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> Message-ID: <000001c717e2$19d18e00$15327e82@pyrimidine> > Chris Fields wrote: > > Sendu, > > > > Are current plans to still try getting the final 1.5.2 release out > > before the hackathon next week? > > Yes, I seriously hope so. I was kind of hoping to see test > results from you and Nathan on the wiki though... Ah, forgot to post those! Working on that now... > > There are a few commits I want to make, but I may wait until after > > 1.5.2 is out before I add them. > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. There are a few things I plan on adding over the next few weeks, including some things for Bio::Location::SplitLocation. However I'm sure some of the latter will break tests, so I'll be adding it in a bit at a time. It all depends when I can squeeze time in to work on them! 
chris From pelikan at cs.pitt.edu Mon Dec 4 17:34:59 2006 From: pelikan at cs.pitt.edu (pelikan at cs.pitt.edu) Date: Mon, 4 Dec 2006 17:34:59 -0500 (EST) Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries Message-ID: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu> Hello, My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the latest mySQL under Windows, Activeperl, without Cygwin. I have 4 GB memory. "make test"s past fine. The problem is that I'm not getting similar numbers of anything when I load datasets using load_seqdatabase.pl. For instance, if I want to load only protiens from Homo Sapiens, I go to UniProt, use the database search function, do a text search for Homo Sapiens (returns 70914 hits), export the hits to flat file format (--format swiss) using the data set manager, and load it using load_seqdatabase.pl. The result of "select count(*) from bioentry;" results in only 1003 entries. Moreover it seems like the entries don't go past the B's in the alphabet - I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%', but I can find apolipoproteins, for example. I know this is an annoying question, but if someone has more experience in dealing with this issue, I would be grateful for any assistance. I don't get any error messages, so it's difficult for me to tell what's going on. -Richard From n.haigh at sheffield.ac.uk Tue Dec 5 01:53:34 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 05 Dec 2006 06:53:34 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> Message-ID: <4575176E.3020906@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: > >> Sendu, >> >> Are current plans to still try getting the final 1.5.2 release out >> before the hackathon next week? >> > > Yes, I seriously hope so. 
I was kind of hoping to see test results from > you and Nathan on the wiki though... > > > OK, I'll get onto this today. >> There are a few commits I want to make, but I may wait until after >> 1.5.2 is out before I add them. >> > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > A: Yes. >> Q: Are you sure? >> >>> A: Because it reverses the logical flow of conversation. >>> >>>> Q: Why is top posting frowned upon? >>>> Get Thunderbird From n.haigh at sheffield.ac.uk Tue Dec 5 06:43:16 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 05 Dec 2006 11:43:16 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> Message-ID: <45755B54.7080902@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: > >> Sendu, >> >> Are current plans to still try getting the final 1.5.2 release out >> before the hackathon next week? >> > > Yes, I seriously hope so. I was kind of hoping to see test results from > you and Nathan on the wiki though... > > > I've added my test results for Debian to the wiki. Nath >> There are a few commits I want to make, but I may wait until after >> 1.5.2 is out before I add them. >> > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > A: Yes. >> Q: Are you sure? >> >>> A: Because it reverses the logical flow of conversation. >>> >>>> Q: Why is top posting frowned upon? 
>>>> Get Thunderbird From bix at sendu.me.uk Tue Dec 5 06:47:06 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 05 Dec 2006 11:47:06 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <45755B54.7080902@sheffield.ac.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> <45755B54.7080902@sheffield.ac.uk> Message-ID: <45755C3A.9050903@sendu.me.uk> Nathan S. Haigh wrote: > Sendu Bala wrote: >> Chris Fields wrote: >> >>> Sendu, >>> >>> Are current plans to still try getting the final 1.5.2 release out >>> before the hackathon next week? >>> >> Yes, I seriously hope so. I was kind of hoping to see test results from >> you and Nathan on the wiki though... > > I've added my test results for Debian to the wiki. Thanks (and to Chris as well). I can't tell you how much I loath and despise TCoffee and Tmhmm now ;) From cjfields at uiuc.edu Tue Dec 5 11:04:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 5 Dec 2006 10:04:38 -0600 Subject: [Bioperl-l] Build.PL changes Message-ID: <001b01c71887$10be3160$15327e82@pyrimidine> Sendu, I think the Build.PL commits which force installation of XML::SAX::Expat should be rolled back. XML::Simple works with any XML::SAX backend, not just XML::SAX::Expat, which hasn't been actively maintained since 2003 and is deprecated in favor of XML::SAX::ExpatXS. In fact, forcing XML::SAX::Expat to install as the default XML::SAX backend currently breaks blastxml parsing. Note that forcing this also forces one to install the Expat library (now at v 2), which now has some compatibility problems with XML::SAX::Expat (but not ExpatXS). Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From qetzal at tutopia.com.br Wed Dec 6 10:21:20 2006 From: qetzal at tutopia.com.br (giovani) Date: Wed, 06 Dec 2006 10:21:20 -0500 Subject: [Bioperl-l] Biodiversity graphic Message-ID: An HTML attachment was scrubbed... 
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061206/9d9e4a09/attachment.html From benoit at ebi.ac.uk Wed Dec 6 12:30:12 2006 From: benoit at ebi.ac.uk (Benoit Ballester) Date: Wed, 06 Dec 2006 17:30:12 +0000 Subject: [Bioperl-l] Biodiversity graphic In-Reply-To: References: Message-ID: <4576FE24.1030807@ebi.ac.uk> giovani wrote: > > Hello there. I'm trying to write a programa to set a graphic with two > axis and two data sets to each axis. Anyone know some tool similar to > the GD module to set this graphic, because with GD I'm having troubles. > here is an example of what I want to do: > http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm > using with GD module. It looks to me that the graph you pointing too has been made by gnuplot. Why don't you use gnuplot or R instead ? Ben > > #!/usr/bin/perl -w > > use GD::Graph::mixed; > @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 3, 4, 14, 30, 12, 8, 7, 20, 15], > [ 2, 8, 2, 5, 3, 1, 3, 4, 1], > [ 5, 12, 24, 33, 19, 8, 6, 15, 21], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > ); > > $my_graph = new GD::Graph::mixed( ); > $my_graph->set( > x_label => 'X Label', > y1_label => 'Y1 label', > y2_label => 'Y2 label', > title => 'Using two axes', > y1_max_value => 40, > y2_max_value => 8, > y_tick_number => 8, > y_label_skip => 2, > long_ticks => 1, > two_axes => 1, > use_axis => [1,2,1,2], > legend_placement => 'BR', > x_labels_vertical => 1, > x_label_position => 1/2, > ); > > $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY'); > my $gd = $my_graph->plot(\@data) or die $my_graph->error; > open(IMG, '>graphTest.gif') or die "N o posso abrir arquivo$!\n"; > binmode IMG; > print IMG $gd->gif; > close IMG; > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gwu at 
molbio.mgh.harvard.edu Wed Dec 6 16:12:57 2006 From: gwu at molbio.mgh.harvard.edu (gang wu) Date: Wed, 06 Dec 2006 16:12:57 -0500 Subject: [Bioperl-l] Biodiversity graphic In-Reply-To: References: Message-ID: <45773259.3010405@molbio.mgh.harvard.edu> Do you mean the GD code can not run or it does not generate image as you wanted? Gang giovani wrote: > > > Hello there. I'm trying to write a programa to set a graphic with two > axis and two data sets to each axis. Anyone know some tool similar to > the GD module to set this graphic, because with GD I'm having > troubles. here is an example of what I want to do: > http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm > using with GD module. > > #!/usr/bin/perl -w > > use GD::Graph::mixed; > @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 3, 4, 14, 30, 12, 8, 7, 20, 15], > [ 2, 8, 2, 5, 3, 1, 3, 4, 1], > [ 5, 12, 24, 33, 19, 8, 6, 15, 21], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > ); > > $my_graph = new GD::Graph::mixed( ); > $my_graph->set( > x_label => 'X Label', > y1_label => 'Y1 label', > y2_label => 'Y2 label', > title => 'Using two axes', > y1_max_value => 40, > y2_max_value => 8, > y_tick_number => 8, > y_label_skip => 2, > long_ticks => 1, > two_axes => 1, > use_axis => [1,2,1,2], > legend_placement => 'BR', > x_labels_vertical => 1, > x_label_position => 1/2, > ); > > $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY'); > my $gd = $my_graph->plot(\@data) or die $my_graph->error; > open(IMG, '>graphTest.gif') or die "N o posso abrir arquivo$!\n"; > binmode IMG; > print IMG $gd->gif; > close IMG; > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Wed Dec 6 17:39:49 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 06 Dec 2006 22:39:49 +0000 Subject: 
[Bioperl-l] Bioperl 1.5.2 Release
Message-ID: <457746B5.2020006@sendu.me.uk>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl Package Manager - see the wiki for details)

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree and bioperl-pipeline) did not see a unified release for 1.5.2.

This release represents a developer release which has been thoroughly tested. We consider it the most stable (in terms of bugs) version of Bioperl and believe it to be suitable for most people. It is marked 'developer' or even 'unstable' because its API may change on short notice. It will also not be maintained or supported beyond the next bioperl release.

1.5.2 introduces the following new (core) features:

* Taxonomy (Bio::Species) overhaul
* Bio::Map improvements
* Bio::SearchIO speedup
* Build.PL installation

For details, and a complete change log, see the wiki.
API documentation is available here: http://doc.bioperl.org/

Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields (bug fixing, testing, documentation, discussion), Nathan Haigh (Windows & pre-requisite issues, testing) and Mauricio Herrera Cuadra (testing, documentation, support). Feedback and ideas provided by Hilmar Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and elsewhere proved invaluable. None of this would have been possible without the behind-the-scenes work of the open-bio support team. I'd also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and especially those that took the time to file bug reports or report problems.

Remember, Bioperl can only go from strength to strength with /your/ help. If you'd like to experience the fame and fortune that naturally follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.

From cjfields at uiuc.edu Wed Dec 6 21:30:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Dec 2006 20:30:44 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c719a7$b48beb90$15327e82@pyrimidine>

Great job Sendu!

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run) installed w/o a hitch following normal instructions using PPM4 (GUI and command line shell) using clean ActiveState installations. Looks like all the correct prereqs were installed with shell (only XML::SAX::ExpatXS was left out in the GUI installation for reasons outlined before).

I'll run more tests tomorrow to see if tests pass with the installed bioperl (this should catch any prereq issues with PPM installation we missed).

chris

> I am proud to announce the final release of Bioperl 1.5.2.
> > http://www.bioperl.org/wiki/Release_1.5.2 > > bioperl (core): > cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-1.5.2_100.zip > > bioperl-run: > cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip > > bioperl-db: > cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip > > bioperl-network: > cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip > > http://bioperl.org/DIST/SIGNATURES.md5 > > (all are also available via CVS, and for Windows users, using > the Perl Package Manager - see the wiki for details) > > The other bioperl packages (bioperl-ext, bioperl-gui, > bioperl-pedigree and bioperl-pipeline) did not see a unified > release for 1.5.2. > > > > This release represents a developer release which has been thoroughly > tested. We consider it the most stable (in terms of bugs) version of > Bioperl and believe it to be suitable for most people. It is marked > 'developer' or even 'unstable' because its API may change on short > notice. It will also not be maintained or supported beyond the next > bioperl release. > > 1.5.2 introduces the following new (core) features: > > * Taxonomy (Bio::Species) overhaul > * Bio::Map improvements > * Bio::SearchIO speedup > * Build.PL installation > > For details, and a complete change log, see the wiki. 
> > API documentation is available here: http://doc.bioperl.org/ > > > Acknowledgements: > Enumerable thanks are due for the tireless efforts of > Christopher Fields > (bug fixing, testing, documentation, discussion), Nathan Haigh > (Windows&pre-requisite issues, testing) and Mauricio Herrera Cuadra > (testing, documentation, support). Feedback and ideas > provided by Hilmar > Lapp, Jason Stajich, Torsten Seemann and others on the > mailing list and > elsewhere proved invaluable. None of this would have been possible > without the behind-the-scenes work of the open-bio support team. I'd > also like to acknowledge Andreas J. Koenig for his help with > CPAN matters. > > Finally, thank you to everyone who tried out the release > candidates, and > especially those that took the time to file bug reports or > report problems. > > > Remember, Bioperl can only go from strength to strength with /your/ > help. If you'd like to experience the fame and fortune that naturally > follow becoming a Bioperl developer (?!), become one! > http://www.bioperl.org/wiki/Becoming_a_developer > > On behalf of the Bioperl team, > Sendu Bala. From hlapp at gmx.net Wed Dec 6 22:20:14 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 6 Dec 2006 22:20:14 -0500 Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries In-Reply-To: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu> References: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu> Message-ID: <8E15592D-6475-4A4D-BA6D-BD669C4233C3@gmx.net> I seriously doubt that load_seqdatabase.pl would have deliberately stopped loading the file. Either there was an error in loading an entry (which you should see, and you can also ask the script to just keep going by providing the --safe option), or the file only contained 1003 entries. Note that you can get progress logging by using the --logchunk option, which will also give you a final count of the number of sequences loaded. 
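A minimal sketch for cross-checking that final count against the input file itself, written as a small script so it does not depend on shell quoting rules (which differ on Windows). It assumes Swiss-Prot style flat files in which every entry starts with a line beginning with "ID"; the helper name count_entries and the two sample records are made up for illustration and are not part of bioperl-db.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count entries in a Swiss-Prot style flat file
# by counting the lines that start with the "ID" line code.
sub count_entries {
    my ($fh) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /^ID\s/;
    }
    return $n;
}

# Tiny in-memory sample (invented records) so the sketch runs on its own.
my $sample = <<'END';
ID   TEST1_HUMAN   Reviewed;   10 AA.
AC   P00001;
//
ID   TEST2_HUMAN   Reviewed;   12 AA.
AC   P00002;
//
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $count = count_entries($fh);
close $fh;

print "$count entries\n";
```

To check a real download, open the exported file instead of the in-memory sample and compare the printed number with the result of "select count(*) from bioentry;".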
I'm not sure how you ran your search and your download on Uniprot. If I try what you describe I get 70491 hits, and if I try to export them using the data set manager I get the message:

This download mechanism only supports 1000 proteins. The first 1000 proteins have been added from the selected.

Which perfectly explains what you see. Did you convince yourself that the file contains 70491 entries? If you don't have grep and wc on your windows machine, you can use perl one-liners directly, e.g.,

perl -n -e '/^ID / && ++$n; END {print "$n entries\n";}'

-hilmar

On Dec 4, 2006, at 5:34 PM, pelikan at cs.pitt.edu wrote:

> Hello,
>
> My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the latest mySQL under Windows, Activeperl, without Cygwin. I have 4 GB memory. "make test"s passed fine.
>
> The problem is that I'm not getting similar numbers of anything when I load datasets using load_seqdatabase.pl. For instance, if I want to load only proteins from Homo Sapiens,
> I go to UniProt,
> use the database search function,
> do a text search for Homo Sapiens (returns 70914 hits),
> export the hits to flat file format (--format swiss) using the data set manager,
> and load it using load_seqdatabase.pl.
>
> The query "select count(*) from bioentry;" returns only 1003 entries. Moreover it seems like the entries don't go past the B's in the alphabet - I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%', but I can find apolipoproteins, for example.
>
> I know this is an annoying question, but if someone has more experience in dealing with this issue, I would be grateful for any assistance. I don't get any error messages, so it's difficult for me to tell what's going on.
>
> -Richard

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From lzhtom at hotmail.com Wed Dec 6 22:13:47 2006
From: lzhtom at hotmail.com (zhihua li)
Date: Thu, 07 Dec 2006 03:13:47 +0000
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
Message-ID:

Hi netters,

Recently I found this:

For constructing a new SeqIO object, I had to write:

$seq_obj=Bio::SeqIO->new(
    -file => '/home/myfile',
    -format => 'Fasta'); #Note the dash before the two arguments.

If I omitted the dash:

$seq_obj=Bio::SeqIO->new(
    file => '/home/myfile',
    format => 'Fasta');

I'd get this error:

MSG: Unknown format given or could not determine it []
STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377

So it seems to me that the dashes before the arguments are essential. However, when I tried to build a factory for StandAloneBlast, I found it was the other way around.

If the script had the dash:

$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
    -program => 'blastn',
    -database => '/home/mydatabase');

I'd get the error message:

MSG: Unallowed parameter: - !
STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400

If I left out the dash by saying:

$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
    program => 'blastn',
    database => '/home/mydatabase');

Everything is fine.

Now I'm confused. Why do I sometimes have to add the dash, while at other times I'm not allowed to?

Thanks in advance!

From hlapp at gmx.net Wed Dec 6 22:56:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:56:44 -0500
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID:

Congrats! Great work, Sendu! Don't forget to celebrate.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From arareko at campus.iztacala.unam.mx Wed Dec 6 22:53:21 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 06 Dec 2006 21:53:21 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <45779031.3050202@campus.iztacala.unam.mx>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From jason at bioperl.org Thu Dec 7 00:06:36 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Dec 2006 21:06:36 -0800
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <41A863C9-1B69-4C7B-9271-C577EDD011BB@bioperl.org>

hear! hear!

Excellent work. Thanks for leading the effort on this release and all of the behind the scenes work, attention to detail, and cat herding work it took to make this possible.

-jason

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From n.haigh at sheffield.ac.uk Thu Dec 7 02:23:47 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 07 Dec 2006 07:23:47 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <4577C183.7010501@sheffield.ac.uk>

I know I'm very new to Bioperl development and don't know very much yet, so I'm probably not the best person to express the views of the Bioperl developers or users. However, I'm sure I'm safe in saying that on behalf of everyone associated with Bioperl a *huge* thank you must go out to Sendu for the gargantuan effort he has put into this release.

Just looking over some of the e-mails he's sent over the past few weeks alone, it's clear that he has devoted a huge amount of time to the effort and in some cases with little sleep.

Since there is very little (or should I say no) monetary recognition in such an important and time consuming role as "Release Pumpkin", I hope Sendu has a warm glow, safe in the knowledge that his efforts have helped enormously and are clearly recognised and fully appreciated by the Bioperl community.

Therefore, I'd just like to reiterate what others have already said.....Well done, excellent work!!!
Nath

From valiente at lsi.upc.edu Thu Dec 7 03:25:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 09:25:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To:
References:
Message-ID: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>

The following popped out when I input more than 110 species to the taxonomy2tree script version 1.4:

(in cleanup)
------------- EXCEPTION -------------
MSG: Must supply a Bio::Taxon
STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
STACK (eval) taxonomy2tree.pl:0
STACK toplevel taxonomy2tree.pl:0

Any clues? Thanks,

Gabriel

From bix at sendu.me.uk Thu Dec 7 04:24:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:24:39 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DDD7.7060208@sendu.me.uk>

Are you able to narrow the problem down? What was your command line, what species were you using? Does it work with the first 110 species you tried? Is there anything special about the 111th?

Do I understand correctly that this was a problem during cleanup only, and didn't affect the correctness and completeness of the result?

From bix at sendu.me.uk Thu Dec 7 04:33:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:33:18 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DFDE.6000500@sendu.me.uk>

Oh, does it work with option -e? Or does it work if you delete your old indexes of the nodes and names files and let it re-create them?

From valiente at lsi.upc.edu Thu Dec 7 04:38:03 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 10:38:03 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4577DDD7.7060208@sendu.me.uk>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu> <4577DDD7.7060208@sendu.me.uk>
Message-ID:

Hi,

If you run the attached shell script you should be able to reproduce the problem. It is not about any species in particular, but about the total number of species: it crashes with more than 120 species. The resulting tree is not correct, I'm checking it further now. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061207/00f0aeda/attachment.obj

From cjfields at uiuc.edu Thu Dec 7 10:22:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 09:22:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110species
In-Reply-To:
Message-ID: <000001c71a13$8feec840$15327e82@pyrimidine>

Gabriel,

My guess is this may have to do with using an old taxonomy dump file. I got this to work on WinXP using the latest NCBI taxonomy. I had to modify taxonomy2tree and your shell script to get them to play nice with Windows, but I didn't get the error and I did get a tree (abbreviated for brevity):

(((((("Agrobacterium tumefaciens str. C58","Sinorhizobium meliloti")Rhizobiaceae,...

chris

From cjfields at uiuc.edu Thu Dec 7 13:44:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 12:44:32 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
In-Reply-To:
References:
Message-ID: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>

I agree that this should be more consistent. Does anyone know the reasoning for this?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bosborne11 at verizon.net Thu Dec 7 14:32:21 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Dec 2006 14:32:21 -0500
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
In-Reply-To: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>
Message-ID:

Chris,

The latest StandAloneBlast takes "dashed parameters", as in:

@params = (-database => 'swissprot', -outfile => 'blast1.out');
$factory = Bio::Tools::Run::StandAloneBlast->new(@params);

Or

my $factory = Bio::Tools::Run::StandAloneBlast->new(-program  => "wublastp",
                                                    -database => "swissprot",
                                                    -e        => 1e-20);

So that's why I asked "what version?"

Someone made the change to allow dashes in @params a few months ago and I believe that that someone was you!

Brian O.

From cjfields at uiuc.edu Thu Dec 7 14:44:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 13:44:19 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
In-Reply-To:
References:
Message-ID:

Nope, I plead innocent (at least to this!). I haven't made any commits to StandAloneBlast. These were added in by Torsten (see commits 1.59, 1.60), so you'll need to blame/thank him...

http://tinyurl.com/y7ym9g

So they're now a bit more consistent. That's not to say StandAloneBlast doesn't need some major revisions....

BTW, I didn't see a post from you asking about the version.
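As an illustration of why one constructor can insist on dashes while another rejects them: the difference comes down to how each constructor parses its argument list. The following is only a sketch of the general technique of accepting both spellings, not BioPerl's actual code; the helper name normalize_args is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): accept named arguments
# with or without a leading dash, and ignore the case of the names.
sub normalize_args {
    my (%raw) = @_;
    my %args;
    while (my ($key, $value) = each %raw) {
        (my $name = lc $key) =~ s/^-//;   # "-Program" and "program" both become "program"
        $args{$name} = $value;
    }
    return %args;
}

my %dashed   = normalize_args(-program => 'blastn', -database => '/home/mydatabase');
my %undashed = normalize_args( program => 'blastn',  database => '/home/mydatabase');

print "dashed:   $dashed{program} $dashed{database}\n";
print "undashed: $undashed{program} $undashed{database}\n";
```

A constructor that looks up keys literally, without a normalization step like this, will only recognize whichever spelling it was written for, which would be consistent with the two different error messages reported in this thread.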

Chris

From akarger at CGR.Harvard.edu Thu Dec 7 16:32:51 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 7 Dec 2006 16:32:51 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID:

I need to know how to get the frame information in exon features
(created by Bio::Tools::GFF) into a whole-gene feature that will be
translated into a protein.

I'm reading in some fungal GFFs generated by Jason Stajich. I

- Use Bio::Tools::GFF to create a feature for each exon in a gene
- Create a Bio::Location::Split object containing each feature's location
- Create a Bio::SeqFeature::Generic object whose location is the above BL::Split
- Attach my contig Bio::Seq to the feature
- get the protein with feature->spliced_seq->translate->seq

(Code below)

Unfortunately, I get the wrong result when the GFF features have frame
!= 0. This happens for only a few percent of the exons, but when it
does, I end up translating in the wrong frame.

If I read the docs correctly, Location objects don't have a frame. So
how do I get the correct spliced_seq, which skips one or two bp at the
beginning of certain exons?

I suspect the answer to this is that I'm going about this in completely
the wrong way, in which case, please tell me how I ought to be doing it.

Thanks,

- Amir Karger
Research Computing
Life Sciences Division
Harvard University

P.S. In case you want to see actual code, here it is. After using
Bio::Tools::GFF to create a sorted list of features for each exon
(basically stolen from the module POD), I:

    # Create a new object representing the exons' gene
    my $coding_loc_obj = new Bio::Location::Split;
    foreach my $exon (@sorted_exons) {
        $coding_loc_obj->add_sub_Location($exon->location);
    }

    # Build a spliced feature representing the whole gene
    my $spliced_feat = new Bio::SeqFeature::Generic(
        -start  => $coding_loc_obj->start,
        -end    => $coding_loc_obj->end,
        -strand => $strand_num,
        -primary=> "splicedGene",
    );
    $spliced_feat->location($coding_loc_obj);

    # Attach a contig object containing the sequence
    $spliced_feat->attach_seq($contig_obj->bioperl_object);

    # Get the spliced seq and translate to protein:
    my $coding_seq = $spliced_feat->spliced_seq->seq;
    my $protein = $spliced_feat->spliced_seq->translate->seq;

From bix at sendu.me.uk Thu Dec 7 17:45:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 7 Dec 2006 15:45:32 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
Message-ID: <000001c71a51$671a79d0$6400a8c0@CodonSolutions.local>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl
Package Manager - see the wiki for details)

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree
and bioperl-pipeline) did not see a unified release for 1.5.2.

This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of
Bioperl and believe it to be suitable for most people. It is marked
'developer' or even 'unstable' because its API may change on short
notice. It will also not be maintained or supported beyond the next
bioperl release.

1.5.2 introduces the following new (core) features:

* Taxonomy (Bio::Species) overhaul
* Bio::Map improvements
* Bio::SearchIO speedup
* Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/

Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields
(bug fixing, testing, documentation, discussion), Nathan Haigh
(Windows & pre-requisite issues, testing) and Mauricio Herrera Cuadra
(testing, documentation, support). Feedback and ideas provided by Hilmar
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and
elsewhere proved invaluable. None of this would have been possible
without the behind-the-scenes work of the open-bio support team. I'd
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and
especially those that took the time to file bug reports or report problems.

Remember, Bioperl can only go from strength to strength with /your/
help. If you'd like to experience the fame and fortune that naturally
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.

_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l

From cjfields at uiuc.edu Thu Dec 7 18:00:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 16:00:43 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a53$85cb4f10$6400a8c0@CodonSolutions.local>

Great job Sendu!

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed w/o a hitch following normal instructions using PPM4 (GUI and
command line shell) using clean ActiveState installations. Looks like all
the correct prereqs were installed with shell (only XML::SAX::ExpatXS was
left out in the GUI installation for reasons outlined before).
I'll run more tests tomorrow to see if tests pass with the installed
bioperl (this should catch any prereq issues with PPM installation we
missed).

chris

_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l

From kaboroev at sfu.ca Thu Dec 7 17:26:35 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Thu, 07 Dec 2006 14:26:35 -0800
Subject: [Bioperl-l] Bio::Graphics xyplot
Message-ID: <4578951B.5050206@sfu.ca>

Hi everyone,

I'm attempting to add an xyplot of the phred quality scores to a
Bio::Graphics image, and cannot get it to work. I have the panel with a
track for both the scale and the DNA displaying properly.
When I attempt to add the xyplot I just get a garbled track of what looks
like tiny xyplots for each data point. I have the cvs (updated today) of
bioperl-live running.

I think what I am missing is the creation of a "Sequence Feature Group"
to hold the individual points of the plot. However, I cannot seem to find
such an object. This is what I attempted:

-------BEGIN---CODE-----------
# start panel
my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
                                      -width     => $f_seqlen*10,
                                      -pad_left  => 10,
                                      -pad_right => 10,
                                      -grid      => 1 );

# add scale
$panel->add_track(arrow => Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
                  -double  => 1,
                  -tick    => 2,
                  -fgcolor => 'black');

# add DNA ($feature is of type Bio::SeqFeature::Annotated)
$panel->add_track(dna => $feature);

# get list of quality scores from database
my ($pqs_value) = $dbh->selectrow_array($sql);
my @pqs_value = split(/\s/,$pqs_value);

# create track
my $track = $panel->add_track(-glyph        => 'xyplot',
                              -graph_type   => 'points',
                              -point_symbol => 'point',
                              -max_score    => 100,
                              -min_score    => 0,
                              -scale        => 'none');

# add "subfeatures" to
for (my $i=0;$i<$f_seqlen;$i++) {
    $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
}

print $panel->png();
$panel->finished;
------END---CODE----------

I also attempted to create an array of the point features and passed that
by reference to the panel "add_track" as it describes in the xyplot
documentation, but that resulted in the exact same image.

keith

--
><)))?> -cGRASP- <
Keith Anthony Boroevich
Davidson Lab
Dept of Molecular Biology
Simon Fraser University
Tel: 604-268-7276

From arareko at campus.iztacala.unam.mx Thu Dec 7 18:15:53 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 7 Dec 2006 16:15:53 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a55$a479da60$6400a8c0@CodonSolutions.local>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l

From cain at cshl.edu Thu Dec 7 17:46:09 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 07 Dec 2006 17:46:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
In-Reply-To:
References:
Message-ID: <1165531569.2569.49.camel@localhost.localdomain>

Amir,

I don't know for sure what the problem is, but here is one possibility:
the number in column 8 of a GFF file is not the frame, it is the phase.
See the GFF3 spec for a description of what the phase is:

http://www.sequenceontology.org/gff3.shtml

(It doesn't matter if you are using GFF3 or GFF2, as the phase is the
same in both).

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                  216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Thu Dec 7 21:52:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 20:52:47 -0600
Subject: [Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq
In-Reply-To: <1165531569.2569.49.camel@localhost.localdomain>
Message-ID: <002d01c71a73$f16ecc40$15327e82@pyrimidine>

Another issue is that splittype() is not defined, though I don't think
that would kill anything as currently implemented. However, one thing we
have passingly discussed is having Bio::Location::Split objects possibly
exhibit different (but expected) behaviors based upon the splittype()
(order, join, or bond). It's one of the things I want to work out for the
next release.

If Scott's fix doesn't work and the problem persists, you should file a
bug report with some sample data for us to test out.

chris

From jason at bioperl.org Thu Dec 7 21:01:33 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 7 Dec 2006 18:01:33 -0800
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
In-Reply-To:
References:
Message-ID: <866F6CEE-62BB-4880-9B13-6DDE29EAF94E@bioperl.org>

This was a problem in the gene prediction output I suspect, more recent
versions of the program should have fixed this. I do not currently have
free time to deal with the errors in the small number of ORFs where this
has happened.

I think you just need to do
start -= start- (frame*strand)
for 1st exons.

You can also probably provide the 1st exon's frame to the translate
function as another possibility but you should try and get the CDS
correct first depending on your downstream analyses.

-jason

From neetisomaiya at gmail.com Fri Dec 8 05:21:50 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 8 Dec 2006 15:51:50 +0530
Subject: [Bioperl-l] need help with phrap parser
Message-ID: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>

Can anyone point me to a Phrap parser which parses the ace file to
extract what reads make up each contig (e.g. read_a and read_b make
contig1; read_d, read_e and read_z make contig2), and other information
about the reads (like whether the read is complemented or not with
respect to the contig, what region of the contig each read contributes,
etc.), basically the AF and BS lines of the ACE output.

--
-Neeti
Even my blood says, B positive

From pmiguel at purdue.edu Fri Dec 8 09:17:02 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 08 Dec 2006 09:17:02 -0500
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
References: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
Message-ID: <457973DE.6050900@purdue.edu>

neeti somaiya wrote:
> Can anyone point me to a Phrap parser which parses the ace file to extract
> what reads make up each contig (eg. read_a and read_b make contig1; read_d
> read_e and read_z make contig2, and other information of the reads (like
> whether the read is complemented or not with respect to the contig, what
> region of the contig does each read contribute etc), basically the AF and BS
> lines of the ACE output.
>
>

neeti,

To find the reads that went into each contig, you do *not* want the BS
tagged records. My understanding is that BS is just what consed uses to
populate its consensus line from the ace file. I write this because of an
email sent to me by David Gordon in 2001, included here without his
permission:

> > Phrap writes BS lines which
> > indicate, for each consensus position, which read phrap uses at that
> > position to become the consensus. These BS ("base segments") are
> > manipulated by Consed when there are changes to the assembly, such as
> > joins, tears, removing reads, or changing the consensus.
>

The simplest way is:

  egrep '^(CO|AF|RD) ' acefilename

if you are on a unix system. Or with perl

  while (<>) {
      print if /^(?:CO|AF|RD) /;
  }

(The alternation needs the parentheses; without them the ^ anchor applies
only to CO, and AF or RD would match anywhere in a line.)

But then you would need to parse the fields of interest. You get the
position/strand in the contig from AF, then you get the length of the
read from RD.

It does look like there is a part of bioperl that was meant to perform
this task, including Bio::Assembly::IO::ace, but it looks like it was
started and never completed.
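
To make the "parse the fields of interest" step concrete, here is a minimal sketch in plain Perl (no BioPerl modules). It assumes the usual consed ACE record layout, "CO name bases reads segments U|C", "AF read U|C padded_start" and "RD read padded_bases ...", so the field positions should be checked against your own file. parse_ace_reads and the tiny sample assembly are made up for this illustration, and all coordinates are padded consensus positions.

```perl
use strict;
use warnings;

# Collect, per contig, the reads named on AF lines together with their
# strand and padded start; read lengths come from the later RD lines.
sub parse_ace_reads {
    my ($fh) = @_;
    my (%contigs, %read_length, $current);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/) {
            $current = $1;
            $contigs{$current} = { length => $2, n_reads => $3, reads => [] };
        }
        elsif (defined $current && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            push @{ $contigs{$current}{reads} },
                { name => $1, complemented => ($2 eq 'C' ? 1 : 0), start => $3 };
        }
        elsif ($line =~ /^RD\s+(\S+)\s+(\d+)/) {
            $read_length{$1} = $2;
        }
    }
    # Derive each read's padded end position once all RD lengths are known.
    for my $contig (values %contigs) {
        for my $read (@{ $contig->{reads} }) {
            my $len = $read_length{ $read->{name} };
            $read->{end} = $read->{start} + $len - 1 if defined $len;
        }
    }
    return \%contigs;
}

# Made-up two-read assembly, only to exercise the parser.
my $ace = <<'ACE';
AS 1 2

CO Contig1 20 2 2 U
ACGTACGTACGTACGTACGT

AF read_a U 1
AF read_b C 8
BS 1 7 read_a
BS 8 20 read_b

RD read_a 12 0 0
ACGTACGTACGT

QA 1 12 1 12
RD read_b 13 0 0
TACGTACGTACGT

QA 1 13 1 13
ACE

open my $fh, '<', \$ace or die "Cannot open in-memory ACE data: $!";
my $contigs = parse_ace_reads($fh);
close $fh;

for my $name (sort keys %$contigs) {
    for my $read (@{ $contigs->{$name}{reads} }) {
        printf "%s\t%s\t%s\t%d..%d\n", $name, $read->{name},
            ($read->{complemented} ? 'C' : 'U'), $read->{start}, $read->{end};
    }
}
```

The regexes require whitespace-separated fields after the tag, so a sequence line that merely begins with the letters CO, AF or RD is not mistaken for a record.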

From cjfields at uiuc.edu Fri Dec 8 10:17:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:17:31 -0600
Subject: [Bioperl-l] NAR Database Issue Papers
Message-ID: <000601c71adb$fdd60490$15327e82@pyrimidine>

For those interested, the Nucleic Acids Research Database issue papers
have been popping up in the Advance Access section of the NAR website:

http://nar.oxfordjournals.org/papbyrecent.dtl

Ensembl, UCSC Browser, Entrez Gene, and a number of others of possible
interest are represented. Of particular note are a few mentions of
formatting changes to UniProt, EMBL, and other records, which should be
taken care of in the latest BioPerl release (fingers crossed!).

chris

From cjfields at uiuc.edu Fri Dec 8 10:31:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:31:19 -0600
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <457973DE.6050900@purdue.edu>
Message-ID: <000001c71add$ec7147d0$15327e82@pyrimidine>

...
> But then you would need to parse the fields of interest. You get the
> position/strand in the contig from AF, then you get the length of the
> read from RD.
>
> There does look like there is a part of bioperl that meant to perform
> this task--including Bio::Assembly::IO::ace but it looks like it was
> started, but never completed.

...and if anyone wants to chip in and work on it, let us know! The
various Bio::Assembly modules are one of many areas that need some
updating.

chris

From akarger at CGR.Harvard.edu Fri Dec 8 13:25:47 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:25:47 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID:

> This was a problem in the gene prediction output I suspect, more
> recent versions of the program should have fixed this. I do not
> currently have free time to deal with the errors in the small number
> of ORFs where this has happened.
>
> I think you just need to do
> start -= start- (frame*strand)
> for 1st exons.

I used

  if (strand==1) {start += exon->frame} else {end -= exon->frame}

This took me from 90 translations that had * within the sequence to just
9, out of 5500 CDS in S bayanus.

> You can also probably provide the 1st exon's frame to the translate
> function as another possibility but you should try and get the CDS
> correct first depending on your downstream analyses.

Yes, I think.

Scott Cain pointed out that GFF column 8 is the "phase", which I had
never heard of before. My current, very limited, understanding is that
sometimes you'll have an exon with, say, 31 bp, followed by an exon with
29 bp. When the intron gets spliced out, you eventually get an mRNA of 60
bp, which translates to a protein of 20 aa. But the second exon has a
phase of 1, not 0, because you can't just start translating at the first
bp of the second exon and expect to get nice amino acids.

By the way, whether or not phase is the same thing as frame, when I call
the frame() method on the features created by Bio::Tools::GFF, I get the
phase info. I assume that's a feature (no pun intended), not a bug?

I'm still confused as to why you would have a phase in the first exon,
though. Why not just say the CDS starts 1 or 2 bp later? (This is
probably a bio question, not a bioperl question, but a quick Google
didn't get me an answer. "Phase" isn't a very good search term.)

I guess the real question here, which Jason alludes to, is whether
SeqFeature->spliced_seq ought to take into account the phase information
of the first exon. Right now, it doesn't, so when you call
SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
where you would want spliced_seq to include the first bp or two? Should
there be an option to spliced_seq for whether you want to take phase
information into account? I can't submit a bug report until we confirm
it's a bug.
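
For the record, the phase arithmetic can be sketched in a few lines of plain Perl. This follows the GFF3 wording (phase = number of bases to remove from the start of the feature to reach the first base of the next codon); exon_phases and trim_to_codon_start are hypothetical helpers written for this illustration, not BioPerl methods. Under that definition the second exon of the 31 bp + 29 bp example carries phase 2 rather than 1, since two of its bases complete the codon begun by base 31; a tool that reports some other kind of frame offset may give a different number, so it is worth checking which convention a given GFF file follows.

```perl
use strict;
use warnings;

# Phase of each exon, given the phase of the first exon and the CDS exon
# lengths in transcription order.
sub exon_phases {
    my ($first_phase, @lengths) = @_;
    my @phases;
    my $phase = $first_phase;
    for my $len (@lengths) {
        push @phases, $phase;
        # Bases of this exon left over after whole codons spill into the
        # next exon's first codon; the next phase is what completes it.
        $phase = (3 - (($len - $phase) % 3)) % 3;
    }
    return @phases;
}

# Drop the leading partial codon from an already spliced CDS string.
sub trim_to_codon_start {
    my ($cds, $first_phase) = @_;
    return substr($cds, $first_phase);
}

my @phases = exon_phases(0, 31, 29);
print "@phases\n";                               # prints "0 2"

# A first exon with phase 2: skip two bases before translating.
print trim_to_codon_start('GGATGGCC', 2), "\n";  # prints "ATGGCC"
```

Trimming the spliced string by the first exon's phase before translating amounts to the same thing as adjusting the first exon's start (or end, on the minus strand) as described above.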
Thanks, -Amir Karger > -jason > On Dec 7, 2006, at 1:32 PM, Amir Karger wrote: > > > I need to know how to get the frame information in exon features > > (created by Bio::Tools::GFF) into a whole-gene feature that will be > > translated into a protein. > > > > I'm reading in some fungal GFFs generated by Jason Stajich. I > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > - Create a Bio::Location::Split object containing each feature's > > location > > - Create a Bio::SeqFeature::Generic object whose location > is the above > > BL::Split > > - Attach my contig Bio::Seq to the feature > > - get the protein with feature->spliced_seq->translate->seq > > > > (Code below) > > > > Unfortunately, I get the wrong result when the GFF features > have frame > > != 0. This happens for only a few percent of the exons, but when it > > does, I end up translating in the wrong frame. > > > > If I read the docs correctly, Location objects don't have a > frame. So > > how do I get the correct spliced_seq, which skips one or > two bp at the > > beginning of certain exons? > > > > I suspect the answer to this is that I'm going about this in > > completely > > the wrong way, in which case, please tell me how I ought to be > > doing it. > > > > Thanks, > > - Amir Karger > > Research Computing > > Life Sciences Division > > Harvard University > > > > P.S. In case you want to see actual code, here it is. 
After using > > Bio::Tools::GFF to create a sorted list of features for each exon > > (basically stolen from the module POD), I: > > # Create a new object representing the exons' gene > > my $coding_loc_obj = new Bio::Location::Split; > > foreach my $exon (@sorted_exons) { > > $coding_loc_obj->add_sub_Location($exon->location); > > } > > > > # Build a spliced feature representing the whole gene > > my $spliced_feat = new Bio::SeqFeature::Generic( > > -start => $coding_loc_obj->start, > > -end => $coding_loc_obj->end, > > -strand => $strand_num, > > -primary=> "splicedGene", > > ); > > $spliced_feat->location($coding_loc_obj); > > > > # Attach a contig object containing the sequence > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > # Get the spliced seq and translate to protein: > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > my $protein = $spliced_feat->spliced_seq->translate->seq; > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From akarger at CGR.Harvard.edu Fri Dec 8 13:33:09 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 8 Dec 2006 13:33:09 -0500 Subject: [Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq Message-ID: > Another issue is the splittype() is not defined, though I > don't think that > would kill anything as currently implemented. However, one > thing we have > passingly discussed is having Bio::Location::Split objects > possibly exhibit > different (but expected) behaviors based upon the splittype() > (order, join, > or bond). It's one of the things I want to work out for the > next release. Should I be writing -splittype => "JOIN" or some such in my new()? -Amir Karger > > chris > > > Amir, > > > > I don't know for sure what the problem is, but here is one > > possibility: > > the number in column 8 of a GFF file is not the frame, it is > > the phase. 
> > See the GFF3 spec for a description of what the phase is: > > > > http://www.sequenceontology.org/gff3.shtml > > > > (It doesn't matter if you are using GFF3 or GFF2, as the > > phase is the same in both). > > > > Scott > > > > > > On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote: > > > I need to know how to get the frame information in exon features > > > (created by Bio::Tools::GFF) into a whole-gene feature > that will be > > > translated into a protein. > > > > > > I'm reading in some fungal GFFs generated by Jason Stajich. I > > > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > > - Create a Bio::Location::Split object containing each feature's > > > location > > > - Create a Bio::SeqFeature::Generic object whose location > > is the above > > > BL::Split > > > - Attach my contig Bio::Seq to the feature > > > - get the protein with feature->spliced_seq->translate->seq > > > > > > (Code below) > > > > > > Unfortunately, I get the wrong result when the GFF features > > have frame > > > != 0. This happens for only a few percent of the exons, > but when it > > > does, I end up translating in the wrong frame. > > > > > > If I read the docs correctly, Location objects don't have a > > frame. So > > > how do I get the correct spliced_seq, which skips one or > > two bp at the > > > beginning of certain exons? > > > > > > I suspect the answer to this is that I'm going about this in > > > completely the wrong way, in which case, please tell me how > > I ought to be doing it. > > > > > > Thanks, > > > - Amir Karger > > > Research Computing > > > Life Sciences Division > > > Harvard University > > > > > > P.S. In case you want to see actual code, here it is. 
After using > > > Bio::Tools::GFF to create a sorted list of features for each exon > > > (basically stolen from the module POD), I: > > > # Create a new object representing the exons' gene > > > my $coding_loc_obj = new Bio::Location::Split; > > > foreach my $exon (@sorted_exons) { > > > $coding_loc_obj->add_sub_Location($exon->location); > > > } > > > > > > # Build a spliced feature representing the whole gene > > > my $spliced_feat = new Bio::SeqFeature::Generic( > > > -start => $coding_loc_obj->start, > > > -end => $coding_loc_obj->end, > > > -strand => $strand_num, > > > -primary=> "splicedGene", > > > ); > > > $spliced_feat->location($coding_loc_obj); > > > > > > # Attach a contig object containing the sequence > > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > > > # Get the spliced seq and translate to protein: > > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > > my $protein = $spliced_feat->spliced_seq->translate->seq; > > > > From cjfields at uiuc.edu Fri Dec 8 14:04:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 13:04:55 -0600 Subject: [Bioperl-l] Using frame info from GFF ingettinga Seq->spliced_seq In-Reply-To: Message-ID: <000901c71afb$bf504210$15327e82@pyrimidine> > > Another issue is the splittype() is not defined, though I > don't think > > that would kill anything as currently implemented. > However, one thing > > we have passingly discussed is having Bio::Location::Split objects > > possibly exhibit different (but expected) behaviors based upon the > > splittype() (order, join, or bond). It's one of the things > I want to > > work out for the next release. > > Should I be writing -splittype => "JOIN" or some such in my new()? > > -Amir Karger I missed the fact that 'JOIN' is the default splittype() from looking at the constructor in Location::Split, so you actually don't have to explicitly set it; apologies for that. 
If we make any changes that affect how Location::Split behaves we'll likely leave the default splittype() as 'JOIN' as it's by far the most common join operator. chris From cjfields at uiuc.edu Fri Dec 8 15:03:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 14:03:16 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: Message-ID: <000001c71b03$e6741e90$15327e82@pyrimidine> > Yes, I think. Scott Cain pointed out that GFF column 8 is the > "phase", which I had never heard of before. My current, very > limited, understanding is that sometimes you'll have an exon > with, say, 31 bp, followed by an exon with 29 bp. When the > intron gets spliced out, you eventually get an mRNA of 60 bp, > which translates to a protein of 20 aa. > But the second exon has a phase of 1, not 0, because you > can't just start translating at the first bp of the second > exon and expect to get nice amino acids. I think the use of 'frame' here is meant relative to the DNA sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e. translation, three frames). At least I think that's what is meant! > By the way, whether or not phase is the same thing as frame, > when I call the frame() method on the features created by > Bio::Tools::GFF, I get the phase info. I assume that's a > feature (no pun intended), not a bug? > > I'm still confused as to why you would have a phase in the > first exon, though. Why not just say the CDS starts 1 or 2 bp > later? (This is probably a bio question, not a bioperl > question, but a quick Google didn't get me an answer. "Phase" > isn't a very good search term.) It could be b/c the location coordinates delineate the exon coding boundary. It's conceivable the first exon in a sequence record is not the first exon of the mRNA (i.e. there may be one or more exons prior to or past the exon of interest that are in 'remote' sequence records). 
Like this admittedly extreme example (GB acc AF130134): join(AF130124.1:2563..2964,AF130125.1:21..157,AF130126.1:12..174, AF130127.1:21..112,AF130128.1:21..162,AF130128.1:281..595, AF130128.1:661..842,AF130128.1:916..1030,AF130129.1:21..115, AF130130.1:21..165,AF130131.1:21..125,AF130132.1:21..428, AF130132.1:492..746,AF130133.1:21..168,AF130133.1:232..401, AF130133.1:475..906,AF130133.1:970..1107,AF130133.1:1176..1367,21..128) Also, the ends of the location may be uncertain ('fuzzy'): join(complement(1009..>1260),complement(AF081827.1:<1..177)) > I guess the real question here, which Jason alludes to, is whether > SeqFeature->spliced_seq ought to take into account the phase > information > of the first exon. Right now, it doesn't, so when you call > SeqFeature->spliced_seq->translate, you get gibberish. Are there cases > where you would want spliced_seq to include the first bp or > two? Should there be an option to spliced_seq for whether you > want to take phase information into account? > > I can't submit a bug report until we confirm it's a bug. > > Thanks, > -Amir Karger You can already pass the frame or an offset to PrimarySeqI::translate(). Here are the args: Args : -terminator - character for terminator default is * -unknown - character for unknown default is X -frame - frame default is 0 -codontable_id - codon table id default is 1 -complete - complete CDS expected default is 0 -throw - throw exception if not complete default is 0 -orf - find 1st ORF default is 0 -start - alternative initiation codon -codontable - Bio::Tools::CodonTable object -offset - offset for fuzzy locations default is 0 The offset comes from some GenBank seqfeatures which have a '/codon_start' tag indicating which nucleotide to start translation from (1,2,3). This is essentially just the phase+1. We could add a '-phase' argument for convenience which accepts 0,1,2.
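To make the "phase+1" relationship concrete, here is a small stand-alone sketch in plain Perl. It does not call BioPerl: translate_with_phase is a hypothetical helper, the 10 bp sequence is made up for illustration, and the codon table is the standard genetic code (translation table 1). It only illustrates what skipping 0, 1 or 2 leading bases does to the reading frame.

```perl
use strict;
use warnings;

# Standard genetic code (translation table 1), with bases ordered T, C, A, G.
my @bases = qw(T C A G);
my @aas   = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = shift @aas;
        }
    }
}

# Hypothetical helper: translate $dna after skipping $phase (0, 1 or 2)
# leading bases. Unknown codons become X; a trailing partial codon is ignored.
sub translate_with_phase {
    my ($dna, $phase) = @_;
    my $coding  = uc substr($dna, $phase);
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($coding); $i += 3) {
        my $aa = $codon_table{ substr($coding, $i, 3) };
        $protein .= defined $aa ? $aa : 'X';
    }
    return $protein;
}

my $spliced = 'GATGGCCTAA';    # made-up sequence: one extra base before ATG
for my $phase (0, 1) {
    printf "phase %d (codon_start %d): %s\n",
        $phase, $phase + 1, translate_with_phase($spliced, $phase);
}
```

With phase 0 the made-up sequence reads GAT GGC CTA; with phase 1 (codon_start 2) it reads ATG GCC TAA, which is the frame the example was built around.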
chris From bobfreemanma at speakeasy.net Fri Dec 8 15:47:15 2006 From: bobfreemanma at speakeasy.net (Bob Freeman) Date: Fri, 8 Dec 2006 15:47:15 -0500 Subject: [Bioperl-l] writing blastxml In-Reply-To: <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com> <000301c6f846$d6227760$15327e82@pyrimidine> <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> Message-ID: Can't seem to find a good post on this to answer my question: Does anyone know a good way to (re)write BLAST reports in XML format? I've got about 30,000 reports I need to rewrite for a (good!) piece of java software that will only import xml formatted BLAST reports. Right now, all mine are plain text. I don't think bioperl can do this yet, correct? If not, any suggestions, besides reblasting all 30,000? I'd like to save a few trees and lumps of coal. TIA, Bob -- ----------------------------------------------------- Bob Freeman, Ph.D. Bioinformatics consultant 51 Downer Avenue, #2 Dorchester, MA 02125 617/699.7057, vox If brains were taxed, he'd get a refund. -- Anonymous From camp_boot at hotmail.com Sun Dec 10 05:00:55 2006 From: camp_boot at hotmail.com (synapse) Date: Sun, 10 Dec 2006 10:00:55 +0000 (UTC) Subject: [Bioperl-l] Driver program for PestFind.pm Message-ID: Dear All, I apologize in advance for my almost total lack of knowledge of perl as a programming language. I need to use PestFind program, part of the biop_run package of bioperl. My understanding is that I will need a simple wrapper program that will read arguments from the command line, and pass them to that module. - Is there such program available that I can just use? - Does anyone know if pestfind can work on multiple sequence files (in fasta format), or does it only process single sequence files? Thanks a lot for the feedback. 
From cjfields at uiuc.edu Sun Dec 10 13:45:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 12:45:26 -0600 Subject: [Bioperl-l] writing blastxml In-Reply-To: References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com> <000301c6f846$d6227760$15327e82@pyrimidine> <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> Message-ID: <7FB4EBB9-BEDC-4250-BE2F-3F695D36F350@uiuc.edu> On Dec 8, 2006, at 2:47 PM, Bob Freeman wrote: > Can't seem to find a good post on this to answer my question: > > Does anyone know a good way to (re)write BLAST reports in XML format? > I've got about 30,000 reports I need to rewrite for a (good!) piece > of java software that will only import xml formatted BLAST reports. > Right now, all mine are plain text. > > I don't think bioperl can do this yet, correct? If not, any > suggestions, besides reblasting all 30,000? I'd like to save a few > trees and lumps of coal. > > TIA, > Bob The only BioPerl writers for BLAST reports are in BSML and HTML, not BLAST XML. I don't think there have been any requests for it, and no one has really stepped forward to submit one. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Dec 10 13:55:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 12:55:16 -0600 Subject: [Bioperl-l] Driver program for PestFind.pm In-Reply-To: References: Message-ID: <32B0F15D-4144-43B6-AA81-5ED9BA848F45@uiuc.edu> On Dec 10, 2006, at 4:00 AM, synapse wrote: > Dear All, > > I apologize in advance for my almost total lack of knowledge of > perl as a > programming language. > > I need to use PestFind program, part of the biop_run package of > bioperl. My > understanding is that I will need a simple wrapper program that > will read > arguments from the command line, and pass them to that module.
PestFind is part of the EMBOSS suite of programs: http://emboss.sourceforge.net/ The PestFind module in bioperl-run is actually used via Pise. > - Is there such program available that I can just use? See above > - Does anyone know if pestfind can work on multiple sequence > files (in fasta > format), or does it only process single sequence files? > > Thanks a lot for the feedback. No idea there, but the EMBOSS docs should tell you. chris From cjfields at uiuc.edu Mon Dec 11 00:38:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 23:38:32 -0600 Subject: [Bioperl-l] bioperl-run parameter question Message-ID: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> I am writing up a few bioperl-run modules and have a simple question, though I don't know if anyone knows the answer. I was curious as to why parameters for most (all?) bioperl-run modules lack the '-' preceding them. This came up re: StandAloneBlast last week (something Torsten fixed), but I noticed just about every bioperl-run module uses the dashless parameters. chris From n.haigh at sheffield.ac.uk Mon Dec 11 01:44:25 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Mon, 11 Dec 2006 06:44:25 +0000 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> Message-ID: <457CFE49.5010201@sheffield.ac.uk> Chris Fields wrote: > I am writing up a few bioperl-run modules and have a simple question, > though I don't know if anyone knows the answer. I was curious as to > why parameters for most (all?) bioperl-run modules lack the '-' > preceding them. This came up re: StandAloneBlast last week > (something Torsten fixed), but I noticed just about every bioperl-run > module uses the dashless parameters. 
> > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > No idea! Is there any reason for/against using dashed/dashless parameters? I suppose dashed parameters allow you to easily see which tokens on the command line are parameters and which are values. Should modules be able to accept both? Should dashed be preferred? Nath From cjfields at uiuc.edu Mon Dec 11 08:06:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 07:06:32 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457CFE49.5010201@sheffield.ac.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457CFE49.5010201@sheffield.ac.uk> Message-ID: On Dec 11, 2006, at 12:44 AM, Nathan S. Haigh wrote: > Chris Fields wrote: >> I am writing up a few bioperl-run modules and have a simple question, >> though I don't know if anyone knows the answer. I was curious as to >> why parameters for most (all?) bioperl-run modules lack the '-' >> preceding them. This came up re: StandAloneBlast last week >> (something Torsten fixed), but I noticed just about every bioperl-run >> module uses the dashless parameters. >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > No idea! > > Is there any reason for/against using dashed/dashless parameters? I > suppose dashed parameters allow you to easily see which tokens on the > command line are parameters and which are values. Should modules be > able > to accept both? Should dashed be preferred? > > Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l I'm thinking about it from the point of consistency.
When using a mix of core and run modules it can be a bit confusing, particularly when (as pointed out in the previous thread on StandAloneBlast) you can use only dashed parameters with core modules, while most (all?) run modules only accept dashless ones (in most cases some exception is thrown). Torsten fixed this in StandAloneBlast so it accepts both, but shouldn't this rule also apply to all run modules? Much of this is probably due to the donated nature of much of the bioperl-run code and Jason's 'cat-herding', and I understand that it would be a lot of work to change this for all run modules. However, we could at least try to start enforcing some loose rules with new bioperl-run wrappers (e.g. implement WrapperBase, use core-like parameters, etc). chris From akarger at CGR.Harvard.edu Mon Dec 11 11:20:03 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 11 Dec 2006 11:20:03 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq Message-ID: Chris Fields wrote: > > > Yes, I think. Scott Cain pointed out that GFF column 8 is the > > "phase", which I had never heard of before. My current, very > > limited, understanding is that sometimes you'll have an exon > > with, say, 31 bp, followed by an exon with 29 bp. When the > > intron gets spliced out, you eventually get an mRNA of 60 bp, > > which translates to a protein of 20 aa. > > But the second exon has a phase of 1, not 0, because you > > can't just start translating at the first bp of the second > > exon and expect to get nice amino acids. > > I think the use of 'frame' here is meant relative to the DNA > sequence (i.e. > ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e. > translation, three frames). At least I think that's what is meant! I agree. By the way, I'd love a reference to a simple bio-explanation of what's happening here. Google searches for "coding sequence phase" are not all that relevant.
> > I'm still confused as to why you would have a phase in the > > first exon, though. Why not just say the CDS starts 1 or 2 bp > > later? (This is probably a bio question, not a bioperl > > question, but a quick Google didn't get me an answer. "Phase" > > isn't a very good search term.) > > It could be b/c the location coordinates delineate the exon > coding boundary. > It's conceivable the first exon in a sequence record is not > the first exon > of the mRNA (i.e. there may be one or more exons prior to or > past the exon > of interest that are in 'remote' sequence records). That's certainly not the case here, because the files have the entire genomes in them. > Also, the ends of the location may be uncertain ('fuzzy'): > > join(complement(1009..>1260),complement(AF081827.1:<1..177)) Also not the case here. These locations aren't listed as fuzzy. Any other thoughts? > > I guess the real question here, which Jason alludes to, is whether > > SeqFeature->spliced_seq ought to take into account the phase > > information > > of the first exon. Right now, it doesn't, so when you call > > SeqFeature->spliced_seq->translate, you get gibberish. Are > there cases > > where you would want spliced_seq to include the first bp or > > two? Should there be an option to spliced_seq for whether you > > want to take phase information into account? > > You can already pass the frame or an offset to > PrimarySeqI::translate(). > We could add a '-phase' argument for > convenience which accepts 0,1,2. But as Jason pointed out, you should find the problem earlier. What if I want to get the RNA sequence that will become the protein? Then having a phase arg to translate() doesn't help. Should there be a phase arg to spliced_seq? Which raises another bio question: at what point are the first 1 or 2 bp dropped when you have a phase of 1 or 2? Do they appear in the mRNA?
-Amir Karger From bix at sendu.me.uk Mon Dec 11 13:21:42 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 11 Dec 2006 13:21:42 -0500 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> Message-ID: <457DA1B6.1060706@sendu.me.uk> Chris Fields wrote: > I am writing up a few bioperl-run modules and have a simple question, > though I don't know if anyone knows the answer. I was curious as to > why parameters for most (all?) bioperl-run modules lack the '-' > preceding them. This came up re: StandAloneBlast last week > (something Torsten fixed), but I noticed just about every bioperl-run > module uses the dashless parameters. I didn't follow that particular thread, but from my experience there is a useful distinction between bioperl options using the - as normal for full consistency with core (eg. -verbose), whilst the options that belong to the program the run module is a wrapper for do not take dashes. Again, this seems consistent within the run package. I'd suggest sticking to the current pattern. Cheers, Sendu. From cjfields at uiuc.edu Mon Dec 11 15:07:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 14:07:16 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457DA1B6.1060706@sendu.me.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> Message-ID: On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote: > Chris Fields wrote: >> I am writing up a few bioperl-run modules and have a simple >> question, though I don't know if anyone knows the answer. I was >> curious as to why parameters for most (all?) bioperl-run modules >> lack the '-' preceding them. This came up re: StandAloneBlast >> last week (something Torsten fixed), but I noticed just about >> every bioperl-run module uses the dashless parameters. 
> > I didn't follow that particular thread, but from my experience > there is a useful distinction between bioperl options using the - > as normal for full consistency with core (eg. -verbose), whilst the > options that belong to the program the run module is a wrapper for > do not take dashes. Again, this seems consistent within the run > package. I respectfully disagree that this is a 'useful' distinction. My main point is consistency. To me, it's counterintuitive to have two Bioperl classes, both of which inherit from Bio::Root::Root, use two different syntaxes for any parameters passed to the constructor, even if some are 'program' parameters. It's also not consistent with StandAloneBlast or RemoteBlast, both of which are considered bioperl-run modules even though they are in core, and both of which use dashed parameters (StandAloneBlast actually allows both). In fact, it isn't consistent within bioperl-run itself. Bio::Tools::Run::EMBOSSApplication uses dashes for parameters in a hashref! Okay, judging by the previous examples, 'consistency' isn't a word I would use to describe bioperl-run as a whole (back to Jason's 'cat-herding' analogy). It would be easier to let it slide for now, especially since changing them would be a serious pain, not to mention an API issue. But shouldn't there be some consistency? And what about new modules? Do we follow the historical (possibly confusing) 'dashless' route, or use the core-like dashed approach (thus breaking from the other run modules)? > I'd suggest sticking to the current pattern. > > > Cheers, > Sendu. I'll allow for both, ala StandAloneBlast. Doesn't hurt to be safe. ; > Have fun at the hackathon!
chris From bix at sendu.me.uk Mon Dec 11 16:19:55 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 11 Dec 2006 16:19:55 -0500 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> Message-ID: <457DCB7B.8050500@sendu.me.uk> Chris Fields wrote: > > On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> I am writing up a few bioperl-run modules and have a simple >>> question, though I don't know if anyone knows the answer. I was >>> curious as to why parameters for most (all?) bioperl-run modules >>> lack the '-' preceding them. This came up re: StandAloneBlast last >>> week (something Torsten fixed), but I noticed just about every >>> bioperl-run module uses the dashless parameters. >> >> I didn't follow that particular thread, but from my experience there >> is a useful distinction between bioperl options using the - as normal >> for full consistency with core (eg. -verbose), whilst the options that >> belong to the program the run module is a wrapper for do not take >> dashes. Again, this seems consistent within the run package. > > I respectfully disagree that this is a 'useful' distinction. My main > point is consistency. [snip] We're on the same page in terms of what we think would be a Good Thing, and allowing both ways (dashed and dashless) sounds reasonable. I was just suggesting why bioperl-run might be the way it was. Further to that, there is the practical aspect that it is a lot simpler to figure out which are the program options so they can be farmed out to the AUTOLOAD methods - again something that isn't done in core. If you come up with some generic way of dealing with options and farming to AUTOLOAD, perhaps there's scope for applying it to all the run wrappers (ideally via one of their base classes), so they all instantly gain dashed-mode capability. 
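One possible shape for such a generic option handler, sketched in plain Perl. Everything here is an assumption for illustration: normalize_params is a hypothetical helper, not an existing bioperl-run or WrapperBase method, and the parameter names are made up. It simply accepts each name with or without leading dashes and in any case, and returns a hash keyed by the lower-case dashless form, which a wrapper could then hand on to its own setters.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of bioperl-run: accept named parameters
# written either as -name or name (any case) and return a hash keyed by
# the lower-case, dashless name.
sub normalize_params {
    my @args = @_;
    die "odd number of arguments\n" if @args % 2;
    my %params;
    while (@args) {
        my ($key, $value) = splice @args, 0, 2;
        (my $name = lc $key) =~ s/^-+//;    # strip any leading dashes
        $params{$name} = $value;
    }
    return %params;
}

# Both calling styles end up with the same keys.
my %dashed   = normalize_params(-program => 'blastp', -database => 'nr');
my %dashless = normalize_params(program  => 'blastp', DATABASE  => 'nr');
print join(', ', map { "$_=$dashed{$_}" } sort keys %dashed), "\n";
```

Placing something like this in a shared base class is the design choice being discussed; whether it belongs in WrapperBase or elsewhere is left open here.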
From cjfields at uiuc.edu Mon Dec 11 17:05:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 16:05:56 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457DCB7B.8050500@sendu.me.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> <457DCB7B.8050500@sendu.me.uk> Message-ID: On Dec 11, 2006, at 3:19 PM, Sendu Bala wrote: ... >> >> I respectfully disagree that this is a 'useful' distinction. My main >> point is consistency. > [snip] > > We're on the same page in terms of what we think would be a Good > Thing, > and allowing both ways (dashed and dashless) sounds reasonable. I was > just suggesting why bioperl-run might be the way it was. Further to > that, there is the practical aspect that it is a lot simpler to figure > out which are the program options so they can be farmed out to the > AUTOLOAD methods - again something that isn't done in core. Maybe b/c AUTOLOAD is frowned upon for a number of reasons, mainly code maintenance. I'm somewhat neutral on the idea of using AUTOLOAD as a short-term solution, though using heredoc and an eval{} block works well for me (and shows up when using $self->can('method') or when checking for methods via Class::Inspector). > If you come up with some generic way of dealing with options and > farming > to AUTOLOAD, perhaps there's scope for applying it to all the run > wrappers (ideally via one of their base classes), so they all > instantly > gain dashed-mode capability. I think that's the crux of the problem; they do not all have the same base class (except Bio::Root::Root). Most use WrapperBase. I thought at one point a Run-specific root module would be a good idea, but WrapperBase already works well. I'll go ahead with my modules and think about it some more. You could ask the powers-that-be (jason, hilmar, etc) what they think as well. 
chris From bosborne11 at verizon.net Mon Dec 11 17:24:54 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 11 Dec 2006 17:24:54 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: Message-ID: Amir, Google "intron phase", you will see a number of useful links. Brian O. On 12/11/06 11:20 AM, "Amir Karger" wrote: > I agree. By the way, I'd love a reference to a simple bio-explanation of > what's happening here. Google searches for "coding sequence phase" are > not all that relevant. From cjfields at uiuc.edu Mon Dec 11 22:20:06 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 21:20:06 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: On Dec 11, 2006, at 10:20 AM, Amir Karger wrote: >> I think the use of 'frame' here is meant relative to the DNA >> sequence (i.e. >> ORF searching, 6 frames) and the 'phase' is relative to the mRNA >> (i.e. >> translation, three frames). At least I think that's what is meant! > > I agree. By the way, I'd love a reference to a simple bio- > explanation of > what's happening here. Google searches for "coding sequence phase" are > not all that relevant. Ah, Brian found some links I see... >> It could be b/c the location coordinates delineate the exon >> coding boundary. >> It's conceivable the first exon in a sequence record is not >> the first exon >> of the mRNA (i.e. there may be one or more exons prior to or >> past the exon >> of interest that are in 'remote' sequence records). > > That's certainly not the case here, because the files have the entire > genomes in them. > >> Also, the ends of the lcoation may be uncertain ('fuzzy'): >> >> join(complement(1009..>1260),complement(AF081827.1:<1..177)) > > Also not the case here. These locations aren't listed as fuzzy. > > Any other thoughts? Which GFF files did you use? More specifically, which genes in which GFF file? I saw a reference to S. 
bayanus, but it's hard to work out what could be the problem unless we know a bit more. >>> I guess the real question here, which Jason alludes to, is whether >>> SeqFeature->spliced_seq ought to take into account the phase >>> information >>> of the first exon. Right now, it doesn't, so when you call >>> SeqFeature->spliced_seq->translate, you get gibberish. Are >> there cases >>> where you would want spliced_seq to include the first bp or >>> two? Should there be an option to spliced_seq for whether you >>> want to take phase information into account? >> >> You can already pass the frame or an offset to >> PrimarySeqI::translate(). >> We could add a '-phase' argument for >> convenience which accepts 0,1,2. > > But as Jason pointed out, you should find the problem earlier. What > if I > want to get the RNA sequence that will become the protein? then > having a > phase arg to translate() doesn't help. Should there be a phase arg to > spliced_seq? You'll also note Jason mentioned there were possible errors in the gene prediction programs which produced the output. spliced_seq() is supposed to return the DNA sequence of a split location by splicing together the sublocation sequences in their 'join' order. So, if the first exon was out of phase, once spliced they should all be out of phase to the same degree, assuming all exons are joined together correctly. Translating this using the phase should produce the correct amino acid sequence. Note that Jason suggested passing the frame/phase of the first exon to translate(), not spliced_seq(). I also suggested translate(). > Which raises another bio question: at what point are the first 1 or > 2 bp > dropped when you have a phase of 1 or 2? Do they appear in the mRNA? > > -Amir Karger Any sequence present in the sublocations (exons) would be in the spliced sequence. This would have to include those nucleotides in exons skipped b/c of the phase since they are part of the coding region.
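The splicing behaviour described here can be imitated in a few lines of plain Perl. This is a simplified stand-in, not BioPerl's spliced_seq() implementation: splice_exons is a hypothetical helper, the contig and coordinates are made up, and remote or fuzzy sublocations are ignored. It shows that every exon base, including any skipped for phase, ends up in the spliced sequence, and that the in-frame portion can be taken afterwards with a plain substr.

```perl
use strict;
use warnings;

# Hypothetical helper: build the spliced sequence for a set of exons.
# $contig is the contig sequence, $strand is 1 or -1, and each exon is a
# [start, end] pair in 1-based inclusive contig coordinates.
sub splice_exons {
    my ($contig, $strand, @exons) = @_;
    my @sorted  = sort { $a->[0] <=> $b->[0] } @exons;
    my $spliced = join '',
        map { substr($contig, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @sorted;
    if ($strand < 0) {
        # Minus strand: reverse-complement the joined sequence.
        $spliced = scalar reverse $spliced;
        $spliced =~ tr/ACGTacgt/TGCAtgca/;
    }
    return $spliced;
}

# Made-up contig: exons in upper case, flanks and intron in lower case.
my $contig  = 'ttGATGGgtagCCTAAtt';
my $spliced = splice_exons($contig, 1, [3, 7], [12, 16]);
my $phase   = 1;                           # assumed phase of the first exon
my $coding  = substr($spliced, $phase);    # in-frame part only

print "spliced: $spliced\n";    # all ten exon bases are kept
print "coding:  $coding\n";     # leading phase base removed
```

Trimming after splicing, rather than editing the exon coordinates, keeps the feature locations untouched and applies the phase in one place.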
chris From neetisomaiya at gmail.com Tue Dec 12 07:06:20 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:36:20 +0530 Subject: [Bioperl-l] need help in phredPhrap Message-ID: <764978cf0612120406m796b116dncd3a9e6c82ffe682@mail.gmail.com> Hi, I am running phredPharp, which runs phred, phrap and polyphred. Please refer to the "Using a reference sequence" section of this link http://droog.mbt.washington.edu/poly_doc50.html#REFER. I am using the reference sequence as described in the link above. With this I am getting the SNP positions on the contig sequence as well as on the reference sequence. Does anyone know if there is some output file which can also give me mapping between contig sequence and reference sequence? -- -Neeti Even my blood says, B positive From akarger at CGR.Harvard.edu Tue Dec 12 11:05:43 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 12 Dec 2006 11:05:43 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq Message-ID: (sorry if this thread is boring people) Chris Fields wrote: > > I agree. By the way, I'd love a reference to a simple bio- > > explanation of > > what's happening here. Google searches for "coding sequence > phase" are > > not all that relevant. > > Ah, Brian found some links I see... Thanks, Brian! Amazing how "coding sequence phase" finds nothing but "intron phase" finds a ton. This is why you need to actually learn biology, rather than Googling it. > Which GFF files did you use? More specifically, which genes > in which > GFF file? I saw a reference to S. bayanus, but it's hard to > work out > what could be the problem unless we know a bit more. http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus .20031001.AUGUSTUS.gff3.gz (Thanks for a Really Useful site, Jason!) c127 (for example) has two lines in that file: sbay_c127 AUGUSTUS mRNA 263 723 . + . ID=sbay_c127-g1.1 sbay_c127 AUGUSTUS CDS 263 723 . 
+ 1 Parent=sbay_c127-g1.1 Now go to gbrowse page: http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/ Type "sbay_c127:250-300" in the search box. As you can see from the translation track, if you start at bp 263, you hit a stop codon after just a few aas. But if you use frame2/phase 1, you get no stop codons all the way to the end of the contig. > >> You can already pass the frame or an offset to > >> PrimarySeqI::translate(). > >> We could add a '-phase' argument for > >> convenience which accepts 0,1,2. > > > > What if I > > want to get the RNA sequence that will become the protein? then > > having a > > phase arg to translate() doesn't help. Should there be a > phase arg to > > spliced_seq? > > You'll also note Jason mentioned there were possible errors in the > gene prediction programs which produced the output That's certainly possible. No gene prediction program will be perfect. In this case, though, it's clear that it found a large region without stop codons in it, and correctly identified the place to start translating. I guess I'm just surprised that, if it found just one exon in a gene (in the whole contig) why it would say the exon starts at 263 with a phase 1, instead of just saying it starts at 264. > spliced_seq() is supposed to return the DNA sequence of a split > location by splicing together the sublocation sequences in their > 'join' order. So, if the first exon was out of phase, once spliced > they should all be out of phase to the same degree, assuming all > exons are joined together correctly. Translating this using the > phase should produce the correct amino acid sequence. > > Note that Jason suggested passing the frame/phase of the first exon > to translate(), not spliced_seq(). I also suggested translate(). You're right. This brings the number of translated polypeptide sequences that have lots of *s in them to 9 instead of 90. I guess I have two requests here. 
The first is, if a person wants to see exactly which bps are translated to aas -- a nucleotide sequence of exactly 3N bp starting (usually) with ATG -- then they might want an argument to spliced_seq that skips the first one or two bp when necessary. After all, they might want to study the DNA, not the peptides. The second request is for "intelligent objects". If my SeqFeatures know that they're in phase 1, then when I call spliced_seq I want the resulting objects to know that they're phase one, such that when I call translate, Bioperl automatically skips the first bp or two. Admittedly, there might be big ramifications to this. Both requests of course made in the knowledge that Bioperl is open source & developers have a lot to do with their time. -Amir Karger > > Which raises another bio question: at what point are the > first 1 or > > 2 bp > > dropped when you have a phase of 1 or 2? Do they appear in the mRNA? > > > > -Amir Karger > > Any sequence present in the sublocations (exons) would be in the > spliced sequence. This would have to include those nucleotides in > exons skipped b/c of the phase since they are part of the > coding region. > > chris > From neetisomaiya at gmail.com Tue Dec 12 07:14:10 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:44:10 +0530 Subject: [Bioperl-l] needle parser in bioperl? Message-ID: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Hi, Does anyone know of a bioperl parser for needle output? Basically, I want to know where the target sequence aligns on the template (i.e. the coordinate on the template where the target aligns). -- -Neeti Even my blood says, B positive From cjfields at uiuc.edu Tue Dec 12 11:57:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 10:57:27 -0600 Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > Hi, > > Does anyone know of a bioperl parser for needle output, basically I > won't > where the target sequence aligns on the template (i.e. coordinate > on the > template where the taget aligns). > > -- > -Neeti > Even my blood says, B positive I answered this a number of months back: http://tinyurl.com/yzlbx5 Basically, newer versions of EMBOSS have changed the output for the AlignIO::emboss parser (which parses needle). I don't believe the parser has been fixed to deal with that, but Jason has pointed out you can use MSF output when running needle, then parse using AlignIO with the format set to 'msf'. chris From bosborne11 at verizon.net Tue Dec 12 11:51:05 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 12 Dec 2006 11:51:05 -0500 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: Neeti, EMBOSS' needle and water produce alignments in what Bioperl calls 'emboss' format, so you can use AlignIO to get SimpleAlign objects. The best description of how to use SimpleAlign is the documentation in the module. Brian O. On 12/12/06 7:14 AM, "neeti somaiya" wrote: > Hi, > > Does anyone know of a bioperl parser for needle output, basically I won't > where the target sequence aligns on the template (i.e. coordinate on the > template where the taget aligns). From kaboroev at sfu.ca Tue Dec 12 12:14:39 2006 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Tue, 12 Dec 2006 09:14:39 -0800 Subject: [Bioperl-l] BLAST reports Message-ID: <457EE37F.4020000@sfu.ca> Hi everyone, I would like to manipulate my blast results with bioperl but would also like to have the html output of the blast. 
What would be the best way of going about this, as I don't see any write functions in any of the blast modules I have looked at. Would it be better to create my own html layout from the blast data then attempt to recover this from bioperl? keith p.s. - does anyone know what the most informative blast "alignment view" output is? xml i suppose? -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 From cjfields at uiuc.edu Tue Dec 12 13:45:05 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 12:45:05 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: On Dec 12, 2006, at 10:05 AM, Amir Karger wrote: ... > http://fungal.genome.duke.edu/annotations/sbay/gff/ > saccharomyces_bayanus > .20031001.AUGUSTUS.gff3.gz (Thanks for a Really Useful site, Jason!) > > c127 (for example) has two lines in that file: > sbay_c127 AUGUSTUS mRNA 263 723 . + > . ID=sbay_c127-g1.1 > sbay_c127 AUGUSTUS CDS 263 723 . + > 1 Parent=sbay_c127-g1.1 > > Now go to gbrowse page: > http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/ > Type "sbay_c127:250-300" in the search box. > > As you can see from the translation track, if you start at bp 263, you > hit a stop codon after just a few aas. But if you use frame2/phase 1, > you get no stop codons all the way to the end of the contig. Yes, but there are two things. First, there is no distinct start codon. 
Second, this is what the top NCBI BLASTX hit for that particular exon is: >gi|6323195|ref|NP_013267.1| Gene info Essential 100kDa subunit of the exocyst complex (Sec3p, Sec5p, Sec6p, Sec8p, Sec10p, Sec15p, Exo70p, and Exo84p), which has the essential function of mediating polarized targeting of secretory vesicles to active sites of exocytosis; Sec10p [Saccharomyces cerevisiae] gi|2498891|sp|Q06245|SEC10_YEAST Gene info Exocyst complex component SEC10 gi|1234854|gb|AAB67490.1| Gene info L9362.12 gene product gi|1781307|emb|CAA70041.1| Gene info 100 kD exocyst complex component [Saccharomyces cerevisiae] Length=871 Score = 285 bits (728), Expect = 7e-77 Identities = 141/152 (92%), Positives = 149/152 (98%), Gaps = 0/152 (0%) Frame = +2 Query 2 FNDFYSMGKSDIVEQLRLSKNWKFNLKSVILMKNLLILSSKLETNSIPKTINTKLIIEKY 181 +NDFYSMGKSDIVEQLRLSKNWK NLKSV LMKNLLILSSKLET+SIPKTINTKL +IEKY Sbjct 168 YNDFYSMGKSDIVEQLRLSKNWKLNLKSVKLMKNLLILSSKLETSSIPKTINTKLVIEKY 227 Query 182 SEMMENKLLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE 361 SEMMEN +LLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE Sbjct 228 SEMMENELLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE 287 Query 362 NEFENVFIKNVKFKERLVDFESHSVIVEASMQ 457 NEFENVFIKNVKFKE+L+DFE+HSVI+E SMQ Sbjct 288 NEFENVFIKNVKFKEQLIDFENHSVIIETSMQ 319 Note the query start is well into the predicted coding sequence. Both the lack of a start codon and the above BLASTX hit suggest this is not actually the first exon in the coding region. Therefore the sequence retrieved from spliced_seq() is only part of the full coding region (it seems to lack at least one 3' exon as well). >>>> You can already pass the frame or an offset to >>>> PrimarySeqI::translate(). >>>> We could add a '-phase' argument for >>>> convenience which accepts 0,1,2. >>> >>> What if I >>> want to get the RNA sequence that will become the protein? then >>> having a >>> phase arg to translate() doesn't help. Should there be a >> phase arg to >>> spliced_seq? 
>> >> You'll also note Jason mentioned there were possible errors in the >> gene prediction programs which produced the output > > That's certainly possible. No gene prediction program will be perfect. > In this case, though, it's clear that it found a large region without > stop codons in it, and correctly identified the place to start > translating. I guess I'm just surprised that, if it found just one > exon > in a gene (in the whole contig) why it would say the exon starts at > 263 > with a phase 1, instead of just saying it starts at 264. Maybe the gene prediction didn't find the first exon, or didn't tie the predicted exons together. Not unusual considering the number of predictions made. >> spliced_seq() is supposed to return the DNA sequence of a split >> location by splicing together the sublocation sequences in their >> 'join' order. So, if the first exon was out of phase, once spliced >> they should all be out of phase to the same degree, assuming all >> exons are joined together correctly. Translating this using the >> phase should produce the correct amino acid sequence. >> >> Note that Jason suggested passing the frame/phase of the first exon >> to translate(), not spliced_seq(). I also suggested translate(). > > You're right. This brings the number of translated polypeptide > sequences > that have lots of *s in them to 9 instead of 90. > > I guess I have two requests here. The first is, if a person wants > to see > exactly which bps are translated to aas -- a nucelotide sequece of > exactly 3N bp starting (usually) with ATG -- then they might want an > argument to spliced_seq that skips the first one or two bp when > necessary. After all, they might want to study the DNA, not the > peptides. > > The second request is for "intelligent objects". 
If my SeqFeatures > know > that they're in phase 1, then when I call spliced_seq I want the > resulting objects to know that they're phase one, such that when I > call > translate, Bioperl automatically skips the first bp or two. > Admittedly, > there might be big ramifications to this. > > Both requests of course made in the knowledge that Bioperl is open > source & developers have a lot to do with their time. > > -Amir Karger You may want to post these as enhancement requests to Bugzilla just so we can keep track. I think passing a phase parameter to spliced_seq() can be easily accomplished; it's just a matter of returning a subseq of the spliced sequence based on the phase if set. In fact, I am testing it out now. The second may be more problematic, since there may be a time when one would want those extra nucleotides, so I don't think we would want removal of said nucleotides to be the default behavior. Chris From dmessina at wustl.edu Tue Dec 12 13:44:29 2006 From: dmessina at wustl.edu (David Messina) Date: Tue, 12 Dec 2006 12:44:29 -0600 Subject: [Bioperl-l] BLAST reports In-Reply-To: <457EE37F.4020000@sfu.ca> References: <457EE37F.4020000@sfu.ca> Message-ID: <083B4D17-CC7A-406C-9037-4DA5DC31AA05@wustl.edu> Hi Keith, Take a look at: http://www.bioperl.org/wiki/HOWTO:SearchIO You can read in a whole bunch of different blast formats (see Table 1), and it is possible to write out in HTML. See: http://www.bioperl.org/wiki/HOWTO:SearchIO#Writing_and_formatting_output I'm not sure what you mean by the most informative blast output. If you mean which one gives the most information, I'm pretty sure the standard Blast report has everything. 
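The read-a-report, write-HTML route described in that HOWTO can be sketched roughly as follows. This is a minimal sketch, assuming a standard text BLAST report as input; the sub name blast_to_html and the file names are placeholders, and the modules are loaded with require at run time only so that the script compiles on a machine without BioPerl.

```perl
use strict;
use warnings;

# Convert a BLAST report to HTML via Bio::SearchIO.
# Both file names are placeholders supplied by the caller.
sub blast_to_html {
    my ($report_file, $html_file) = @_;

    # Loaded at run time so this sketch compiles even without BioPerl installed.
    require Bio::SearchIO;
    require Bio::SearchIO::Writer::HTMLResultWriter;

    my $in     = Bio::SearchIO->new(-format => 'blast', -file => $report_file);
    my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new();
    my $out    = Bio::SearchIO->new(-writer => $writer, -file => ">$html_file");

    my $count = 0;
    while (my $result = $in->next_result) {
        # $result (hits, HSPs) can be inspected or filtered here
        # before it is written out.
        $out->write_result($result);
        $count++;
    }
    return $count;
}

# Usage: perl blast2html.pl report.bls report.html
if (@ARGV == 2) {
    my $n = blast_to_html(@ARGV);
    print "Wrote $n result(s) to $ARGV[1]\n";
}
```

Because the same $result objects are available inside the loop, the manipulation and the HTML output can share one pass over the report.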
Dave From neetisomaiya at gmail.com Tue Dec 12 07:09:39 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:39:39 +0530 Subject: [Bioperl-l] problem in running needle Message-ID: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> I am trying to run needle for the attached two sequence files, on a linux machine. It says "Uncaught exception: Assertion failed, raised at ajmem.c :187". Can anyone tell me what this could be coz of? -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... Name: SEQ_1.REF Type: application/octet-stream Size: 44208 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0002.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: seq_of_contig11 Type: application/octet-stream Size: 44344 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0003.obj From cjfields at uiuc.edu Tue Dec 12 15:55:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 14:55:07 -0600 Subject: [Bioperl-l] problem in running needle In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> Message-ID: On Dec 12, 2006, at 6:09 AM, neeti somaiya wrote: > I am trying to run needle for the attached two sequence files, on a > linux > machine. It says "Uncaught exception: Assertion failed, raised at > ajmem.c > :187". > Can anyone tell me what this could be coz of? > > -- > -Neeti > Even my blood says, B positive > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l This would be an EMBOSS error, not a BioPerl error. Maybe the emboss list is the best place for this question? 
http://emboss.open-bio.org/mailman/listinfo/emboss Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Dec 12 16:30:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 15:30:30 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: <093AE0FF-3C88-4F97-B33F-836B295E3DE3@uiuc.edu> On Dec 12, 2006, at 10:05 AM, Amir Karger wrote: >> Note that Jason suggested passing the frame/phase of the first exon >> to translate(), not spliced_seq(). I also suggested translate(). > > You're right. This brings the number of translated polypeptide > sequences > that have lots of *s in them to 9 instead of 90. > > I guess I have two requests here. The first is, if a person wants > to see > exactly which bps are translated to aas -- a nucelotide sequece of > exactly 3N bp starting (usually) with ATG -- then they might want an > argument to spliced_seq that skips the first one or two bp when > necessary. After all, they might want to study the DNA, not the > peptides. > > The second request is for "intelligent objects". If my SeqFeatures > know > that they're in phase 1, then when I call spliced_seq I want the > resulting objects to know that they're phase one, such that when I > call > translate, Bioperl automatically skips the first bp or two. > Admittedly, > there might be big ramifications to this. > > Both requests of course made in the knowledge that Bioperl is open > source & developers have a lot to do with their time. > > -Amir Karger ... Amir, I committed some code to CVS where I added a -phase parameter option to SeqFeatureI::spliced_seq(). I also added some tests to SeqFeature.t. 
If you run the following after creating the SeqFeature object $sf (the seq object is $seq): $sf->attach_seq($seq); for my $phase (-1..3) { my $spliced = $sf->spliced_seq(-phase => $phase); print $spliced->seq,"\n"; print $spliced->translate->seq,"\n"; } You should get warnings for any other value than 0, 1, or 2. I'll also note that the sequence you are having trouble with (sbay_c127) is 712 bp, so it doesn't contain the complete coding region. I used it in the test case in SeqFeature.t. Chris From boris.steipe at utoronto.ca Tue Dec 12 16:26:14 2006 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Tue, 12 Dec 2006 16:26:14 -0500 Subject: [Bioperl-l] problem in running needle In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> Message-ID: Looks like a memory allocation problem. Your whole sequence is in one single line, throwing a few linebreaks in there every 80th character or so will probably do the trick. HTH Boris On 12-Dec-06, at 7:09 AM, neeti somaiya wrote: > I am trying to run needle for the attached two sequence files, on a > linux > machine. It says "Uncaught exception: Assertion failed, raised at > ajmem.c > :187". > Can anyone tell me what this could be coz of? > > -- > -Neeti > Even my blood says, B positive > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Derek.Fairley at bll.n-i.nhs.uk Wed Dec 13 05:00:16 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Wed, 13 Dec 2006 10:00:16 -0000 Subject: [Bioperl-l] BLAST reports In-Reply-To: <457EE37F.4020000@sfu.ca> Message-ID: Hi Keith, >I would like to manipulate my blast results with bioperl but would also >like to have the html output of the blast. 
What would be the best way >of going about this, as I don't see any write functions in any of the >blast modules I have looked at. Would it be better to create my own >html layout from the blast data then attempt to recover this from bioperl? Take a look at some of the example scripts here: http://www.bioperl.org/wiki/Bioperl_scripts Depending on your Bioperl installation, you may already have these in your /scripts directory or similar. The /examples/searchio/htmlwriter.pl script may be a good starting point. >p.s. - does anyone know what the most informative blast "alignment view" >output is? xml i suppose? Assuming you want to get the HSPs, parsing blastxml reports seems to be the most reliable approach. Again, there's a useful script for this: take a look at /scripts/utilities/search2alnblocks.pls. Derek. -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Dec 13 13:02:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 13 Dec 2006 12:02:14 -0600 Subject: [Bioperl-l] Proposal for Meta data Message-ID: I am working on a few RNA-related things related to structure and have a few questions, specifically about Meta data. This is sort of a proposal, but I would like to get everybody's thoughts about this to gauge what everyone thinks. Jason, sorry to bug you but I thought it might be something that would be of use phylohackathon-wise. Heikki has several modules present which adds meta data to sequences (Bio::Seq::Meta). In this case, the meta data is stored as a string (Bio::Seq::Meta) or an array (Bio::Seq::Meta::Array). In both cases you can have multiple types of meta data for a sequence based on a particular tag. 
However, this also assumes that the meta data is somehow attached strictly to sequence data of some type. It also doesn't allow for having mixed meta data types for a single sequence, such as attaching array data and string data to the same sequence. Hence, I was thinking of a having a simple, generic meta data type (Bio::Meta), one which could encompass simple strings (Bio::Meta::Simple), arrays (Bio::Meta::Array), or any other structured type of data. This could be used to annotate any PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, maybe in a collection (similar to AnnotationCollection). I thought something like this may be of general use for any PrimarySeq (quality, structure), alignments like NEXUS and Stockholm, SeqFeatures where structure could be stored (tRNA or riboswitches), etc. However, this also seems to fall into the category of sequence annotation. So, would it be better to have a set of Bio::Annotation classes used for this purpose? Flames and jibes welcome; I'm wearing my asbestos suit today.... chris From stewarta at nmrc.navy.mil Wed Dec 13 20:06:14 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Wed, 13 Dec 2006 20:06:14 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects Message-ID: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> I am trying to StandAloneBlast->blastall an array or Bio::Seq objects. The documentation claims that blastall can be passed a file name, a Bio::Seq object, or an array of Bio::Seq objects, while the usage suggests that a reference to an array of Bio::Seq objects is what must be passed to blastall. (from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/ Bio/Tools/Run/StandAloneBlast.html#POD5) Usage: $seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects $blast_report = $factory->blastall(\@seq_array); Should this be... $report = $factory->blastall(@seq_array); or $report = $factory->blastall(\@seq_array); ??? 
And if you are blastall'ing an array of Seq objects, then does blastall just return one big blast report or should I be expecting an array of blast reports? I've tried $report = $factory->blastall(@seq_array); which seems to work ok, except that when I process the results, there are only results for the first Seq object in the array. -Andrew -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From arareko at campus.iztacala.unam.mx Wed Dec 13 20:37:27 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 13 Dec 2006 19:37:27 -0600 Subject: [Bioperl-l] BioPerl page in Wikipedia Message-ID: <4580AAD7.3000900@campus.iztacala.unam.mx> Folks, I've updated a little bit of the BioPerl page in the Wikipedia. I think it would be nice if we expand the article a little bit more since it's tagged as a "stub". Here's the link: http://en.wikipedia.org/wiki/BioPerl Cheers, Mauricio. -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From lubapardo at gmail.com Thu Dec 14 05:54:07 2006 From: lubapardo at gmail.com (Luba Pardo) Date: Thu, 14 Dec 2006 11:54:07 +0100 Subject: [Bioperl-l] (no subject) Message-ID: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com> Hello, I am new to bioperl and I have been trying to run the examples available in bptutorial.pl and other basic literature. I have installed the latest release of bioperl 1.5.2 in a usr/local/src directory. Any time I try to retrieve from the SwissProt and EMBL databases it gives me an error. With genbank it seems to be fine. I wonder if the installation was not successful, as I would expect that access to these databases was included in the modules of BioPerl Core.
In addition, I would like to ask whether, to run ClustalW within BioPerl, I need to download and install it in the same directory in which I have installed bioperl, or whether it is included in the Bio::Align module. I am not sure whether this is the best place to ask these very basic questions. If not, could anyone please refer me to the proper e mail account? Thank you very much in advance. Luba Pardo MD, PhD From bix at sendu.me.uk Thu Dec 14 09:10:43 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 14 Dec 2006 09:10:43 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> Message-ID: <45815B63.1020003@sendu.me.uk> Andrew Stewart wrote: > I am trying to StandAloneBlast->blastall an array or Bio::Seq > objects. The documentation claims that blastall can be passed a file > name, You're referring to 'In addition, sequence input may be in the form of either a Bio::Seq object or or an array of Bio::Seq objects'? I agree it's not clear, but supplying a reference to an array is still supplying an array. Anyway, I'll clarify it. In any case, the usage for the method is what you should pay attention to: > Usage: > $seq_array_ref = \@seq_array; # where @seq_array is an array of > Bio::Seq objects > $blast_report = $factory->blastall(\@seq_array); > > Should this be... > $report = $factory->blastall(@seq_array); > or > $report = $factory->blastall(\@seq_array); > ??? It should be exactly what it says: a reference to the array. > And if you are blastall'ing an array of Seq objects, then does > blastall just return one big blast report or should I be expecting an > array of blast reports? Returns : Reference to a Blast object or BPlite object containing the blast report. That means just one big object, not an array.
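Why the plain array only gave results for the first sequence can be shown in plain Perl. This is a sketch under an assumption, not BioPerl's actual code: count_queries is a hypothetical stand-in for a method that, like the documented usage of blastall(), reads a single input argument after the object, and the strings stand in for Bio::Seq objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that expects ONE input argument
# after the object itself (assumed to mirror how blastall() is called).
sub count_queries {
    my ($self, $input) = @_;    # anything after $input is silently ignored
    return ref($input) eq 'ARRAY' ? scalar @$input : 1;
}

my @seq_array = ('seq1', 'seq2', 'seq3');    # placeholders for Bio::Seq objects

# Passing the array flattens it into the argument list: only 'seq1' is seen.
print count_queries('factory', @seq_array), "\n";     # prints 1

# Passing a reference keeps the array together as one argument.
print count_queries('factory', \@seq_array), "\n";    # prints 3
```

If blastall() unpacks its arguments along these lines, the call with @seq_array would behave like a call with the first Seq object alone, which matches the symptom described.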
From bix at sendu.me.uk Thu Dec 14 09:42:18 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 14 Dec 2006 09:42:18 -0500 Subject: [Bioperl-l] (no subject) In-Reply-To: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com> References: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com> Message-ID: <458162CA.5030803@sendu.me.uk> Luba Pardo wrote: > Hello, I am new to bioperl and I have been trying to run the examples > available in bptutorial.pl and other basic literature. I have > installed the latest release of bioperl 1.5.2 in a usr/local/src > directory. Any time I try to retrieve from the SwissProt and EMBL > databases it gives me an error. What exactly are you trying? Paste some relevant code along with the exact error message you get when running that code. > I wonder if the installation was not successful, as I would expect > that access to these databases was included in the modules of BioPerl > Core. They should work with just core installed. > In addition, I would like to ask whether, to run ClustalW within > BioPerl, I need to download and install it in the same > directory in which I have installed bioperl, or whether it is included in the > Bio::Align module. The ClustalW module is in the bioperl-run package, so install that in the same way you installed bioperl (core). You need to download and install the actual ClustalW program according to its own instructions. You let Bioperl know where you installed ClustalW by, e.g., setting an environment variable. See http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Alignment/Clustalw.html#DESCRIPTION for details. > I am not sure whether this is the best place to ask these very basic > questions. If not, could anyone please refer me to the proper e mail > account? It's certainly the correct place; I hope we can resolve your problems.
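The environment-variable route for ClustalW might look like the sketch below. It assumes bioperl-run is installed and that the clustalw executable lives in /usr/local/bin/ (a placeholder path); the sub name align_file is made up, and the wrapper module is loaded with require only so the script compiles without bioperl-run. The linked module documentation remains the authority on which variable is consulted and when.

```perl
use strict;
use warnings;

# Assumed location of the clustalw executable; adjust for your system.
# Set early, before the Clustalw wrapper module is loaded.
BEGIN { $ENV{CLUSTALDIR} ||= '/usr/local/bin/'; }

# Align the sequences in a FASTA file (the file name is supplied by the caller).
sub align_file {
    my ($fasta_file) = @_;

    # From the bioperl-run package; loaded at run time so this sketch
    # compiles even where bioperl-run is not installed.
    require Bio::Tools::Run::Alignment::Clustalw;

    my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
    return $factory->align($fasta_file);    # a Bio::SimpleAlign object
}

# Usage: perl run_clustalw.pl sequences.fasta
if (@ARGV == 1) {
    my $aln = align_file($ARGV[0]);
    print "Alignment length: ", $aln->length, "\n";
}
```

The ClustalW program itself does not need to sit in the BioPerl directory; it only has to be findable through the setting above.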
From neetisomaiya at gmail.com Thu Dec 14 03:02:37 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Thu, 14 Dec 2006 13:32:37 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com> How do I run needle specifying that I want the MSF format, on a linux box? The help doesn't show me any format option. Is there anything available to parse the MSF format? Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment; how can I parse the result to get this? On 12/12/06, Chris Fields wrote: > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of a bioperl parser for needle output, basically I > > won't > > where the target sequence aligns on the template (i.e. coordinate > > on the > > template where the taget aligns). > > > > -- > > -Neeti > > Even my blood says, B positive > > I answered this a number of months back: > > http://tinyurl.com/yzlbx5 > > Basically, newer versions of EMBOSS have changed the output for the > AlignIO::emboss parser (which parses needle). I don't believe the > parser has been fixed to deal with that, but Jason has pointed out > you can use MSF output when running needle, then parse using AlignIO > with the format set to 'msf'. > > chris > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed...
Name: 3.out
Type: application/octet-stream
Size: 204960 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061214/1416cef5/attachment-0001.obj

From stewarta at nmrc.navy.mil  Thu Dec 14 11:34:43 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 11:34:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <45815B63.1020003@sendu.me.uk>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk>
Message-ID: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>

Thanks for the reply, Sendu.

So I've tried passing a reference to an array of Seq objects with the following code...

    push @blast_run, $factory->blastall(\@query);   # where @query is an array of Bio::Seq objects

(In case you're wondering, I'm pushing the report into an array of reports because I'm running several instances of blastall with different parameters each time.)

...and it throws me the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
STACK: main::run_blastall ./new_blast_script.pl:215
STACK: ./new_blast_script.pl:115
-----------------------------------------------------------

And

    % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm

returns...

    757     my $status = system($commandstring);
    758
    759     $self->throw("$executable call crashed: $? $commandstring \n")
    760         unless ($status==0) ;

So it looks like the system call isn't returning a happy $status. At this point I'm pretty much stuck, though. Blastall works just fine if I only send it a single Seq object. Looking at _setinput, it appears a reference to an array of Seq objects should end up creating a multi-fasta file. The only possibilities I can think of to explain this are...

- The -i file isn't being created for some reason when a (ref to an) array of Seqs is passed.
- There is something wrong with the -i file that is created and sent to blastall.
- Something else is wrong with the $commandstring being sent to the system call.

Does anyone see something here that I don't?

Thanks,
Andrew

On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote:

> Andrew Stewart wrote:
>> I am trying to StandAloneBlast->blastall an array of Bio::Seq
>> objects. The documentation claims that blastall can be passed a
>> file name,
>
> You're referring to 'In addition, sequence input may be in the form
> of either a Bio::Seq object or or an array of Bio::Seq objects'? I
> agree it's not clear, but supplying a reference to an array is still
> supplying an array. Anyway, I'll clarify it.
>
>
> In any case, the usage for the method is what you should pay
> attention to:
>
>> Usage:
>> $seq_array_ref = \@seq_array; # where @seq_array is an array of
>> Bio::Seq objects
>> $blast_report = $factory->blastall(\@seq_array);
>> Should this be...
>> $report = $factory->blastall(@seq_array);
>> or
>> $report = $factory->blastall(\@seq_array);
>> ???
>
> It should be exactly what it says. A reference to the array.
>
>
>> And if you are blastall'ing an array of Seq objects, then does
>> blastall just return one big blast report or should I be expecting
>> an array of blast reports?
>
> Returns : Reference to a Blast object or BPlite object
> containing the blast report.
>
> That means, just one big object, not an array.
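
A note on the "crashed: 11" in the exception above: the number is the raw value of $? after system(), not an exit code. Per perlvar, the exit code is $? >> 8, the low seven bits hold the number of the signal that killed the child, and bit 128 flags a core dump. A minimal sketch of decoding it follows; the helper name decode_wait_status is made up for illustration and is not part of BioPerl. On most Unix-like systems signal 11 is SIGSEGV, so a raw status of 11 would suggest that blastall itself was killed by a segmentation fault rather than exiting with an error.

```perl
use strict;
use warnings;

# Split a raw wait status (the value Perl leaves in $? after system())
# into its documented parts: exit code, signal number, core-dump flag.
# NOTE: decode_wait_status is a hypothetical helper, not a BioPerl call.
sub decode_wait_status {
    my ($status) = @_;
    return {
        exit_code => $status >> 8,             # high byte: exit() value
        signal    => $status & 127,            # low 7 bits: killing signal, 0 if none
        core_dump => ($status & 128) ? 1 : 0,  # bit 7: core was dumped
    };
}

# The status reported in the exception message above.
my $d = decode_wait_status(11);
printf "exit=%d signal=%d core=%d\n", @{$d}{qw(exit_code signal core_dump)};
```

In a script, the same decoding could be applied to $? immediately after a system() call, before anything else has a chance to overwrite it.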
-- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From cjfields at uiuc.edu Thu Dec 14 12:03:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 11:03:12 -0600 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> Message-ID: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote: > Thanks for the reply, Sendu. > > So I've tried passing a reference to an array of Seq objects with the > following code... > > push @blast_run, $factory->blastall(\@query); # where @query is an > array of Bio::Seq objects > > (In case you're wondering, I'm pushing the report into an array of > reports because I'm running several instances of blastall with > different parameters each time.) > > ....and it throws me the following exception... 
> > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/ > common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E > > STACK: Error::throw > STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 > STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 > STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 > STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/ > Bio/Tools/Run/StandAloneBlast.pm:557 > STACK: main::run_blastall ./new_blast_script.pl:215 > STACK: ./new_blast_script.pl:115 > ----------------------------------------------------------- > > And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm > returns... > 757 my $status = system($commandstring); > 758 > 759 $self->throw("$executable call crashed: $? $commandstring > \n") > 760 unless ($status==0) ; > > So it looks like the system call isn't returning a happy $status. At > this point I'm pretty much stuck, though. Blastall works just fine > if I only send it a single Seq object. Looking at _setinput, it > appears a reference to an array of Seq objects should end up creating > a multi-fasta file. The only possibilities I can think of to explain > this is... > > - The -i file isn't be created for some reason when an (ref to) array > of Seqs is passed > - There is something wrong with the -i file that is created and sent > to blastall. > - Something else is wrong with the $commandstring being sent to the > system call. > > Does anyone see something here that I don't? The error pops up when the executable returns a bad status, so maybe it's choking on too many input sequences (i.e. Bioperl is doing everything correctly, but you are attempting to BLAST too many sequences in one go). How many sequences are you attempting to use as input? 
What happens when you use fewer input sequences? chris From stewarta at nmrc.navy.mil Thu Dec 14 12:49:45 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 14 Dec 2006 12:49:45 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> Message-ID: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> > So can you look at the tempfile that is created and see if it is sane? > > Set -save_tempfiles => 1 whene you initialize the factory object or do > $factory->save_tempfiles(1) > before calling the blastall. > > -jason > Jason, I was actually wondering how to do that. Thanks. Odd though, it still doesn't seem to be saving the tempfiles. Might not matter though, because... > The error pops up when the executable returns a bad status, so > maybe it's choking on too many input sequences (i.e. Bioperl is > doing everything correctly, but you are attempting to BLAST too > many sequences in one go). How many sequences are you attempting > to use as input? What happens when you use fewer input sequences? > > chris > I was processing 738 sequences for input. I cut that down to 20 sequences and I'm getting some other exception thrown further downstream, so it appears you may be correct. You don't happen to know what the max number of sequences that blastall allows for input, would ya? ;) I suppose I'll have to break @query down into smaller doses or something. Thanks, Andrew On Dec 14, 2006, at 12:03 PM, Chris Fields wrote: > > On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote: > >> Thanks for the reply, Sendu. >> >> So I've tried passing a reference to an array of Seq objects with the >> following code... 
>> >> push @blast_run, $factory->blastall(\@query); # where @query is an >> array of Bio::Seq objects >> >> (In case you're wondering, I'm pushing the report into an array of >> reports because I'm running several instances of blastall with >> different parameters each time.) >> >> ....and it throws me the following exception... >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: blastall call crashed: 11 /common/bin/blastall -p blastp - >> d "/ >> common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm: >> 328 >> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ >> perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 >> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/ >> lib/ >> perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 >> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/ >> perl5/5.8.6/ >> Bio/Tools/Run/StandAloneBlast.pm:557 >> STACK: main::run_blastall ./new_blast_script.pl:215 >> STACK: ./new_blast_script.pl:115 >> ----------------------------------------------------------- >> >> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm >> returns... >> 757 my $status = system($commandstring); >> 758 >> 759 $self->throw("$executable call crashed: $? $commandstring >> \n") >> 760 unless ($status==0) ; >> >> So it looks like the system call isn't returning a happy $status. At >> this point I'm pretty much stuck, though. Blastall works just fine >> if I only send it a single Seq object. Looking at _setinput, it >> appears a reference to an array of Seq objects should end up creating >> a multi-fasta file. The only possibilities I can think of to explain >> this is... >> >> - The -i file isn't be created for some reason when an (ref to) array >> of Seqs is passed >> - There is something wrong with the -i file that is created and sent >> to blastall. 
>> - Something else is wrong with the $commandstring being sent to the
>> system call.
>>
>> Does anyone see something here that I don't?
>
> The error pops up when the executable returns a bad status, so
> maybe it's choking on too many input sequences (i.e. Bioperl is
> doing everything correctly, but you are attempting to BLAST too
> many sequences in one go). How many sequences are you attempting
> to use as input? What happens when you use fewer input sequences?
>
> chris
>

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From Derek.Fairley at bll.n-i.nhs.uk  Thu Dec 14 12:58:10 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Thu, 14 Dec 2006 17:58:10 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
Message-ID:

Neeti,

From http://emboss.sourceforge.net/apps/cvs/needle.html:

"The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences.

The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs

The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score

See: http://emboss.sf.net/docs/themes/AlignFormats.html for further information on alignment formats."

I'm not sure from this whether you can get a pairwise alignment in .msf format, but I can't think of a good reason why not. The BioPerl Bio::AlignIO module will allow you to parse alignments in .msf format.

HTH,

Derek.
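
For the coordinate itself, here is a minimal sketch of the last step, in plain Perl and under stated assumptions: you already have the two gapped rows of the pairwise alignment as strings (for example from the seq() method of the sequences in a parsed alignment), and gaps are written as '-', '.' or '~'. The function name ref_start_of_query and the example rows are made up for illustration; nothing here is a BioPerl call. It walks the columns, counts ungapped reference residues, and returns the reference coordinate of the first column where both rows hold a residue.

```perl
use strict;
use warnings;

# Return the 1-based, ungapped coordinate on the reference at which the
# first residue of the query is aligned to a reference residue.
# Both arguments are gapped rows of the same pairwise alignment.
# Returns undef if no column has a residue in both rows.
# NOTE: hypothetical helper; gap characters '-', '.', '~' are an assumption.
sub ref_start_of_query {
    my ($ref_row, $query_row) = @_;
    die "rows differ in length\n"
        unless length($ref_row) == length($query_row);

    my $gap     = qr/[-.~]/;
    my $ref_pos = 0;                      # ungapped residues seen so far
    for my $i (0 .. length($ref_row) - 1) {
        my $r = substr($ref_row,   $i, 1);
        my $q = substr($query_row, $i, 1);
        my $ref_is_residue = $r !~ $gap;
        $ref_pos++ if $ref_is_residue;
        return $ref_pos if $ref_is_residue && $q !~ $gap;
    }
    return undef;
}

# Toy rows: the query starts opposite the fifth reference residue.
my $start = ref_start_of_query("ACGTACGTAC", "....ACGT..");
print defined $start ? "$start\n" : "no aligned residue\n";
```

For the attached example, the rows for SEQ_1.REF and seq_of_contig would be passed in place of the toy strings.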
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya Sent: 14 December 2006 08:03 To: Chris Fields; bioperl-l Subject: Re: [Bioperl-l] needle parser in bioperl? How do I run needle specifying that I want the MSF format, on a linux box? The help doesnt show me any format option. Is there anything available to pasre MSF format? Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment, how can I parse the result to get this? On 12/12/06, Chris Fields wrote: > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of a bioperl parser for needle output, basically I > > won't > > where the target sequence aligns on the template (i.e. coordinate > > on the > > template where the taget aligns). > > > > -- > > -Neeti > > Even my blood says, B positive > > I answered this a number of months back: > > http://tinyurl.com/yzlbx5 > > Basically, newer versions of EMBOSS have changed the output for the > AlignIO::emboss parser (which parses needle). I don't believe the > parser has been fixed to deal with that, but Jason has pointed out > you can use MSF output when running needle, then parse using AlignIO > with the format set to 'msf'. 
> > chris > -- -Neeti Even my blood says, B positive From cjfields at uiuc.edu Thu Dec 14 13:36:09 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 12:36:09 -0600 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> Message-ID: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu> On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote: >> So can you look at the tempfile that is created and see if it is >> sane? >> >> Set -save_tempfiles => 1 whene you initialize the factory object >> or do >> $factory->save_tempfiles(1) >> before calling the blastall. >> >> -jason >> > > Jason, > I was actually wondering how to do that. Thanks. Odd though, it > still doesn't seem to be saving the tempfiles. Might not matter That needs to be checked out. Can anyone verify that? >> The error pops up when the executable returns a bad status, so >> maybe it's choking on too many input sequences (i.e. Bioperl is >> doing everything correctly, but you are attempting to BLAST too >> many sequences in one go). How many sequences are you attempting >> to use as input? What happens when you use fewer input sequences? >> >> chris >> > > I was processing 738 sequences for input. I cut that down to 20 > sequences and I'm getting some other exception thrown further > downstream, so it appears you may be correct. You don't happen to > know what the max number of sequences that blastall allows for input, > would ya? ;) I suppose I'll have to break @query down into smaller > doses or something. > > Thanks, > Andrew It was a shot in the dark, really. 
The fact that the return status was bad could be due to a number of problems (permissions issues, bad data, etc). The fact that a single sequence worked indicated that permissions and output format likely weren't to blame. The only other thing left was a problem with blastall itself. BTW, the blast docs do not indicate whether there is a maximum number of sequences. There may be a point where available memory becomes the limiting issue. chris From vaughn at cshl.edu Thu Dec 14 14:09:34 2006 From: vaughn at cshl.edu (Matthew Vaughn) Date: Thu, 14 Dec 2006 14:09:34 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking Message-ID: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> Dear all, I'm trying to bring some of my code into compliance with the BioPerl 1.5.2 and am running into some design decisions that I am unclear on. Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of the 'type' against SOFA? It seems to me that this should be optional behavior as is the case with the Bio::FeatureIO family. I'd be happy to write the patch if there is any agreement with me on this case. Thanks, Matt -- Matthew W. Vaughn, Ph.D. Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 phone: (516) 367-8469 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2413 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061214/59a9ac32/attachment.bin

From jason at bioperl.org  Thu Dec 14 11:59:20 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Dec 2006 11:59:20 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <640E2BB7-33F3-44C9-B903-9DDA54F02D12@bioperl.org>

So can you look at the tempfile that is created and see if it is sane?

Set -save_tempfiles => 1 when you initialize the factory object, or do
    $factory->save_tempfiles(1)
before calling blastall.

-jason

On Dec 14, 2006, at 11:34 AM, Andrew Stewart wrote:

> Thanks for the reply, Sendu.
>
> So I've tried passing a reference to an array of Seq objects with the
> following code...
>
> push @blast_run, $factory->blastall(\@query); # where @query is an
> array of Bio::Seq objects
>
> (In case you're wondering, I'm pushing the report into an array of
> reports because I'm running several instances of blastall with
> different parameters each time.)
>
> ....and it throws me the following exception...
> > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/ > common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E > > STACK: Error::throw > STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 > STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 > STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 > STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/ > Bio/Tools/Run/StandAloneBlast.pm:557 > STACK: main::run_blastall ./new_blast_script.pl:215 > STACK: ./new_blast_script.pl:115 > ----------------------------------------------------------- > > And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm > returns... > 757 my $status = system($commandstring); > 758 > 759 $self->throw("$executable call crashed: $? $commandstring > \n") > 760 unless ($status==0) ; > > So it looks like the system call isn't returning a happy $status. At > this point I'm pretty much stuck, though. Blastall works just fine > if I only send it a single Seq object. Looking at _setinput, it > appears a reference to an array of Seq objects should end up creating > a multi-fasta file. The only possibilities I can think of to explain > this is... > > - The -i file isn't be created for some reason when an (ref to) array > of Seqs is passed > - There is something wrong with the -i file that is created and sent > to blastall. > - Something else is wrong with the $commandstring being sent to the > system call. > > Does anyone see something here that I don't? > > > Thanks, > Andrew > > > > On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote: > >> Andrew Stewart wrote: >>> I am trying to StandAloneBlast->blastall an array or Bio::Seq >>> objects. 
The documentation claims that blastall can be passed a >>> file name, >> >> You're referring to 'In addition, sequence input may be in the form >> of either a Bio::Seq object or or an array of Bio::Seq objects'? I >> agree its not clear, but supplying a reference to an array is still >> supplying an array. Anyway, I'll clarify it. >> >> >> In any case, the usage for the method is what you should pay >> attention to: >> >>> Usage: >>> $seq_array_ref = \@seq_array; # where @seq_array is an array of >>> Bio::Seq objects >>> $blast_report = $factory->blastall(\@seq_array); >>> Should this be... >>> $report = $factory->blastall(@seq_array); >>> or >>> $report = $factory->blastall(\@seq_array); >>> ??? >> >> It should be exactly what it says. A reference to the array. >> >> >>> And if you are blastall'ing an array of Seq objects, then does >>> blastall just return one big blast report or should I be expecting >>> an array of blast reports? >> >> Returns : Reference to a Blast object or BPlite object >> containing the blast report. >> >> That means, just one big object, not an array. 
>
>
>
> --
> Andrew Stewart
> Research Assistant, Genomics Team
> Navy Medical Research Center (NMRC)
> Biological Defense Research Directorate (BDRD)
> BDRD Annex
> 12300 Washington Avenue, 2nd Floor
> Rockville, MD 20852
>
> email: stewarta at nmrc.navy.mil
> phone: 301-231-6700 Ext 270
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From stewarta at nmrc.navy.mil  Thu Dec 14 16:23:07 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 16:23:07 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
Message-ID:

> It was a shot in the dark, really. The fact that the return status
> was bad could be due to a number of problems (permissions issues,
> bad data, etc). The fact that a single sequence worked indicated
> that permissions and output format likely weren't to blame. The
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum
> number of sequences. There may be a point where available memory
> becomes the limiting issue.
>
> chris

Interesting. I ran the 738-sequence dataset through blastall manually and the report only contained 198 of the 738 expected results. Not only that, it cut off right in the middle of the 198th result and a segmentation fault was reported.

I removed the 198th sequence, wondering if it might be some issue with the input, and the segmentation fault occurred again with the results ending on the 210th result. I put the 198th sequence back in, but at the start of the file, and sure enough the segmentation fault occurred earlier.

I think we can rule out the size of the input or the number of sequences as the source of error here. I'm more inclined to think it has something to do with the blast databases being queried against. I found an old discussion of a problem that sounds fairly similar to this one, for anyone interested.

http://bioinformatics.org/pipermail/bioclusters/2004-June/001742.html

I think I'll try to work around the problem for now.

andrew

On Dec 14, 2006, at 1:36 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:
>
>>> So can you look at the tempfile that is created and see if it is
>>> sane?
>>>
>>> Set -save_tempfiles => 1 when you initialize the factory object
>>> or do
>>> $factory->save_tempfiles(1)
>>> before calling the blastall.
>>>
>>> -jason
>>>
>>
>> Jason,
>> I was actually wondering how to do that. Thanks. Odd though, it
>> still doesn't seem to be saving the tempfiles. Might not matter
>
> That needs to be checked out. Can anyone verify that?
>
>>> The error pops up when the executable returns a bad status, so
>>> maybe it's choking on too many input sequences (i.e. Bioperl is
>>> doing everything correctly, but you are attempting to BLAST too
>>> many sequences in one go). How many sequences are you attempting
>>> to use as input? What happens when you use fewer input sequences?
>>>
>>> chris
>>>
>>
>> I was processing 738 sequences for input. I cut that down to 20
>> sequences and I'm getting some other exception thrown further
>> downstream, so it appears you may be correct. You don't happen to
>> know what the max number of sequences that blastall allows for input,
>> would ya? ;) I suppose I'll have to break @query down into smaller
>> doses or something.
>>
>> Thanks,
>> Andrew
>
> It was a shot in the dark, really.
> The fact that the return status
> was bad could be due to a number of problems (permissions issues,
> bad data, etc). The fact that a single sequence worked indicated
> that permissions and output format likely weren't to blame. The
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum
> number of sequences. There may be a point where available memory
> becomes the limiting issue.
>
> chris

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From lincoln.stein at gmail.com  Thu Dec 14 15:24:56 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 15:24:56 -0500
Subject: [Bioperl-l] Bio::Graphics xyplot
In-Reply-To: <4578951B.5050206@sfu.ca>
References: <4578951B.5050206@sfu.ca>
Message-ID: <6dce9a0b0612141224r1ef7cce2s6e6123461c3827d8@mail.gmail.com>

Hi,

The way it works is that you create a single feature that spans the entire range of the xyplot. It contains subfeatures, each of which has a score. The graph points correspond to each of the subfeatures.

Lincoln

On 12/7/06, Keith Anthony Boroevich wrote:
>
> Hi everyone,
>
> I'm attempting to add an xyplot of the phred quality scores to a
> Bio::Graphics image, and cannot get it to work.
> I have the panel with a track for both the scale and the DNA displaying
> properly. When I attempt to add the xyplot I just get a garbled track
> of what looks like tiny xyplots for each data point. I have the CVS
> (updated today) of bioperl-live running. I think what I am missing is
> the creation of a "Sequence Feature Group" to hold the individual points
> of the plot. However, I cannot seem to find such an object. This is
> what I attempted:
>
> -------BEGIN---CODE-----------
> # start panel
> my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
>                                       -width     => $f_seqlen*10,
>                                       -pad_left  => 10,
>                                       -pad_right => 10,
>                                       -grid      => 1
>                                      );
> # add scale
> $panel->add_track(arrow =>
>                   Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
>                   -double  => 1,
>                   -tick    => 2,
>                   -fgcolor => 'black');
> # add DNA ($feature is of type Bio::SeqFeature::Annotated)
> $panel->add_track(dna => $feature);
> # get list of quality scores from database
> my ($pqs_value) = $dbh->selectrow_array($sql);
> my @pqs_value = split(/\s/,$pqs_value);
> # create track
> my $track = $panel->add_track(-glyph        => 'xyplot',
>                               -graph_type   => 'points',
>                               -point_symbol => 'point',
>                               -max_score    => 100,
>                               -min_score    => 0,
>                               -scale        => 'none');
> # add "subfeatures" to
> for (my $i=0;$i<$f_seqlen;$i++) {
>     $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
> }
> print $panel->png();
> $panel->finished;
> ------END---CODE----------
>
> I also attempted to create an array of the point features and passed
> that by reference to the panel "add_track" as it describes in the xyplot
> documentation, but that resulted in the exact same image.
>
> keith
>
> --
> ><)))?> -cGRASP- <
> Keith Anthony Boroevich
> Davidson Lab
> Dept of Molecular Biology
> Simon Fraser University
> Tel: 604-268-7276
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From bix at sendu.me.uk  Thu Dec 14 17:15:07 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 17:15:07 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
In-Reply-To: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
Message-ID: <4581CCEB.20206@sendu.me.uk>

Matthew Vaughn wrote:
> Dear all,
>
> I'm trying to bring some of my code into compliance with the BioPerl
> 1.5.2 and am running into some design decisions that I am unclear on.
> Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of
> the 'type' against SOFA? It seems to me that this should be optional
> behavior as is the case with the Bio::FeatureIO family. I'd be happy to
> write the patch if there is any agreement with me on this case.

Lots of people seem to have worked on it over the years, but perhaps Scott Cain is the person to talk to?

revision 1.4
date: 2004/09/25 11:41:29; author: scain; state: Exp; lines: +1 -1
two things:
* adding SOFA as an available ontology to DocumentRegistry.pm
* modifying FeatureIO::gff to use SOFA to validate, and to parse Ontology_term

From lincoln.stein at gmail.com  Thu Dec 14 16:56:41 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 16:56:41 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To:
References:
Message-ID: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>

Hi All,

I'm afraid that the xyplot glyph in the recent bioperl release has an error that causes the content to be drawn to the right of the correct position. Unfortunately this wasn't caught before the release because the glyph was only tested on very large (whole genome) features.
You will need to do a CVS update to get a fixed version from bioperl-live. A future bugfix release of gbrowse will patch this glyph for you automatically.

Lincoln

On 12/12/06, Kara Dolinski wrote:
>
> Hi,
> I'm having a problem getting features and an xyplot properly aligned in
> Gbrowse. For example, see this page:
>
> http://tinyurl.com/ylbq3q
>
> The feature in the "CENPK SNPs" track should actually be around the peak
> of the graph in the "CENPK prediction signal" xyplot, i.e. the SNP feature
> is at position 79, and the xyplot axes and data should span from 61 - 95.
> However, as you can see, the data in the xyplot are oddly separated from
> the axes (which seem to be in the correct place), with the data shifted over
> to about position 120-155.
> This occurs elsewhere, not just at the ends of the chromosomes.
>
> When I zoom to ~80 bp, all is well, see:
>
> http://tinyurl.com/yzav8k
>
> The relevant snippets from the GFF and the config files are below.
>
> Thanks!
> Kara
>
> GFF:
>
> chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
>
> CONFIG:
>
> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
>
> [CENPK_all_scores_graph]
> feature = GRAPH_CENPK:SNPScanner
> glyph = xyplot
> graph_type = boxes
> fgcolor = purple
> bgcolor = purple
> height = 100
> min_score = 0
> max_score = 110
> label = 0
> key = CENPK prediction signal
> link =
> category = SNPs: signal graphs
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From dmessina at wustl.edu  Thu Dec 14 20:45:24 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 19:45:24 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To:
References:
Message-ID: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>

Hey Chris,

My thoughts below.

> [Chris]
> This could be used to annotate any
> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
> maybe in a collection (similar to AnnotationCollection). I thought
> something like this may be of general use for any PrimarySeq
> (quality, structure), alignments like NEXUS and Stockholm,
> SeqFeatures where structure could be stored (tRNA or riboswitches),
> etc.
>
> However, this also seems to fall into the category of sequence
> annotation. So, would it be better to have a set of Bio::Annotation
> classes used for this purpose?

To me, all meta data is equal. That is, your classic Genbank feature annotation and a user's arbitrary meta-tag like "Bob thinks this is a kinase domain" aren't different in kind even if they are different in content.

As resequencing projects multiply, the ability to create arbitrary meta tags, attach them to different types of objects, and use those tags to link them together will become desirable, if not essential.
Keeping a common interface to all of these meta data types would be advantageous, plus new users won't have to determine whether they need to use Bio::Meta objects or Bio::Annotation objects. So I would argue for all of the meta data types to live "under one roof". Which roof isn't as important. Bio::Annotation, since it already exists for today's meta data, seems like a reasonable choice. (assuming Annotation objects are flexible enough to be extended as you propose) There, and no flames or jibes even. :) Dave From cjfields at uiuc.edu Thu Dec 14 21:21:10 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 20:21:10 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> Message-ID: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> On Dec 14, 2006, at 7:45 PM, David Messina wrote: > Hey Chris, > > My thoughts below. > >> [Chris] >> This could be used to annotate any >> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, >> maybe in a collection (similar to AnnotationCollection). I thought >> something like this may be of general use for any PrimarySeq >> (quality, structure), alignments like NEXUS and Stockholm, >> SeqFeatures where structure could be stored (tRNA or riboswitches), >> etc. >> >> However, this also seems to fall into the category of sequence >> annotation. So, would it be better to have a set of Bio::Annotation >> classes used for this purpose? > > > To me, all meta data is equal. That is, your classic Genbank feature > annotation and a user's arbitrary meta-tag like "Bob thinks this is a > kinase domain" aren't different in kind even if they are different in > content. > > As resequencing projects multiply, the ability to create arbitrary > meta tags, attach them to different types of objects, and use those > tags to link them together will become desirable, if not essential. 
> > Keeping a common interface to all of these meta data types would be > advantageous, plus new users won't have to determine whether they > need to use Bio::Meta objects or Bio::Annotation objects. > > So I would argue for all of the meta data types to live "under one > roof". Which roof isn't as important. Bio::Annotation, since it > already exists for today's meta data, seems like a reasonable choice. > (assuming Annotation objects are flexible enough to be extended as > you propose) > > There, and no flames or jibes even. :) I guess what I want to know is whether there should to be a distinction between 'normal' sequence annotation (comments, references, and so on) and annotation that could be best described as position-specific (like RNA or protein structural annotation). The current meta implementation is for sequence data only; I felt it would be nice to have a generic implementation that would be applicable to any object data. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dmessina at wustl.edu Thu Dec 14 21:46:27 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 20:46:27 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> Message-ID: <9C72012A-EFD7-42DD-93F8-578251CFDE01@wustl.edu> And it all seemed so clear to me when I wrote it. :) > whether there should to be a distinction I would argue no because it would contravene a s > a generic implementation that would be applicable to any object data. I wholeheartedly agree that this is the way to go. A generic implementation would allow arbitrary object data while maintaining a standard interface. 
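As a rough illustration of the "one roof" idea, here is a minimal sketch, assuming a BioPerl installation that provides Bio::Annotation::Collection. It attaches two arbitrary pieces of meta data to a sequence through the existing annotation interface. The sequence, the tag names ('curator_note', 'project') and the values are hypothetical and are not taken from the thread.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Annotation::Collection;
use Bio::Annotation::Comment;
use Bio::Annotation::SimpleValue;

# Hypothetical sequence, for illustration only.
my $seq = Bio::Seq->new(-id => 'example1', -seq => 'ATGGCGTTTAAACCC');

# One collection holds every kind of annotation, keyed by an arbitrary tag.
my $collection = Bio::Annotation::Collection->new;
$collection->add_Annotation('curator_note',
    Bio::Annotation::Comment->new(-text => 'Bob thinks this is a kinase domain'));
$collection->add_Annotation('project',
    Bio::Annotation::SimpleValue->new(-value => 'resequencing_batch_7'));

# Attach the collection to the sequence object.
$seq->annotation($collection);

# Read everything back through the same interface, whatever the annotation type.
for my $key ($seq->annotation->get_all_annotation_keys) {
    for my $ann ($seq->annotation->get_Annotations($key)) {
        print "$key: ", $ann->as_text, "\n";
    }
}
```

The point of the sketch is only that a free-text note and a simple key/value pair are stored and retrieved the same way; whether position-specific data fits this model is the open question in the thread.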
From dmessina at wustl.edu Thu Dec 14 21:46:27 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 20:46:27 -0600 Subject: [Bioperl-l] Proposal for Meta data Message-ID: [oops, accidentally hit send midsentence] And it all seemed so clear to me when I wrote it. :) > whether there should to be a distinction I would argue no because it would contravene a standard interface. > a generic implementation that would be applicable to any object data. I wholeheartedly agree that this is the way to go. A generic implementation would allow arbitrary object data while maintaining a standard interface. Dave From neetisomaiya at gmail.com Fri Dec 15 00:21:42 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 15 Dec 2006 10:51:42 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com> Message-ID: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Hi, Thanks a lot for your response. I ran needle like this /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out It gave me the output in format msf. But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO. Please help. Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence. ~Neeti. On 12/14/06, Fairley, Derek wrote: > > Neeti, > > > > From http://emboss.sourceforge.net/apps/cvs/needle.html: > > > > "The results can be output in one of several styles by using the > command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of > the required format. Some of the alignment formats can cope with an > unlimited number of sequences, while others are only for pairs of sequences. 
> > > > > The available multiple alignment format names are: unknown, multiple, > simple, fasta, msf, trace, srs > > > > The available pairwise alignment format names are: pair, markx0, markx1, > markx2, markx3, markx10, srspair, score > > > > See: http://emboss.sf.net/docs/themes/AlignFormats.html for further > information on alignment formats." > > > > Not sure based on this whether you can get pairwise alignment in .msf > format; can't think of a good reason why not. The BioPerl Align::IO module > will allow you to parse alignments in .msf format. > > > > HTH, > > > > Derek. > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya > Sent: 14 December 2006 08:03 > To: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > > > How do I run needle specifying that I want the MSF format, on a linux box? > > The help doesnt show me any format option. Is there anything available to > > pasre MSF format? > > Please find an example alignment file attached. Here the seq_of_contig > > aligns with the reference sequence (i.e. SEQ_1.REF) starting at position > > (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the > > output alignment, how can I parse the result to get this? > > > > On 12/12/06, Chris Fields wrote: > > > > > > > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > > > > > Hi, > > > > > > > > Does anyone know of a bioperl parser for needle output, basically I > > > > won't > > > > where the target sequence aligns on the template (i.e. coordinate > > > > on the > > > > template where the taget aligns). > > > > > > > > -- > > > > -Neeti > > > > Even my blood says, B positive > > > > > > I answered this a number of months back: > > > > > > http://tinyurl.com/yzlbx5 > > > > > > Basically, newer versions of EMBOSS have changed the output for the > > > AlignIO::emboss parser (which parses needle). 
I don't believe the
> > > parser has been fixed to deal with that, but Jason has pointed out
> > > you can use MSF output when running needle, then parse using AlignIO
> > > with the format set to 'msf'.
> > >
> > > chris
> > >
> > >
> > >
> > --
> > -Neeti
> > Even my blood says, B positive
>

--
-Neeti
Even my blood says, B positive

From Derek.Fairley at bll.n-i.nhs.uk Fri Dec 15 04:57:35 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Fri, 15 Dec 2006 09:57:35 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
Message-ID:

Neeti,

In lieu of a response from a BioPerl guru... why not use Needle to generate your pairwise alignment in fasta format, rather than msf format? The sequence you want should correspond to a single HSP which you can get directly from the fasta alignment with Bio::SearchIO: http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use Bio::AlignIO at all.

Derek.

-----Original Message-----
From: neeti somaiya [mailto:neetisomaiya at gmail.com]
Sent: 15 December 2006 05:22
To: Fairley, Derek; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

Hi,

Thanks a lot for your response.
I ran needle like this
/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
It gave me the output in format msf.
But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO.
Please help.
Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence.

~Neeti.

On 12/14/06, Fairley, Derek wrote:
Neeti,

From http://emboss.sourceforge.net/apps/cvs/needle.html :

"The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences.

The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs

The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score

See: http://emboss.sf.net/docs/themes/AlignFormats.html for further information on alignment formats."

Not sure based on this whether you can get pairwise alignment in .msf format; can't think of a good reason why not. The BioPerl Align::IO module will allow you to parse alignments in .msf format.

HTH,

Derek.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
Sent: 14 December 2006 08:03
To: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

How do I run needle specifying that I want the MSF format, on a linux box? The help doesnt show me any format option. Is there anything available to pasre MSF format?
Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment, how can I parse the result to get this?

On 12/12/06, Chris Fields wrote:
>
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output, basically I
> > won't
> > where the target sequence aligns on the template (i.e. coordinate
> > on the
> > template where the taget aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle). I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>

--
-Neeti
Even my blood says, B positive

--
-Neeti
Even my blood says, B positive

From cain at cshl.edu Fri Dec 15 00:01:36 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 15 Dec 2006 00:01:36 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
In-Reply-To: <4581CCEB.20206@sendu.me.uk>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk>
Message-ID: <1166158897.2569.335.camel@localhost.localdomain>

As much as I would like to take credit for this :-) Allen Day wrote the original code, and then Chris Fields tried to fix it so that it actually worked :-)

I think it would be a good idea to have a validate_terms option like Bio::FeatureIO::gff.

Scott

On Thu, 2006-12-14 at 17:15 -0500, Sendu Bala wrote:
> Matthew Vaughn wrote:
> > Dear all,
> >
> > I'm trying to bring some of my code into compliance with the BioPerl
> > 1.5.2 and am running into some design decisions that I am unclear on.
> > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of
> > the 'type' against SOFA? It seems to me that this should be optional
> > behavior as is the case with the Bio::FeatureIO family. I'd be happy to
> > write the patch if there is any agreement with me on this case.
>
> Lots of people seem to have worked on it over the years, but perhaps
> Scott Cain is the person to talk to?
> > revision 1.4 > date: 2004/09/25 11:41:29; author: scain; state: Exp; lines: +1 -1 > two things: > * adding SOFA as an available ontology to DocumentRegistry.pm > * modifying FeatureIO::gff to use SOFA to validate, and to parse > Ontology_term > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061215/021ec42f/attachment.bin From neetisomaiya at gmail.com Fri Dec 15 07:46:08 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 15 Dec 2006 18:16:08 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Message-ID: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> I ran needle like this /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out Please find the output attached. When I run the following :- use Bio::SearchIO; my $io = Bio::SearchIO->new(-file => "1.out", -format => "fasta" ); while ( my $result = $io->next_result() ) { while( my $hit = $result->next_hit) { print "yes\n"; } } It says :- -------------------- WARNING --------------------- MSG: unrecognized FASTA Family report file! --------------------------------------------------- What should I do? ~Neeti. On 12/15/06, Fairley, Derek wrote: > > Neeti, > > In lieu of a response from a BioPerl guru... why not use Needle to > generate your pairwise alignment in fasta format, rather than msf format? 
> The sequence you want should correspond to a single HSP which you can get > directly from the fasta alignment with Bio::SearchIO: > http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use > Bio::AlignIO at all. > > Derek. > > > -----Original Message----- > From: neeti somaiya [mailto:neetisomaiya at gmail.com] > Sent: 15 December 2006 05:22 > To: Fairley, Derek; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > Hi, > > Thanks a lot for your response. > I ran needle like this > /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out > It gave me the output in format msf. > But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I > get the alignment start and stop coordinates on the sequence. I mean > something like hsp->query->start which gives us the alignment start position > on query sequence in a blast output when using Bio::SearchIO. > Please help. > Like I explained with an example in my previous mail, I want the > coordinate where the alignment starts on the sequence. > > ~Neeti. > On 12/14/06, Fairley, Derek wrote: > Neeti, > > From http://emboss.sourceforge.net/apps/cvs/needle.html : > > "The results can be output in one of several styles by using the > command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of > the required format. Some of the alignment formats can cope with an > unlimited number of sequences, while others are only for pairs of sequences. > > The available multiple alignment format names are: unknown, multiple, > simple, fasta, msf, trace, srs > > The available pairwise alignment format names are: pair, markx0, markx1, > markx2, markx3, markx10, srspair, score > > See: http://emboss.sf.net/docs/themes/AlignFormats.html for further > information on alignment formats." > > Not sure based on this whether you can get pairwise alignment in .msf > format; can't think of a good reason why not. 
The BioPerl Align::IO module > will allow you to parse alignments in .msf format. > > HTH, > > Derek. > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya > Sent: 14 December 2006 08:03 > To: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > How do I run needle specifying that I want the MSF format, on a linux box? > The help doesnt show me any format option. Is there anything available to > pasre MSF format? > Please find an example alignment file attached. Here the seq_of_contig > aligns with the reference sequence (i.e. SEQ_1.REF) starting at position > (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the > output alignment, how can I parse the result to get this? > > On 12/12/06, Chris Fields wrote: > > > > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > > > Hi, > > > > > > Does anyone know of a bioperl parser for needle output, basically I > > > won't > > > where the target sequence aligns on the template (i.e. coordinate > > > on the > > > template where the taget aligns). > > > > > > -- > > > -Neeti > > > Even my blood says, B positive > > > > I answered this a number of months back: > > > > http://tinyurl.com/yzlbx5 > > > > Basically, newer versions of EMBOSS have changed the output for the > > AlignIO::emboss parser (which parses needle). I don't believe the > > parser has been fixed to deal with that, but Jason has pointed out > > you can use MSF output when running needle, then parse using AlignIO > > with the format set to 'msf'. > > > > chris > > > > > > -- > -Neeti > Even my blood says, B positive > > > > -- > -Neeti > Even my blood says, B positive > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1.out Type: application/octet-stream Size: 90277 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061215/34b05d03/attachment-0001.obj From jason at bioperl.org Fri Dec 15 09:28:13 2006 From: jason at bioperl.org (Jason Stajich) Date: Fri, 15 Dec 2006 09:28:13 -0500 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> Message-ID: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: > > On Dec 14, 2006, at 7:45 PM, David Messina wrote: > >> Hey Chris, >> >> My thoughts below. >> >>> [Chris] >>> This could be used to annotate any >>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, >>> maybe in a collection (similar to AnnotationCollection). I thought >>> something like this may be of general use for any PrimarySeq >>> (quality, structure), alignments like NEXUS and Stockholm, >>> SeqFeatures where structure could be stored (tRNA or riboswitches), >>> etc. >>> >>> However, this also seems to fall into the category of sequence >>> annotation. So, would it be better to have a set of Bio::Annotation >>> classes used for this purpose? >> >> >> To me, all meta data is equal. That is, your classic Genbank feature >> annotation and a user's arbitrary meta-tag like "Bob thinks this is a >> kinase domain" aren't different in kind even if they are different in >> content. >> >> As resequencing projects multiply, the ability to create arbitrary >> meta tags, attach them to different types of objects, and use those >> tags to link them together will become desirable, if not essential. >> >> Keeping a common interface to all of these meta data types would be >> advantageous, plus new users won't have to determine whether they >> need to use Bio::Meta objects or Bio::Annotation objects. 
>>
>> So I would argue for all of the meta data types to live "under one
>> roof". Which roof isn't as important. Bio::Annotation, since it
>> already exists for today's meta data, seems like a reasonable choice.
>> (assuming Annotation objects are flexible enough to be extended as
>> you propose)
>>
>> There, and no flames or jibes even. :)
>
> I guess what I want to know is whether there should to be a
> distinction between 'normal' sequence annotation (comments,
> references, and so on) and annotation that could be best described as
> position-specific (like RNA or protein structural annotation). The
> current meta implementation is for sequence data only; I felt it
> would be nice to have a generic implementation that would be
> applicable to any object data.

my stream-of-consciousness for right now:

I was thinking Bio::Annotation is where this should go; nothing about that system makes it explicitly sequence-related. What we're trying to hammer out on the Alignment side, which fits with your RNA example, is to have features (basically SeqFeatures) associated with alignments, so that columns can be annotated to cover things like character sets and partitions for phylogenetic analyses. As for data that annotates non-contiguous things like RNA stems, we may have to be more creative about that, or model it with a split location (Bio::Location::Split).

So currently we've added code so that an Alignment is-a Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this end, with the goal of being able to capture more of the data that can be represented in a NEXUS file.

It feels more like a hack than an elegant meta-data solution, but I am not totally sure what kind of data you are thinking about at this point; perhaps I need to spend more time thinking about it. Or are you worried that the semantic mapping of the data into features or annotations is confusing users?
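To make the split-location suggestion concrete, here is a minimal sketch, assuming a BioPerl installation with Bio::Location::Split and Bio::SeqFeature::Generic. It models the two halves of an RNA stem as one feature with a non-contiguous location. The sequence, the coordinates (1..5 and 26..30) and the tag values are invented for illustration; this is not the design that was eventually adopted for alignments.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Hypothetical 30-nt RNA: a 5-nt stem half, a 20-nt loop, and the pairing half.
my $seq = Bio::Seq->new(
    -id       => 'toy_hairpin',
    -alphabet => 'rna',
    -seq      => 'GGGCU' . ('A' x 20) . 'AGCCC',
);

# A split location groups the two non-contiguous halves of the stem.
my $stem = Bio::Location::Split->new;
$stem->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 5, -strand => 1));
$stem->add_sub_Location(
    Bio::Location::Simple->new(-start => 26, -end => 30, -strand => 1));

# An ordinary feature carries the split location plus free-form tags.
my $feature = Bio::SeqFeature::Generic->new(
    -primary_tag => 'stem',
    -source_tag  => 'manual',
    -tag         => { note => 'hypothetical stem, illustration only' },
);
$feature->location($stem);
$seq->add_SeqFeature($feature);

# Walk the features and report each contiguous part of the location.
for my $feat ($seq->get_SeqFeatures) {
    for my $part ($feat->location->each_Location) {
        printf "%s part: %d..%d\n", $feat->primary_tag, $part->start, $part->end;
    }
}
```

A split location records which regions belong together but not which base pairs with which, so it may be too coarse for real structural annotation; that limitation is part of what "more creative" would have to address.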
From jason at bioperl.org Fri Dec 15 09:48:32 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 09:48:32 -0500
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
Message-ID: <42CB9018-72CD-433E-A42F-152D63D2F584@bioperl.org>

I get the impression you are trying to use the wrong tool for the job. Can you explain a little more generally what you want to do?

Semantically, FASTA in Bio::SearchIO is very different from FASTA in Bio::AlignIO. We explain this on the wiki; please have a look at the FASTA page.

Do not use Bio::SearchIO to parse multi-fasta alignment output. Bio::SearchIO is for pairwise alignment reports. Use Bio::AlignIO for a multi-fasta format or for msf; you just provide a different value to '-format'.

But none of that is going to help you get the start/end for your alignment, because that is not part of the output format. Do the experiment of looking at the file and figuring out which fields you actually want in the output. If they don't exist, then either you have a format that won't work for your question, or you will have to calculate the additional information yourself.

If you are trying to align transcripts to a genome, please consider tools that are built for it and referenced on the wiki, such as Sim4, est2genome, exonerate, and BLAT.

-jason

On Dec 15, 2006, at 7:46 AM, neeti somaiya wrote:

> I ran needle like this
>
> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out
>
> Please find the output attached.
> > When I run the following :- > > use Bio::SearchIO; > > my $io = Bio::SearchIO->new(-file => "1.out", > -format => "fasta" ); > > while ( my $result = $io->next_result() ) > { > while( my $hit = $result->next_hit) > { > > print "yes\n"; > } > } > > > It says :- > > -------------------- WARNING --------------------- > MSG: unrecognized FASTA Family report file! > --------------------------------------------------- > > What should I do? > > ~Neeti. > > On 12/15/06, Fairley, Derek wrote: >> >> Neeti, >> >> In lieu of a response from a BioPerl guru... why not use Needle to >> generate your pairwise alignment in fasta format, rather than msf >> format? >> The sequence you want should correspond to a single HSP which you >> can get >> directly from the fasta alignment with Bio::SearchIO: >> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need >> to use >> Bio::AlignIO at all. >> >> Derek. >> >> >> -----Original Message----- >> From: neeti somaiya [mailto:neetisomaiya at gmail.com] >> Sent: 15 December 2006 05:22 >> To: Fairley, Derek; bioperl-l >> Subject: Re: [Bioperl-l] needle parser in bioperl? >> >> Hi, >> >> Thanks a lot for your response. >> I ran needle like this >> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out >> It gave me the output in format msf. >> But now my problem is, if I use Bio::AlignIO module of Bioperl, >> how can I >> get the alignment start and stop coordinates on the sequence. I mean >> something like hsp->query->start which gives us the alignment >> start position >> on query sequence in a blast output when using Bio::SearchIO. >> Please help. >> Like I explained with an example in my previous mail, I want the >> coordinate where the alignment starts on the sequence. >> >> ~Neeti. 
>> On 12/14/06, Fairley, Derek wrote: >> Neeti, >> >> From http://emboss.sourceforge.net/apps/cvs/needle.html : >> >> "The results can be output in one of several styles by using the >> command-line qualifier -aformat xxx, where 'xxx' is replaced by >> the name of >> the required format. Some of the alignment formats can cope with an >> unlimited number of sequences, while others are only for pairs of >> sequences. >> >> The available multiple alignment format names are: unknown, multiple, >> simple, fasta, msf, trace, srs >> >> The available pairwise alignment format names are: pair, markx0, >> markx1, >> markx2, markx3, markx10, srspair, score >> >> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further >> information on alignment formats." >> >> Not sure based on this whether you can get pairwise alignment in .msf >> format; can't think of a good reason why not. The BioPerl >> Align::IO module >> will allow you to parse alignments in .msf format. >> >> HTH, >> >> Derek. >> >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto: >> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya >> Sent: 14 December 2006 08:03 >> To: Chris Fields; bioperl-l >> Subject: Re: [Bioperl-l] needle parser in bioperl? >> >> How do I run needle specifying that I want the MSF format, on a >> linux box? >> The help doesnt show me any format option. Is there anything >> available to >> pasre MSF format? >> Please find an example alignment file attached. Here the >> seq_of_contig >> aligns with the reference sequence (i.e. SEQ_1.REF) starting at >> position >> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate >> from the >> output alignment, how can I parse the result to get this? 
>> >> On 12/12/06, Chris Fields wrote: >> > >> > >> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: >> > >> > > Hi, >> > > >> > > Does anyone know of a bioperl parser for needle output, >> basically I >> > > won't >> > > where the target sequence aligns on the template (i.e. coordinate >> > > on the >> > > template where the taget aligns). >> > > >> > > -- >> > > -Neeti >> > > Even my blood says, B positive >> > >> > I answered this a number of months back: >> > >> > http://tinyurl.com/yzlbx5 >> > >> > Basically, newer versions of EMBOSS have changed the output for the >> > AlignIO::emboss parser (which parses needle). I don't believe the >> > parser has been fixed to deal with that, but Jason has pointed out >> > you can use MSF output when running needle, then parse using >> AlignIO >> > with the format set to 'msf'. >> > >> > chris >> > >> >> >> >> -- >> -Neeti >> Even my blood says, B positive >> >> >> >> -- >> -Neeti >> Even my blood says, B positive >> > > > > -- > -Neeti > Even my blood says, B positive > <1.out> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From lubapardo at gmail.com Fri Dec 15 11:39:11 2006 From: lubapardo at gmail.com (Luba Pardo) Date: Fri, 15 Dec 2006 17:39:11 +0100 Subject: [Bioperl-l] NO BLAST Message-ID: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com> *Hello,* *I am having trouble to use the module Bio::Tools::Run::StandAloneBlast;* ** *I got the following error message: cannot find path to blastall.* *The code I used is (modified from HOWTObeginners): * #! 
/local/bin/perl -w
#use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Tools::Run::StandAloneBlast;

my $db_object = Bio::DB::GenBank->new;
#my $seq_ob = $db_object->get_Seq_by_id('NM_004043');
#$seq = Bio::SeqIO->new(-file => "> out.fasta", -format => 'fasta');
#$seq->write_seq($seq_ob);
#print $seq;
@params = (program => 'blastn', database => 'db.fa');
$blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
$seq_obj = Bio::Seq->new(-id => "testquery", -seq => "TTTAAATATATTTTGAAGTATAGATTATATGTT");
$report_obj = $blast_obj->blastall($seq_obj);
$result_obj = $report_obj->next_result;
print $result_obj->num_hits;

*Whether I create a sequence de novo or retrieve one from the internet, I get the same message.*

From cjfields at uiuc.edu Fri Dec 15 12:23:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:23:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID:

On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:

>
> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>
>>
>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>
>>> Hey Chris,
>>>
>>> My thoughts below.
>>>
>>>> [Chris]
>>>> This could be used to annotate any
>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>>> maybe in a collection (similar to AnnotationCollection). I thought
>>>> something like this may be of general use for any PrimarySeq
>>>> (quality, structure), alignments like NEXUS and Stockholm,
>>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>>> etc.
>>>>
>>>> However, this also seems to fall into the category of sequence
>>>> annotation. So, would it be better to have a set of
>>>> Bio::Annotation
>>>> classes used for this purpose?
>>> >>> >>> To me, all meta data is equal. That is, your classic Genbank feature >>> annotation and a user's arbitrary meta-tag like "Bob thinks this >>> is a >>> kinase domain" aren't different in kind even if they are >>> different in >>> content. >>> >>> As resequencing projects multiply, the ability to create arbitrary >>> meta tags, attach them to different types of objects, and use those >>> tags to link them together will become desirable, if not essential. >>> >>> Keeping a common interface to all of these meta data types would be >>> advantageous, plus new users won't have to determine whether they >>> need to use Bio::Meta objects or Bio::Annotation objects. >>> >>> So I would argue for all of the meta data types to live "under one >>> roof". Which roof isn't as important. Bio::Annotation, since it >>> already exists for today's meta data, seems like a reasonable >>> choice. >>> (assuming Annotation objects are flexible enough to be extended as >>> you propose) >>> >>> There, and no flames or jibes even. :) >> >> I guess what I want to know is whether there should to be a >> distinction between 'normal' sequence annotation (comments, >> references, and so on) and annotation that could be best described as >> position-specific (like RNA or protein structural annotation). The >> current meta implementation is for sequence data only; I felt it >> would be nice to have a generic implementation that would be >> applicable to any object data. > > my stream-of-consciousness for right now: > > I was thinking Bio::Annotation is where this should go - that > system doesn't have anything about it that makes it explicitly > sequence related. What we're trying to hammer out here on the > Alignment side - which fits with your RNA example - is have > features, basically SeqFeatures - associated with alignments so > columns can be annotated to cover things like character sets and > partitions for phylogenetic analyses. 
> As for data which annotates
> non-contiguous things like RNAstems we may have to be more
> creative about that or model it with a splitLocation.
>
> So currently we've added code so that an Alignment is-a
> Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
> end, with the goal of being able to capture more of the data that
> can be represented in a NEXUS file.
>
> It feels more like a hack than an elegant Meta-data solution, but I
> am not totally sure about the data you are thinking about at
> this point; perhaps I need to spend more time thinking about it.
> Or are you worried about the idea of whether the semantic mapping
> of the data into features or annotations is confusing users?

Sorry in advance for the longish response here...

My original thought was to have a generic abstract class capable of
positionally describing data in any other class, similar to Heikki's
Bio::Seq::MetaI but not constrained to sequence data only.
Implementing classes would be capable of having different data
structures based on their use (simple string, array, AoA, AoH, AoO).
One MetaCollection class would contain them all in a tag-like system,
so you could have mixed data types describe the same object. The
latter Collection class is so similar to AnnotationCollection that I
agree Bio::Annotation would be the best place for this.

The way I reconfigured Stockholm alignment parsing/writing is to use
Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is
capable of holding a sequence and several meta strings, stored as tags
or 'names'. However, there is no Meta object for alignments (for
RNA/protein structure consensus and other Rfam/Pfam markup); I hacked
around this by using a Bio::Seq::Meta w/o a seq, but I would rather
have a generic Meta object independent of the sequence cruft.

So for this partial Pfam alignment,

Q92SV1_RHIME/122-299          LAMALNLARGI...VDADVDF..REG
#=GR Q92SV1_RHIME/122-299 pAS .........................
Q883D2_PSESM/110-290          LGLMLGLRRRL...FDGNGAV..KRS
Q8ZXP5_PYRAE/91-262           LALLLAPYKRI...IQYGEKM..KRG
#=GR Q8ZXP5_PYRAE/91-262 SS   HHHHHHHHTTH...HHHHHHX..HTT
#=GR Q8ZXP5_PYRAE/91-262 SA   00000000000...120030X..474
#=GC SS_cons                  HHHHHHHHTTH...HHHHHHH..HTT
#=GC SA_cons                  03002200312...1312414..676
#=GC seq_cons                 luhhLuhsRpl...hthppth..+pG
//

'#=GC' lines would be in generic meta string objects in the alignment,
while '#=GR' tags would be in similar meta objects in the relevant
sequences. As long as both aren't AnnotatableI this isn't an issue.
Similarly, NEXUS files which contained any position-based values could
hold a meta string/array object in a similar tag.

The basic scheme is:

                    |--String
                    |
Annotation::Meta----|--Array
                    |
                    |--HorriblyComplexDataStruct

Then I started thinking about where this could be applied, and whether
a true Meta object needs to be constrained only to describing
position-based data. This somewhat relates to this bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1825

which seems to need a simple but unconstrained hash-of-arrays-based
meta object. Then my head appropriately exploded...

Hope everything is going well at the hackathon! Looks like some
interesting stuff coming out of it.

chris

From cjfields at uiuc.edu Fri Dec 15 12:49:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:49:45 -0600
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
In-Reply-To: <1166158897.2569.335.camel@localhost.localdomain>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain>
Message-ID: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>

On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:

> As much as I would like to take credit for this :-) Allen Day
> wrote the
> original code, and then Chris Fields tried to fix it so that it
> actually
> worked :-) I think it would be a good idea to have a validate_terms
> option like Bio::FeatureIO::gff.
>
> Scott

I did ?!? I committed a bug fix a while back:

Revision 1.34 / (view) - annotate - [select for diffs] , Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
Branch: MAIN
CVS Tags: branch-experimental
Branch point for: branch-1-5-2
Changes since 1.33: +155 -33 lines
Diff to previous 1.33

Bug 2026; Robert's enhancements

To tell the truth I don't know if this is where the mandatory checks
were added in; I'm not too familiar with SeqFeature::Annotation yet.

I agree with Scott (and Matthew) that SOFA checks should be optional.
Matthew, can you write up a patch and maybe some tests?

chris

From stewarta at nmrc.navy.mil Thu Dec 14 18:30:11 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 18:30:11 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
Message-ID: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>

I'm getting the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
STACK: main::process_reports ./new_blast_script.pl:254
STACK: ./new_blast_script.pl:132
-----------------------------------------------------------

next_result is a pretty dense chunk of code to decipher. I was
wondering if anyone more familiar with that code might know what the
"no data for midline $_" exception is referring to?
For context:

1161    if( /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
1162        my ($full,$type,$start,$str,$end) = ($1,$2,$3,$4,$5);
1163        if( $str eq '-' ) {
1164            $i = 3 if $type eq 'Sbjct';
1165        } else {
1166            $data{$type} = $str;
1167        }
1168        $len = length($full);
1169        $self->{"\_$type"}->{'begin'} = $start unless $self->{"_$type"}->{'begin'};
1170        $self->{"\_$type"}->{'end'} = $end;
1171    } else {
1172        $self->throw("no data for midline $_")
1173            unless (defined $_ && defined $len);
1174        $data{'Mid'} = substr($_,$len);
1175    }

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From jason at bioperl.org Fri Dec 15 13:56:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 13:56:13 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
Message-ID:

It means the parser is expecting an alignment block of data and there
is none (or there is none in the context where it is expecting it) -
so something is wrong with the report and the parser gets tripped up.

I'm not sure reading the code is going to help you - what someone will
have to do is figure out what is different about this report compared
with reports that do work for the parser. You'll do better if you just
provide an example report that is failing as a bug report.

Providing the version of BLAST you are using and the version of
bioperl will help. I seem to remember NCBI changing the BLAST text
format, so that will break the parser if it is a significant change.

As has been mentioned in the past, this playing cat and mouse with
format changes means things will periodically break. If you need
something rock-solid that is always going to work, I guess XML is the
better route to go.
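
To make the failure mode concrete: the quoted snippet only records a
column offset ($len) when it sees a Query or Sbjct line, and treats any
other line inside an alignment block as the midline. The following is a
minimal standalone sketch of that logic in plain Perl. It is not the
real Bio::SearchIO::blast code; the subroutine name
parse_alignment_block and the sample lines are made up for illustration,
and it assumes the simplified one-block-at-a-time flow shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): walks the lines of one
# text-format alignment block roughly the way the quoted snippet does.
sub parse_alignment_block {
    my @lines = @_;
    my ( %data, $len );
    for my $line (@lines) {
        if ( $line =~ /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
            my ( $full, $type, $str ) = ( $1, $2, $4 );
            $data{$type} = $str;
            # Width of the "Query: 1   " prefix; the midline is cut
            # at this column so it lines up with the sequence text.
            $len = length $full;
        }
        else {
            # A midline only makes sense once a Query/Sbjct line has
            # set $len; otherwise the block is not where we expect it.
            die "no data for midline $line\n" unless defined $len;
            $data{Mid} = substr $line, $len;
        }
    }
    return \%data;
}

# A well-formed block: Query line, midline, Sbjct line.
my $ok = parse_alignment_block(
    'Query: 1   MKVLA 5',
    '           MK+LA',
    'Sbjct: 10  MKILA 14',
);
print "Mid=$ok->{Mid}\n";

# A stray non-alignment line with no preceding Query line.
my $err;
eval { parse_alignment_block(' Posted date:  Dec 14, 2006  2:52 PM'); 1 }
    or $err = $@;
print "Error: $err";
```

In this sketch the second call dies because no Query line has set the
offset yet. In Andrew's report the message text shows the offending
line was a "Posted date" line, which would be consistent with the
parser still expecting alignment data when it reached a non-alignment
part of the report, though only the failing report itself can confirm
that.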
-jason On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote: > I'm getting the following exception... > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM > STACK: Error::throw > STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 > STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/ > SearchIO/blast.pm:1172 > STACK: main::process_reports ./new_blast_script.pl:254 > STACK: ./new_blast_script.pl:132 > ----------------------------------------------------------- > > > next_result is a pretty dense chunk of code to decipher. I was > wondering if anyone more familiar with that code might know what the > "no data for midline $_" exception is referring to? > > > -- > Andrew Stewart > Research Assistant, Genomics Team > Navy Medical Research Center (NMRC) > Biological Defense Research Directorate (BDRD) > BDRD Annex > 12300 Washington Avenue, 2nd Floor > Rockville, MD 20852 > > email: stewarta at nmrc.navy.mil > phone: 301-231-6700 Ext 270 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Dec 15 14:21:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 15 Dec 2006 13:21:32 -0600 Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown In-Reply-To: References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil> Message-ID: <6A0D17FA-CB98-4937-998E-11B87FB9CBBD@uiuc.edu> On Dec 15, 2006, at 12:56 PM, Jason Stajich wrote: > It means it is expecting alignment block of data and there is none > (or there is none in the context it is expecting it) - so something > is wrong with the report as it gets tripped up. > > I'm not sure reading the code is going to help you - what someone > will have to do is figure out what is different about this report > than reports that do work for the parser. 
> You'll do better if you just provide an example report that is > failing as a bug report. > > Providing the version of BLAST you are using and version of bioperl > will help. I seem to remember NCBI changing the BLAST text format so > that will break the parser if it is a significant change. > > As has been mentioned in the past, this playing cat and mouse with > format changes means things will periodically break. If you need rock- > solid always going to work, I guess the XML is better route to go. > > -jason I agree that XML is the only reliable way to go, though I have been reading on the BioPython group about some issues with newer (2.2.13 or greater) BLAST XML output when reports with multiple BLAST queries. Don't know if this affects Bioperl or not. As for the 'midline' error, there was a similar error a while back (fixed for the 1.5.2 release) that had to do with extra lines in the alignment section in some BLAST reports. Unless we have a demo BLAST report and sample code we can't do much about it (we need to reproduce the error in order to fix it), so the best thing to do it file a bug report. chris > On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote: > >> I'm getting the following exception... >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm: >> 328 >> STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/ >> SearchIO/blast.pm:1172 >> STACK: main::process_reports ./new_blast_script.pl:254 >> STACK: ./new_blast_script.pl:132 >> ----------------------------------------------------------- >> >> >> next_result is a pretty dense chunk of code to decipher. I was >> wondering if anyone more familiar with that code might know what the >> "no data for midline $_" exception is referring to? 
>> >> >> -- >> Andrew Stewart >> Research Assistant, Genomics Team >> Navy Medical Research Center (NMRC) >> Biological Defense Research Directorate (BDRD) >> BDRD Annex >> 12300 Washington Avenue, 2nd Floor >> Rockville, MD 20852 >> >> email: stewarta at nmrc.navy.mil >> phone: 301-231-6700 Ext 270 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From vaughn at cshl.edu Fri Dec 15 13:05:47 2006 From: vaughn at cshl.edu (Matthew Vaughn) Date: Fri, 15 Dec 2006 13:05:47 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain> <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> Message-ID: Yes, I will. I am working on it today. It's a little more complicated to fix this than I expected because SeqFeature::Annotation->type() returns a Bio::AnnotationI rather than a simple scalar like it used to. On 12/15/06, Chris Fields wrote: > On Dec 14, 2006, at 11:01 PM, Scott Cain wrote: > > > As much as I would like to take credit for this :-) Allen Day > > wrote the > > original code, and then Chris Fields tried to fix it so that it > > actually > > worked :-) I think it would be a good idea to have a validate_terms > > option like Bio::FeatureIO::gff. > > > > Scott > > I did ?!? 
I committed a bug fix a while back: > > Revision 1.34 / (view) - annotate - [select for diffs] , > Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields > Branch: MAIN > CVS Tags: branch-experimental > Branch point for: branch-1-5-2 > Changes since 1.33: +155 -33 lines > Diff to previous 1.33 > > Bug 2026; Robert's enhancements > > To tell the truth I don't know if this is where the mandatory checks > were added in; I'm not too familiar with SeqFeature::Annotation yet. > > I agree with Scott (and Matthew) that SOFA checks should be > optional. Matthew, can you write up a patch and maybe some tests? > > chris > > > > From valiente at lsi.upc.edu Fri Dec 15 19:45:27 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Sat, 16 Dec 2006 01:45:27 +0100 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4577EFD3.7090904@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> Message-ID: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> > I don't think that can be true. Your error message contains 'Must > supply > a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). > > If you uninstall the fink installation and install 1.5.2 using cpan > (with root privileges by going sudo cpan) that should at least get > rid of the error messages... > > >> The tree is not correct (I've parsed it from R to have a double >> check) but don't know yet what the problem is with it. > > ... But if the tree is wrong anyway... Let me know what you find out. I've uninstalled the fink installation and used the cvs instead, and the error message is gone. However, on a larger set of 190 species, which are all present in the NCBI taxonomy, the resulting tree has only 178 taxa. 
I suspect something must be wrong with the merge_lineage method in the
major rewrite of the taxonomy2tree script. Can someone please check
this? I'm attaching the 190 species call to the script.

Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061216/5e392593/attachment.obj

From lincoln.stein at gmail.com Fri Dec 15 11:02:27 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 15 Dec 2006 11:02:27 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
Message-ID: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>

This is very embarrassing for me, particularly since I spent a lot of
time validating that Bio::Graphics was working properly before the
1.5.2 release went out. How long before there is a 1.5.3 release? How
about a 1.5.2.1 release?

Lincoln

On 12/14/06, Lincoln Stein wrote:
>
> Hi All,
>
> I'm afraid that the xyplot glyph that is in the recent bioperl release has
> an error that causes the content to be printed to the right of the correct
> position. Unfortunately this wasn't caught before the release because the
> glyph was only tested on very large (whole genome) features.
>
> You will need to do a CVS update to get a fixed version from bioperl-live.
> A future bugfix release of gbrowse will patch this glyph for you
> automatically.
>
> Lincoln
>
> On 12/12/06, Kara Dolinski wrote:
> >
> > Hi,
> > I'm having a problem getting features and an xyplot properly aligned in
> > Gbrowse.
For example, see this page: > > > > http://tinyurl.com/ylbq3q > > > > The feature in the "CENPK SNPs" track should actually be around the peak > > of the graph in the "CENPK prediction signal" xyplot ie. the SNP > > feature is at position 79, and the xyplot axes and data should span from > > 61 - 95. However, as you can see, the data in the xyplot are oddly > > separated from the axes (which seem to be in the correct place), with the > > data shifted over to about position 120-155. > > This occurs elsewhere, not just at the ends of the chromosomes. > > > > When I zoom to ~80 bp, all is well, see: > > > > http://tinyurl.com/yzav8k > > > > The relevant snippets from the GFF and the config files are below. > > > > Thanks! > > Kara > > > > GFF: > > > > chrI SNPScanner > > CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > > is 41.9883 > > chrI SNPScanner > > CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > > is 41.9883 > > chrI SNPScanner > > CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score > > is 2.24506 > > chrI SNPScanner > > CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score > > is 3.26837 > > chrI SNPScanner > > CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score > > is 1.39938 > > chrI SNPScanner > > CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score > > is 1.4039 > > chrI SNPScanner > > CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score > > is 9.16134 > > chrI SNPScanner > > CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score > > is 10.1413 > > chrI SNPScanner > > CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score > > is 12.9256 > > chrI SNPScanner > > CENPK_SCORE 68 68 13.195 . . 
ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score > > is 13.195 > > chrI SNPScanner > > CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score > > is 22.7127 > > chrI SNPScanner > > CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score > > is 23.8289 > > chrI SNPScanner > > CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score > > is 21.9123 > > chrI SNPScanner > > CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score > > is 28.3344 > > chrI SNPScanner > > CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score > > is 35.0436 > > chrI SNPScanner > > CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score > > is 37.361 > > chrI SNPScanner > > CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score > > is 39.5408 > > chrI SNPScanner > > CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score > > is 28.2008 > > chrI SNPScanner > > CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score > > is 32.6254 > > chrI SNPScanner > > CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score > > is 36.0832 > > chrI SNPScanner > > CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score > > is 41.9883 > > chrI SNPScanner > > CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score > > is 32.1205 > > chrI SNPScanner > > CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score > > is 41.3048 > > chrI SNPScanner > > CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score > > is 30.7975 > > chrI SNPScanner > > CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score > > is 29.4282 > > chrI SNPScanner > > CENPK_SCORE 84 84 35.3586 . . 
ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> > chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> > chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> > chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> > chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> > chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> > chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> > chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> > chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> > chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> > chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
> >
> > CONFIG:
> >
> > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
> >
> > [CENPK_all_scores_graph]
> > feature = GRAPH_CENPK:SNPScanner
> > glyph = xyplot
> > graph_type = boxes
> > fgcolor = purple
> > bgcolor = purple
> > height = 100
> > min_score = 0
> > max_score = 110
> > label = 0
> > key = CENPK prediction signal
> > link =
> > category = SNPs: signal graphs
> >
> > _______________________________________________
> > Gmod-gbrowse mailing list
> > Gmod-gbrowse at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> >
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Sat Dec 16 01:10:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 00:10:07 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <70A5E333-8CF5-49D3-84AC-7A6A02791B5C@uiuc.edu>

We could feasibly have regular point releases of the 1.5 dev. series
for bug fixes; I guess it just depends on how often these should come
out and what critical tests must pass for a release to go forward.
Sendu's already done a ton of work towards getting BioPerl switched
over to Module::Build and Test::More, and fixing bugs. As Hilmar has
pointed out in the past, this is a developer's series, so not every
test needs to pass before a release goes out.

When would you like this to go out?
chris On Dec 15, 2006, at 10:02 AM, Lincoln Stein wrote: > This is very embarassing for me, particularly since I spent a lot > of time > validating that Bio::Graphics was working properly before the 1.5.2 > release > went out. How long before there is a 1.5.3 release? How about a > 1.5.2.1release? > > Lincoln > > On 12/14/06, Lincoln Stein wrote: >> >> Hi All, >> >> I'm afraid that the xyplot glyph that is in the recent bioperl >> release has >> an error that causes the content to be printed to the right of the >> correct >> position. Unfortunately this wasn't caught before the release >> because the >> glyph was only tested on very large (whole genome) features. >> >> You will need to do a CVS update to get a fixed version from >> bioperl-live. >> A future bugfix release of gbrowse will patch this glyph for you >> automatically. >> >> Lincoln >> >> On 12/12/06, Kara Dolinski wrote: >>> >>> Hi, >>> I'm having a problem getting features and an xyplot properly >>> aligned in >>> Gbrowse. For example, see this page: >>> >>> http://tinyurl.com/ylbq3q >>> >>> The feature in the "CENPK SNPs" track should actually be around >>> the peak >>> of the graph in the "CENPK prediction signal" xyplot ie. the SNP >>> feature is at position 79, and the xyplot axes and data should >>> span from >>> 61 - 95. However, as you can see, the data in the xyplot are oddly >>> separated from the axes (which seem to be in the correct place), >>> with the >>> data shifted over to about position 120-155. >>> This occurs elsewhere, not just at the ends of the chromosomes. >>> >>> When I zoom to ~80 bp, all is well, see: >>> >>> http://tinyurl.com/yzav8k >>> >>> The relevant snippets from the GFF and the config files are below. >>> >>> Thanks! >>> Kara >>> >>> GFF: >>> >>> chrI SNPScanner >>> CENPK_GRAPH 61 95 41.9883 . . >>> ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score >>> is 41.9883 >>> chrI SNPScanner >>> CENPK_CALL 79 79 41.9883 . . 
>>> ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score >>> is 41.9883 >>> chrI SNPScanner >>> CENPK_SCORE 61 61 2.24506 . . >>> ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score >>> is 2.24506 >>> chrI SNPScanner >>> CENPK_SCORE 62 62 3.26837 . . >>> ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score >>> is 3.26837 >>> chrI SNPScanner >>> CENPK_SCORE 63 63 1.39938 . . >>> ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score >>> is 1.39938 >>> chrI SNPScanner >>> CENPK_SCORE 64 64 1.4039 . . >>> ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score >>> is 1.4039 >>> chrI SNPScanner >>> CENPK_SCORE 65 65 9.16134 . . >>> ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score >>> is 9.16134 >>> chrI SNPScanner >>> CENPK_SCORE 66 66 10.1413 . . >>> ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score >>> is 10.1413 >>> chrI SNPScanner >>> CENPK_SCORE 67 67 12.9256 . . >>> ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score >>> is 12.9256 >>> chrI SNPScanner >>> CENPK_SCORE 68 68 13.195 . . >>> ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score >>> is 13.195 >>> chrI SNPScanner >>> CENPK_SCORE 69 69 22.7127 . . >>> ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score >>> is 22.7127 >>> chrI SNPScanner >>> CENPK_SCORE 70 70 23.8289 . . >>> ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score >>> is 23.8289 >>> chrI SNPScanner >>> CENPK_SCORE 71 71 21.9123 . . >>> ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score >>> is 21.9123 >>> chrI SNPScanner >>> CENPK_SCORE 72 72 28.3344 . . >>> ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score >>> is 28.3344 >>> chrI SNPScanner >>> CENPK_SCORE 73 73 35.0436 . . >>> ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score >>> is 35.0436 >>> chrI SNPScanner >>> CENPK_SCORE 74 74 37.361 . . >>> ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score >>> is 37.361 >>> chrI SNPScanner >>> CENPK_SCORE 75 75 39.5408 . . 
>>> ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score >>> is 39.5408 >>> chrI SNPScanner >>> CENPK_SCORE 76 76 28.2008 . . >>> ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score >>> is 28.2008 >>> chrI SNPScanner >>> CENPK_SCORE 77 77 32.6254 . . >>> ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score >>> is 32.6254 >>> chrI SNPScanner >>> CENPK_SCORE 78 78 36.0832 . . >>> ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score >>> is 36.0832 >>> chrI SNPScanner >>> CENPK_SCORE 79 79 41.9883 . . >>> ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score >>> is 41.9883 >>> chrI SNPScanner >>> CENPK_SCORE 80 80 32.1205 . . >>> ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score >>> is 32.1205 >>> chrI SNPScanner >>> CENPK_SCORE 81 81 41.3048 . . >>> ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score >>> is 41.3048 >>> chrI SNPScanner >>> CENPK_SCORE 82 82 30.7975 . . >>> ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score >>> is 30.7975 >>> chrI SNPScanner >>> CENPK_SCORE 83 83 29.4282 . . >>> ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score >>> is 29.4282 >>> chrI SNPScanner >>> CENPK_SCORE 84 84 35.3586 . . >>> ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score >>> is 35.3586 >>> chrI SNPScanner >>> CENPK_SCORE 85 85 34.1426 . . >>> ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score >>> is 34.1426 >>> chrI SNPScanner >>> CENPK_SCORE 86 86 30.2966 . . >>> ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score >>> is 30.2966 >>> chrI SNPScanner >>> CENPK_SCORE 87 87 17.8402 . . >>> ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score >>> is 17.8402 >>> chrI SNPScanner >>> CENPK_SCORE 88 88 15.2637 . . >>> ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score >>> is 15.2637 >>> chrI SNPScanner >>> CENPK_SCORE 89 89 12.657 . . >>> ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score >>> is 12.657 >>> chrI SNPScanner >>> CENPK_SCORE 90 90 10.2033 . . 
>>> ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score >>> is 10.2033 >>> chrI SNPScanner >>> CENPK_SCORE 91 91 9.40143 . . >>> ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score >>> is 9.40143 >>> chrI SNPScanner >>> CENPK_SCORE 92 92 6.56273 . . >>> ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score >>> is 6.56273 >>> chrI SNPScanner >>> CENPK_SCORE 93 93 3.66211 . . >>> ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score >>> is 3.66211 >>> chrI SNPScanner >>> CENPK_SCORE 94 94 0.394194 . . >>> ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score >>> is 0.394194 >>> >>> CONFIG: >>> >>> >>> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH} >>> >>> [CENPK_all_scores_graph] >>> feature = GRAPH_CENPK:SNPScanner >>> glyph = xyplot >>> graph_type = boxes >>> fgcolor = purple >>> bgcolor = purple >>> height = 100 >>> min_score = 0 >>> max_score = 110 >>> label = 0 >>> key = CENPK prediction signal >>> link = >>> category = SNPs: signal graphs >>> >>> >>> >>> -------------------------------------------------------------------- >>> ----- >>> Take Surveys. Earn Cash. Influence the Future of IT >>> Join SourceForge.net's Techsay panel and you'll get the chance to >>> share >>> your >>> opinions on IT & business topics through brief surveys - and earn >>> cash >>> http://www.techsay.com/default.php? >>> page=join.php&p=sourceforge&CID=DEVDEV >>> >>> >>> _______________________________________________ >>> Gmod-gbrowse mailing list >>> Gmod-gbrowse at lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >>> >>> >>> >> >> >> -- >> Lincoln D. Stein >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> FOR URGENT MESSAGES & SCHEDULING, >> PLEASE CONTACT MY ASSISTANT, >> SANDRA MICHELSEN, AT michelse at cshl.edu >> > > > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Dec 16 01:28:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 00:28:47 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: >> I don't think that can be true. Your error message contains 'Must >> supply >> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >> >> If you uninstall the fink installation and install 1.5.2 using >> cpan (with root privileges by going sudo cpan) that should at >> least get rid of the error messages... >> >> >>> The tree is not correct (I've parsed it from R to have a double >>> check) but don't know yet what the problem is with it. >> >> ... But if the tree is wrong anyway... Let me know what you find out. > > I've uninstalled the fink installation and used the cvs instead, > and the error message is gone. However, on a larger set of 190 > species, which are all present in the NCBI taxonomy, the resulting > tree has only 178 taxa. 
I suspect, something must be wrong with the
> merge_lineage method in the major rewrite of the taxonomy2tree
> script. Can someone please check this? I'm attaching the 190
> species call to the script. Thanks,
>
> Gabriel

I can confirm that. It is definitely dropping them in merge_lineage();
if you add a call to get_leaf_nodes to check how many are present after
each merge_lineage() call, you can see it dropping nodes along the trace.

In taxonomy2tree.pl:

    my $ct = 0;                       # index of the species being processed
    my ($treect, $mergect) = (0, 0);
    for my $name (@species) {
        my $ncbi_id = $db->get_taxonid($name);
        if ($ncbi_id) {
            #print "Species: $name\n\tTaxID: $ncbi_id\n";
            #$ids{$ncbi_id}++;
            my $node = $db->get_taxon(-taxonid => $ncbi_id);
            if ($tree) {
                $tree->merge_lineage($node);
            }
            else {
                $tree = Bio::Tree::Tree->new(-node => $node);
            }
            # report the number of leaf nodes after each merge
            printf("%-3d: Nodes: %-4d\n", $ct, scalar($tree->get_leaf_nodes));
        }
        else {
            warn "no NCBI Taxonomy node for species ", $name, "\n";
        }
        $ct++;
    }

chris

From bix at sendu.me.uk Sat Dec 16 09:37:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:37:49 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <458404BD.8030908@sendu.me.uk>

Lincoln Stein wrote:
> This is very embarassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1release?

I'm happy to try a point release for critical bug fixes. Why don't you
commit the necessary fixes to branch-1-5-2 and let me know when you're
happy, and I'll do 1.5.2.1.


Cheers,
Sendu.
From bix at sendu.me.uk Sat Dec 16 09:47:57 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 16 Dec 2006 14:47:57 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: <4584071D.3070005@sendu.me.uk> Gabriel Valiente wrote: >> I don't think that can be true. Your error message contains 'Must supply >> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >> >> If you uninstall the fink installation and install 1.5.2 using cpan >> (with root privileges by going sudo cpan) that should at least get rid >> of the error messages... >> >> >>> The tree is not correct (I've parsed it from R to have a double >>> check) but don't know yet what the problem is with it. >> >> ... But if the tree is wrong anyway... Let me know what you find out. > > I've uninstalled the fink installation and used the cvs instead, and the > error message is gone. However, on a larger set of 190 species, which > are all present in the NCBI taxonomy, the resulting tree has only 178 > taxa. I suspect, something must be wrong with the merge_lineage method > in the major rewrite of the taxonomy2tree script. Can someone please > check this? I'm attaching the 190 species call to the script. Thanks, Ok, I'll look into it. You're also welcome to see if you can take your own code from your original taxonomy2tree script and see if you can merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with your algorithms to get it working correctly. Indeed, does your original version of the script work on this data set? Cheers, Sendu. 
From cjfields at uiuc.edu Sat Dec 16 10:18:50 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 09:18:50 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4584071D.3070005@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4584071D.3070005@sendu.me.uk> Message-ID: <6AE33842-B2E7-4E9B-B80D-68A058045818@uiuc.edu> On Dec 16, 2006, at 8:47 AM, Sendu Bala wrote: > Gabriel Valiente wrote: >>> I don't think that can be true. Your error message contains 'Must >>> supply >>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >>> >>> If you uninstall the fink installation and install 1.5.2 using cpan >>> (with root privileges by going sudo cpan) that should at least >>> get rid >>> of the error messages... >>> >>> >>>> The tree is not correct (I've parsed it from R to have a double >>>> check) but don't know yet what the problem is with it. >>> >>> ... But if the tree is wrong anyway... Let me know what you find >>> out. >> >> I've uninstalled the fink installation and used the cvs instead, >> and the >> error message is gone. However, on a larger set of 190 species, which >> are all present in the NCBI taxonomy, the resulting tree has only 178 >> taxa. I suspect, something must be wrong with the merge_lineage >> method >> in the major rewrite of the taxonomy2tree script. Can someone please >> check this? I'm attaching the 190 species call to the script. Thanks, > > Ok, I'll look into it. You're also welcome to see if you can take your > own code from your original taxonomy2tree script and see if you can > merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with > your algorithms to get it working correctly. 
Indeed, does your > original > version of the script work on this data set? > > > Cheers, > Sendu. Sendu, Don't know if it helps, but when I tried Gabriel's shell script last night I ran a modification of taxonomy2tree to see what would pop up. Everything is fine up to about 100 iterations, then merge_lineage () starts dropping leaf nodes. chris From bix at sendu.me.uk Sat Dec 16 10:33:35 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 16 Dec 2006 15:33:35 +0000 Subject: [Bioperl-l] NO BLAST In-Reply-To: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com> References: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com> Message-ID: <458411CF.8000707@sendu.me.uk> Luba Pardo wrote: > *Hello,* > *I am having trouble to use the module Bio::Tools::Run::StandAloneBlast;* > ** > *I got the following error message: cannot find path to blastall.* > *The code I used is (modified from HOWTObeginners): Bioperl doesn't know where you installed blast. If you've actually installed it, you can set the environment variable BLASTDIR to point to the directory that contains the blastall executable. From cain.cshl at gmail.com Fri Dec 15 13:09:48 2006 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 15 Dec 2006 13:09:48 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain> <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> Message-ID: <1166206188.2569.380.camel@localhost.localdomain> On Fri, 2006-12-15 at 11:49 -0600, Chris Fields wrote: > > To tell the truth I don't know if this is where the mandatory checks > were added in; I'm not too familiar with SeqFeature::Annotation yet. > > I agree with Scott (and Matthew) that SOFA checks should be > optional. Matthew, can you write up a patch and maybe some tests? 
> > chris > That's not where they were added in, it just that they hadn't been fully implemented before then, so they didn't work (which probably meant they weren't mandatory, though I don't remember (it could be that it just croaked)). Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061215/b248a096/attachment.bin From hlapp at gmx.net Sun Dec 17 01:02:04 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 17 Dec 2006 01:02:04 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <458404BD.8030908@sendu.me.uk> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> Message-ID: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote: > Lincoln Stein wrote: >> This is very embarassing for me, particularly since I spent a lot >> of time >> validating that Bio::Graphics was working properly before the >> 1.5.2 release >> went out. How long before there is a 1.5.3 release? How about a >> 1.5.2.1release? > > I'm happy to try a point release for critical bug fixes. Why don't you > commit the necessary fixes to branch-1-5-2 and let me know when you're > happy, and I'll do 1.5.2.1. Feel free to do that, but why not make a 1.5.3 off the main trunk? 1.5.2.1 may be adding more to the version confusion (developer/stable/ point-release/etc) than it is worth, and there is no shame in releasing new developer versions every few weeks. My $0.02 ... -hilmar > > Cheers, > Sendu. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From fgarret at ub.edu Mon Dec 18 07:07:02 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 13:07:02 +0100 Subject: [Bioperl-l] codeml Message-ID: <45868466.508@ub.edu> Hi all, I've been using bioperl's PAML module (specifically the codeml part) but with just one tree. Since the program accepts several trees as input (and runs the analysis for each tree outputting the difference in likelihoods for each one) I was wondering if there's some way to do it through bioperl? thanks in adv, FG From heikki at sanbi.ac.za Mon Dec 18 08:51:50 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Mon, 18 Dec 2006 15:51:50 +0200 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> Message-ID: <200612181551.51277.heikki@sanbi.ac.za> Reading the discussion, I think it is time to draw some guidelines. 1. Base the Meta implementation to a real use cases. MSA is a good example. 2. Allow generalisations If you can see an other implementation of the same idea that can be merged with the first do it but do not hurt yourself if you can not. The most difficult question is how to separate case-specific attributes that are best implemented by subclassing with additional methods from truly widely variable meta data that is best done as a parallel track meta information holding class. The problem I see with undefined, totally open meta annotation, is that if you can put anything in there, it is also totally confusing to a user. If you can put anything in, how do you know what to get get out and know that it is there? That leads to the the third guideline: 3. 
Use separate meta classes only when there are several different ways of encoding data that is present in large numbers *and* when you are expecting to be assessing the data computationally rather than just checking if an attribute is there. -Heikki On Friday 15 December 2006 19:23, Chris Fields wrote: > On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote: > > On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: > >> On Dec 14, 2006, at 7:45 PM, David Messina wrote: > >>> Hey Chris, > >>> > >>> My thoughts below. > >>> > >>>> [Chris] > >>>> This could be used to annotate any > >>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have- > >>>> you, > >>>> maybe in a collection (similar to AnnotationCollection). I thought > >>>> something like this may be of general use for any PrimarySeq > >>>> (quality, structure), alignments like NEXUS and Stockholm, > >>>> SeqFeatures where structure could be stored (tRNA or riboswitches), > >>>> etc. > >>>> > >>>> However, this also seems to fall into the category of sequence > >>>> annotation. So, would it be better to have a set of > >>>> Bio::Annotation > >>>> classes used for this purpose? > >>> > >>> To me, all meta data is equal. That is, your classic Genbank feature > >>> annotation and a user's arbitrary meta-tag like "Bob thinks this > >>> is a > >>> kinase domain" aren't different in kind even if they are > >>> different in > >>> content. > >>> > >>> As resequencing projects multiply, the ability to create arbitrary > >>> meta tags, attach them to different types of objects, and use those > >>> tags to link them together will become desirable, if not essential. > >>> > >>> Keeping a common interface to all of these meta data types would be > >>> advantageous, plus new users won't have to determine whether they > >>> need to use Bio::Meta objects or Bio::Annotation objects. > >>> > >>> So I would argue for all of the meta data types to live "under one > >>> roof". Which roof isn't as important. 
Bio::Annotation, since it > >>> already exists for today's meta data, seems like a reasonable > >>> choice. > >>> (assuming Annotation objects are flexible enough to be extended as > >>> you propose) > >>> > >>> There, and no flames or jibes even. :) > >> > >> I guess what I want to know is whether there should to be a > >> distinction between 'normal' sequence annotation (comments, > >> references, and so on) and annotation that could be best described as > >> position-specific (like RNA or protein structural annotation). The > >> current meta implementation is for sequence data only; I felt it > >> would be nice to have a generic implementation that would be > >> applicable to any object data. > > > > my stream-of-consciousness for right now: > > > > I was thinking Bio::Annotation is where this should go - that > > system doesn't have anything about it that makes it explicitly > > sequence related. What we're trying to hammer out here on the > > Alignment side - which fits with your RNA example - is have > > features, basically SeqFeatures - associated with alignments so > > columns can be annotated to cover things like character sets and > > partitions for phylogenetic analyses. As for data which annotates > > non-contiguous things like RNAstems we may have to be more > > creative about that or model it with a splitLocation. > > > > So currently we've added code so that an Alignment is-a > > Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this > > end, with the goal of being able to capture more of the data that > > can be represented in a NEXUS file. > > > > It feels more like a hack than an elegant Meta-data solution, but I > > am totally sure whether the data you are thinking about doing at > > this point, perhaps I need to spend more time thinking about it. > > Or are you worried about the idea of whether the semantic mapping > > of the data into features or annotations is confusing users? > > Sorry in advance for the longish response here... 
> > My original thought was to have a generic abstract class capable of > positionally describing data in any another class, similar to > Heikki's Bio::Seq::MetaI but not constrained to sequence data only. > Implementing classes would be capable of having different data > structures based on their use (simple string, array, AoA, AoH, AoO). > One MetaCollection class to contain them all in a tag-like system, so > you could have mixed data types describe the same object. The latter > Collection class is so similar to AnnotationCollection that I agree > Bio::Annotation would be the best place for this. > > The way I reconfigured Stockholm alignment parsing/writing is to use > Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is > capable of holding a sequence and several meta strings, stored as > tags or 'names'. However, there is no Meta object for alignments > (for RNA/protein structure consensus and other Rfam/Pfam markup); I > hacked around this by using a Bio::Seq::Meta w/o a seq, but I would > rather have a generic Meta object independent of the sequence cruft. > > So for this partial Pfam alignment, > > Q92SV1_RHIME/122-299 LAMALNLARGI...VDADVDF..REG > #=GR Q92SV1_RHIME/122-299 pAS ......................... > Q883D2_PSESM/110-290 LGLMLGLRRRL...FDGNGAV..KRS > Q8ZXP5_PYRAE/91-262 LALLLAPYKRI...IQYGEKM..KRG > #=GR Q8ZXP5_PYRAE/91-262 SS HHHHHHHHTTH...HHHHHHX..HTT > #=GR Q8ZXP5_PYRAE/91-262 SA 00000000000...120030X..474 > #=GC SS_cons HHHHHHHHTTH...HHHHHHH..HTT > #=GC SA_cons 03002200312...1312414..676 > #=GC seq_cons luhhLuhsRpl...hthppth..+pG > // > > '#=GC' lines would be in generic meta string objects in the > alignment, while '#=GR' tags would be in similar meta objects in the > relevant sequences. As long as both aren't AnnotatableI this isn't > an issue. > > Similarly, NEXUS files which contained any position-based values > could hold a meta string/array object in a similar tag. 
> > The basic scheme is: > |--String > > Annotation::Meta----|--Array > > |--HorriblyComplexDataStruct > > Then I started thinking about where this could be applied, and > whether a true Meta object needs to be constrained only to describing > position-based data. This somewhat relates to this bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=1825 > > which seems to need a simple but unconstrained hash-of-arrays-based > meta object. > > Then my head appropriately exploded... > > Hope everything is going well at the hackathon! Looks like some > interesting stuff coming out of it. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From fgarret at ub.edu Mon Dec 18 11:18:31 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 17:18:31 +0100 Subject: [Bioperl-l] PAML files Message-ID: <4586BF57.4090002@ub.edu> Hi all, does anyone knows how to get the name of the .ctl file created by the PAML module? Inside the tmp directory there are 2 files with random names (tree and ctl). Why do they have random names?? Wouldn't it be easier to assign them a fixed name?? For instance "codeml.ctl" and "tree.nwk"?? thanks in adv, FG From bix at sendu.me.uk Mon Dec 18 11:15:21 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 16:15:21 +0000 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? 
In-Reply-To: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> Message-ID: <4586BE99.7020308@sendu.me.uk> Hilmar Lapp wrote: > > On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote: > >> Lincoln Stein wrote: >>> This is very embarassing for me, particularly since I spent a lot >>> of time validating that Bio::Graphics was working properly before >>> the 1.5.2 release went out. How long before there is a 1.5.3 >>> release? How about a 1.5.2.1release? >> >> I'm happy to try a point release for critical bug fixes. Why don't >> you commit the necessary fixes to branch-1-5-2 and let me know when >> you're happy, and I'll do 1.5.2.1. > > Feel free to do that, but why not make a 1.5.3 off the main trunk? > 1.5.2.1 may be adding more to the version confusion > (developer/stable/point-release/etc) than it is worth, My feeling is that 1.5.3 should be reserved for some significant changes and new features, and not just a few bug fixes. I'd say this causes less confusion amongst users - they can associate '1.5.2' with a certain API and feature-set, and the specific name of the file they download and install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't matter at all to them. I also won't have to make some major announcement about it; it will simply be the most recent developer version of bioperl available so new users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing 1.5.2 users will only feel compelled to get it if they suffer from the bugs fixed. > and there is no shame in releasing new developer versions every few > weeks. I think doing frequent releases are inadvisable; such a quick release won't have had much testing so we shouldn't encourage people to install it: encouragement is implicit when a major new version comes out like 1.5.3. 
People who want to live on the edge can and should be using a CVS checkout. From bix at sendu.me.uk Mon Dec 18 14:15:16 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 19:15:16 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: <4586E8C4.6030306@sendu.me.uk> Chris Fields wrote: > On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: > >> However, on a larger set of 190 species, which are all present in >> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect, >> something must be wrong with the merge_lineage method in the major >> rewrite of the taxonomy2tree script. Can someone please check this? >> I'm attaching the 190 species call to the script. Thanks, >> >> Gabriel > > I can confirm that. It is definitely dropping them in merge_lineage > (); if you add a call to get_leaf_nodes to check how many are > present after each merge_lineage() call, you can see it dropping > nodes along the trace. I confirm the 'dropped' nodes, but also claim that this is no bug. For example, the first 'drop' happens for the 101st species which is 'Leptospira interrogans serovar Copenhageni'. This is a variation (descendant) of species 24: 'Leptospira interrogans'. So when the variation is added it becomes a leaf and 'Leptospira interrogans' is no longer a leaf, so the overall number of leaves does not increase. The next drop is for species 103 'Prochlorococcus marinus subsp. pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'. Same deal. I didn't check any others, but suspect the same issue arises in all cases. 
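
The leaf-count behaviour described above can be reproduced with a minimal sketch. It uses a toy parent map, not Bio::Tree::Tree; the helper names merge_lineage_toy and leaf_count, the 'Bacteria' root and the shortened lineages are assumptions made purely for illustration:

```perl
use strict;
use warnings;

# Toy tree, NOT Bio::Tree::Tree: each node name maps to its parent's name.
my %parent_of;

# Hypothetical helper: add a lineage (root first, taxon last),
# reusing any nodes that are already present.
sub merge_lineage_toy {
    my @lineage = @_;
    for my $i (1 .. $#lineage) {
        $parent_of{ $lineage[$i] } = $lineage[ $i - 1 ];
    }
    return;
}

# Hypothetical helper: a leaf is a node that is nobody's parent.
sub leaf_count {
    my %is_parent = map { ($_ => 1) } values %parent_of;
    my %nodes     = map { ($_ => 1) } (keys %parent_of, values %parent_of);
    return scalar grep { !$is_parent{$_} } keys %nodes;
}

# Two species under a common (simplified) root: two leaves.
merge_lineage_toy('Bacteria', 'Leptospira interrogans');
merge_lineage_toy('Bacteria', 'Prochlorococcus marinus');
my $before = leaf_count();

# Adding a descendant of an existing leaf turns that leaf into an
# internal node, so the number of leaves stays the same.
merge_lineage_toy('Bacteria', 'Leptospira interrogans',
                  'Leptospira interrogans serovar Copenhageni');
my $after = leaf_count();

printf "leaves before: %d, after: %d\n", $before, $after;
```

In this toy case three taxa were merged but only two leaves exist, because 'Leptospira interrogans' is still in the tree as an internal node. Counting every node that was requested, rather than only leaves, would account for all three.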
Gabriel, please confirm this isn't a bug, or suggest how you propose to
see your taxa when they are not all leaves of the tree.

PS. I changed the merge_lineage() algorithm to be 18x faster (from the
absurd 3 mins for making the 190 species tree to a more reasonable 10 s),
without changing the tree produced.

From fgarret at ub.edu Mon Dec 18 15:01:38 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 21:01:38 +0100
Subject: [Bioperl-l] PAML files
In-Reply-To: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
References: <4586BF57.4090002@ub.edu>
	<34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
Message-ID: <4586F3A2.4010607@ub.edu>

Hi Jason,

This question is related to the one I asked earlier today.
I need to run codeml with 3 tree topologies. I looked at the codeml module,
but it only accepts one tree as input, so I thought of using the codeml
module to prepare all the files; then I would just have to run codeml
with the new tree file in batch. But for that I need to know which one
is the ctl file.

Any idea?
FG

Jason Stajich wrote:
> They are temporary names so they are deliberately random and there is no
> intention of you needing them after a run since it to be cleaned up on
> the fly. We use an internal method for generating tempfiles that takes
> care of cleanup afterwards. I suppose since we do all the work within a
> temp directory that is cleaned up, one could have a fixed name for the
> tree, alignment, and ctl files but honestly we never expect people to be
> reading these filenames as they are intended to be transient.
>
> What problem are you having that you need the filename?
>
> -jason
> On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:
>
>> Hi all,
>>
>> does anyone knows how to get the name of the .ctl file created by the
>> PAML module? Inside the tmp directory there are 2 files with random
>> names (tree and ctl). Why do they have random names?? Wouldn't it be
>> easier to assign them a fixed name??
For instance "codeml.ctl" and >> "tree.nwk"?? >> >> thanks in adv, >> FG >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > From fgarret at ub.edu Mon Dec 18 15:07:46 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 21:07:46 +0100 Subject: [Bioperl-l] codeml In-Reply-To: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> References: <45868466.508@ub.edu> <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> Message-ID: <4586F512.1030209@ub.edu> Right now it's impossible for me to write it. By February or March I should have more time but I'll let you know. FG Jason Stajich wrote: > This is shortcoming in the Run::Phylo::PAML::Codeml implementation - I > guess we'll need to allow the -tree option to accept and arrayref of trees. > Are you willing to try write this patch? It should be added as a > bug/feature request to bugzilla so it can be corrected in short order. > > -jason > On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote: > >> Hi all, >> >> I've been using bioperl's PAML module (specifically the codeml part) but >> with just one tree. >> >> Since the program accepts several trees as input (and runs the analysis >> for each tree outputting the difference in likelihoods for each one) I >> was wondering if there's some way to do it through bioperl? 
>> >> thanks in adv, >> FG >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > > From cjfields at uiuc.edu Mon Dec 18 15:55:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 14:55:55 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4586E8C4.6030306@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> Message-ID: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: >> >>> However, on a larger set of 190 species, which are all present in >>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect, >>> something must be wrong with the merge_lineage method in the major >>> rewrite of the taxonomy2tree script. Can someone please check this? >>> I'm attaching the 190 species call to the script. Thanks, >>> >>> Gabriel >> >> I can confirm that. It is definitely dropping them in merge_lineage >> (); if you add a call to get_leaf_nodes to check how many are >> present after each merge_lineage() call, you can see it dropping >> nodes along the trace. > > I confirm the 'dropped' nodes, but also claim that this is no bug. > > For example, the first 'drop' happens for the 101st species which is > 'Leptospira interrogans serovar Copenhageni'. This is a variation > (descendant) of species 24: 'Leptospira interrogans'. 
So when the > variation is added it becomes a leaf and 'Leptospira interrogans' > is no > longer a leaf, so the overall number of leaves does not increase. > > The next drop is for species 103 'Prochlorococcus marinus subsp. > pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'. > Same deal. I didn't check any others, but suspect the same issue > arises > in all cases. Makes sense now. I personally would consider this a bug since the results are unexpected (so the docs need to be modified in order to clarify). Some say tomato... I suppose this is one of the issues one might run into when using NCBI taxonomy to build trees. > Gabriel, please confirm this isn't a bug, or suggest how you > propose to > see your taxa when they are not all leaves of the tree. Having the nodes appear internally seems semantically correct to me. Is there any other way? > PS. I changed the merge_lineage() algorithm to be 18x faster (from the > absurd 3mins for making the 190 species tree to a more reasonable > 10s), > without changing the tree produced. Definitely an improvement! chris From jason at bioperl.org Mon Dec 18 14:33:32 2006 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Dec 2006 14:33:32 -0500 Subject: [Bioperl-l] PAML files In-Reply-To: <4586BF57.4090002@ub.edu> References: <4586BF57.4090002@ub.edu> Message-ID: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> They are temporary names so they are deliberately random and there is no intention of you needing them after a run since it to be cleaned up on the fly. We use an internal method for generating tempfiles that takes care of cleanup afterwards. I suppose since we do all the work within a temp directory that is cleaned up, one could have a fixed name for the tree, alignment, and ctl files but honestly we never expect people to be reading these filenames as they are intended to be transient. What problem are you having that you need the filename? 
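A minimal sketch of the "fixed names inside a throwaway directory" idea, assuming only core Perl (File::Temp, File::Spec). This is not how Bio::Tools::Run::Phylo::PAML::Codeml is actually written; the file names aln.phy and mlc, the %path hash and the three-taxon tree are made up for illustration. The directory name stays random, the files in it get predictable names, and everything is removed when the program exits.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Scratch directory with a random name; File::Temp deletes it at exit.
my $dir = tempdir(CLEANUP => 1);

# Fixed, predictable file names inside that directory (hypothetical names).
my %path = map { $_ => File::Spec->catfile($dir, $_) }
    qw(codeml.ctl tree.nwk aln.phy mlc);

# Placeholder three-taxon tree, purely for illustration.
open my $tree_fh, '>', $path{'tree.nwk'} or die "tree.nwk: $!";
print {$tree_fh} "((A,B),C);\n";
close $tree_fh;

# Minimal control file that refers to the fixed names.
open my $ctl_fh, '>', $path{'codeml.ctl'} or die "codeml.ctl: $!";
print {$ctl_fh} "seqfile = aln.phy\n",
                "treefile = tree.nwk\n",
                "outfile = mlc\n";
close $ctl_fh;

print "control file: $path{'codeml.ctl'}\n";
```

Because cleanup is tied to the directory rather than to each file, a caller who wants to inspect or reuse the ctl file only needs to know the directory.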
-jason On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > Hi all, > > does anyone knows how to get the name of the .ctl file created by the > PAML module? Inside the tmp directory there are 2 files with random > names (tree and ctl). Why do they have random names?? Wouldn't it be > easier to assign them a fixed name?? For instance "codeml.ctl" and > "tree.nwk"?? > > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjm at fruitfly.org Mon Dec 18 16:50:00 2006 From: cjm at fruitfly.org (Chris Mungall) Date: Mon, 18 Dec 2006 13:50:00 -0800 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> Message-ID: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> I agree with everything Heikki is saying, I just wanted to highlight one paragraph: > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? One solution is to give your annotation/metadata-model formal computational semantics and use ontologies to give additional semantics to your metadata tags. This provides both user information in the form of documentation, and a means of specifying to the computer exactly what should be done with the tags. This is probably overkill for bioperl; but if the use cases being proposed do lean in the direction of a new metadata system that is not necessarily backwards compatible with the existing one, then I'd recommend checking out what's already out there before re-inventing the wheel. Perl RDF libraries are getting a little better. 
If anyone is interested in pursuing this sort of thing (probably on a branch), let me know On Dec 18, 2006, at 5:51 AM, Heikki Lehvaslaiho wrote: > > Reading the discussion, I think it is time to draw some guidelines. > > 1. Base the Meta implementation to a real use cases. > > MSA is a good example. > > 2. Allow generalisations > > If you can see an other implementation of the same idea that can > be merged > with the first do it but do not hurt yourself if you can not. > > > The most difficult question is how to separate case-specific > attributes that > are best implemented by subclassing with additional methods from > truly widely > variable meta data that is best done as a parallel track meta > information > holding class. > > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? > > That leads to the the third guideline: > > 3. Use separate meta classes only when there are several different > ways of > encoding data that is present in large numbers *and* when you are > expecting > to be assessing the data computationally rather than just checking > if an > attribute is there. > > > -Heikki > > > > On Friday 15 December 2006 19:23, Chris Fields wrote: >> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote: >>> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: >>>> On Dec 14, 2006, at 7:45 PM, David Messina wrote: >>>>> Hey Chris, >>>>> >>>>> My thoughts below. >>>>> >>>>>> [Chris] >>>>>> This could be used to annotate any >>>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have- >>>>>> you, >>>>>> maybe in a collection (similar to AnnotationCollection). 
I >>>>>> thought >>>>>> something like this may be of general use for any PrimarySeq >>>>>> (quality, structure), alignments like NEXUS and Stockholm, >>>>>> SeqFeatures where structure could be stored (tRNA or >>>>>> riboswitches), >>>>>> etc. >>>>>> >>>>>> However, this also seems to fall into the category of sequence >>>>>> annotation. So, would it be better to have a set of >>>>>> Bio::Annotation >>>>>> classes used for this purpose? >>>>> >>>>> To me, all meta data is equal. That is, your classic Genbank >>>>> feature >>>>> annotation and a user's arbitrary meta-tag like "Bob thinks this >>>>> is a >>>>> kinase domain" aren't different in kind even if they are >>>>> different in >>>>> content. >>>>> >>>>> As resequencing projects multiply, the ability to create arbitrary >>>>> meta tags, attach them to different types of objects, and use >>>>> those >>>>> tags to link them together will become desirable, if not >>>>> essential. >>>>> >>>>> Keeping a common interface to all of these meta data types >>>>> would be >>>>> advantageous, plus new users won't have to determine whether they >>>>> need to use Bio::Meta objects or Bio::Annotation objects. >>>>> >>>>> So I would argue for all of the meta data types to live "under one >>>>> roof". Which roof isn't as important. Bio::Annotation, since it >>>>> already exists for today's meta data, seems like a reasonable >>>>> choice. >>>>> (assuming Annotation objects are flexible enough to be extended as >>>>> you propose) >>>>> >>>>> There, and no flames or jibes even. :) >>>> >>>> I guess what I want to know is whether there should to be a >>>> distinction between 'normal' sequence annotation (comments, >>>> references, and so on) and annotation that could be best >>>> described as >>>> position-specific (like RNA or protein structural annotation). 
The >>>> current meta implementation is for sequence data only; I felt it >>>> would be nice to have a generic implementation that would be >>>> applicable to any object data. >>> >>> my stream-of-consciousness for right now: >>> >>> I was thinking Bio::Annotation is where this should go - that >>> system doesn't have anything about it that makes it explicitly >>> sequence related. What we're trying to hammer out here on the >>> Alignment side - which fits with your RNA example - is have >>> features, basically SeqFeatures - associated with alignments so >>> columns can be annotated to cover things like character sets and >>> partitions for phylogenetic analyses. As for data which annotates >>> non-contiguous things like RNAstems we may have to be more >>> creative about that or model it with a splitLocation. >>> >>> So currently we've added code so that an Alignment is-a >>> Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this >>> end, with the goal of being able to capture more of the data that >>> can be represented in a NEXUS file. >>> >>> It feels more like a hack than an elegant Meta-data solution, but I >>> am totally sure whether the data you are thinking about doing at >>> this point, perhaps I need to spend more time thinking about it. >>> Or are you worried about the idea of whether the semantic mapping >>> of the data into features or annotations is confusing users? >> >> Sorry in advance for the longish response here... >> >> My original thought was to have a generic abstract class capable of >> positionally describing data in any another class, similar to >> Heikki's Bio::Seq::MetaI but not constrained to sequence data only. >> Implementing classes would be capable of having different data >> structures based on their use (simple string, array, AoA, AoH, AoO). >> One MetaCollection class to contain them all in a tag-like system, so >> you could have mixed data types describe the same object. 
The latter >> Collection class is so similar to AnnotationCollection that I agree >> Bio::Annotation would be the best place for this. >> >> The way I reconfigured Stockholm alignment parsing/writing is to use >> Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is >> capable of holding a sequence and several meta strings, stored as >> tags or 'names'. However, there is no Meta object for alignments >> (for RNA/protein structure consensus and other Rfam/Pfam markup); I >> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would >> rather have a generic Meta object independent of the sequence cruft. >> >> So for this partial Pfam alignment, >> >> Q92SV1_RHIME/122-299 LAMALNLARGI...VDADVDF..REG >> #=GR Q92SV1_RHIME/122-299 pAS ......................... >> Q883D2_PSESM/110-290 LGLMLGLRRRL...FDGNGAV..KRS >> Q8ZXP5_PYRAE/91-262 LALLLAPYKRI...IQYGEKM..KRG >> #=GR Q8ZXP5_PYRAE/91-262 SS HHHHHHHHTTH...HHHHHHX..HTT >> #=GR Q8ZXP5_PYRAE/91-262 SA 00000000000...120030X..474 >> #=GC SS_cons HHHHHHHHTTH...HHHHHHH..HTT >> #=GC SA_cons 03002200312...1312414..676 >> #=GC seq_cons luhhLuhsRpl...hthppth..+pG >> // >> >> '#=GC' lines would be in generic meta string objects in the >> alignment, while '#=GR' tags would be in similar meta objects in the >> relevant sequences. As long as both aren't AnnotatableI this isn't >> an issue. >> >> Similarly, NEXUS files which contained any position-based values >> could hold a meta string/array object in a similar tag. >> >> The basic scheme is: >> |--String >> >> Annotation::Meta----|--Array >> >> |--HorriblyComplexDataStruct >> >> Then I started thinking about where this could be applied, and >> whether a true Meta object needs to be constrained only to describing >> position-based data. This somewhat relates to this bug: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=1825 >> >> which seems to need a simple but unconstrained hash-of-arrays-based >> meta object. >> >> Then my head appropriately exploded... 
>> >> Hope everything is going well at the hackathon! Looks like some >> interesting stuff coming out of it. >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Dec 18 14:35:50 2006 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Dec 2006 14:35:50 -0500 Subject: [Bioperl-l] codeml In-Reply-To: <45868466.508@ub.edu> References: <45868466.508@ub.edu> Message-ID: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> This is shortcoming in the Run::Phylo::PAML::Codeml implementation - I guess we'll need to allow the -tree option to accept and arrayref of trees. Are you willing to try write this patch? It should be added as a bug/ feature request to bugzilla so it can be corrected in short order. -jason On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote: > Hi all, > > I've been using bioperl's PAML module (specifically the codeml > part) but > with just one tree. > > Since the program accepts several trees as input (and runs the > analysis > for each tree outputting the difference in likelihoods for each one) I > was wondering if there's some way to do it through bioperl? 
> > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From gowthaman.ramasamy at sbri.org Mon Dec 18 17:19:09 2006 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Mon, 18 Dec 2006 14:19:09 -0800 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence Message-ID: Hi List, Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) Many thanks in advance, gowtham From cjfields at uiuc.edu Mon Dec 18 17:33:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 16:33:34 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> Message-ID: On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote: > > Reading the discussion, I think it is time to draw some guidelines. > > 1. Base the Meta implementation to a real use cases. > > MSA is a good example. AlignIO::stockholm is where I'll initially test it out. > 2. Allow generalisations > > If you can see an other implementation of the same idea that can > be merged > with the first do it but do not hurt yourself if you can not. I agree. > The most difficult question is how to separate case-specific > attributes that > are best implemented by subclassing with additional methods from > truly widely > variable meta data that is best done as a parallel track meta > information > holding class. 
I would probably start with a general Bio::Annotation::MetaI abstract class, which supplements AnnotationI with general meta-specific methods (meta, meta_text, named_meta, etc)? Implement this in whatever way one wanted (RNA structure as strings, quality data as arrays, etc) under the constraints of the interface description. Multiple meta objects, potentially of mixed data types, could be added in an AnnotationCollection along with other Bio::Annotation data, or stored in a nested meta-specific AnnotationCollection object (I favor the former as it's simpler). So you could have an alignment, sequence, seqfeature (anything that is AnnotatableI) with a regular AnnotationCollection also containing possibly multiple meta objects, each meta object also containing possibly more than one set of meta data. The key issue I have is whether or not to constrain these to describing positional data, similar to Bio::Seq::Meta, by ensuring that the data is_flush(), etc. My current inclination is 'no', and to have a separate abstract class which describes these methods, implementing those separately. > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? > > That leads to the the third guideline: > > 3. Use separate meta classes only when there are several different > ways of > encoding data that is present in large numbers *and* when you are > expecting > to be assessing the data computationally rather than just checking > if an > attribute is there. > > > -Heikki The initial use case for this would be simple data strings for alignment data. I already have a partial implementation in place for stockholm using Bio::Seq::Meta (which led me to this proposal!). 
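To make the Stockholm use case a bit more concrete, here is a rough sketch in plain Perl. It uses ordinary hashes, not Bio::Seq::Meta or AlignIO::stockholm, and the names %seq_meta, %aln_meta and the local is_flush() sub are assumptions for illustration only. It sorts '#=GR' lines into per-sequence meta strings and '#=GC' lines into per-alignment meta strings, then checks that each meta string is the same length as the sequence it describes.

```perl
use strict;
use warnings;

# A few lines from the partial Pfam alignment quoted earlier in the thread.
my @lines = (
    'Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG',
    '#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT',
    '#=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474',
    '#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT',
    '//',
);

my (%seq, %seq_meta, %aln_meta);
for my $line (@lines) {
    last if $line eq '//';
    if ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)$/) {
        $seq_meta{$1}{$2} = $3;    # per-residue markup: sequence id, tag, string
    }
    elsif ($line =~ /^#=GC\s+(\S+)\s+(\S+)$/) {
        $aln_meta{$1} = $2;        # per-column markup: tag, string
    }
    elsif ($line =~ /^(\S+)\s+(\S+)$/) {
        $seq{$1} = $2;             # the aligned sequence itself
    }
}

# True if every meta string has the same length as the sequence string.
sub is_flush {
    my ($seq_string, $meta) = @_;
    my $bad = grep { length($_) != length($seq_string) } values %$meta;
    return $bad == 0;
}

for my $id (sort keys %seq) {
    my $state = is_flush($seq{$id}, $seq_meta{$id} || {}) ? 'flush' : 'NOT flush';
    print "$id: ", join(',', sort keys %{ $seq_meta{$id} || {} }), " ($state)\n";
}
```

Keeping the flush check in a separate sub mirrors the point above: a non-positional meta object would simply never call it.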
I like Chris M.'s idea of ensuring that meta implementations use some sort of formalized ontology, but I'll probably start out very simple and work up from there. chris From cjfields at uiuc.edu Mon Dec 18 17:38:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 16:38:14 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <4586BE99.7020308@sendu.me.uk> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> <4586BE99.7020308@sendu.me.uk> Message-ID: <6AD475AE-7F5E-4612-BC24-73B65AA47F30@uiuc.edu> On Dec 18, 2006, at 10:15 AM, Sendu Bala wrote: > Hilmar Lapp wrote: >> >> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote: >> >>> Lincoln Stein wrote: >>>> This is very embarassing for me, particularly since I spent a lot >>>> of time validating that Bio::Graphics was working properly before >>>> the 1.5.2 release went out. How long before there is a 1.5.3 >>>> release? How about a 1.5.2.1release? >>> >>> I'm happy to try a point release for critical bug fixes. Why don't >>> you commit the necessary fixes to branch-1-5-2 and let me know when >>> you're happy, and I'll do 1.5.2.1. >> >> Feel free to do that, but why not make a 1.5.3 off the main trunk? >> 1.5.2.1 may be adding more to the version confusion >> (developer/stable/point-release/etc) than it is worth, > > My feeling is that 1.5.3 should be reserved for some significant > changes > and new features, and not just a few bug fixes. I'd say this causes > less > confusion amongst users - they can associate '1.5.2' with a certain > API > and feature-set, and the specific name of the file they download and > install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't > matter at all to them. 
> > I also won't have to make some major announcement about it; it will > simply be the most recent developer version of bioperl available so > new > users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing > 1.5.2 users will only feel compelled to get it if they suffer from the > bugs fixed. > > >> and there is no shame in releasing new developer versions every few >> weeks. > > I think doing frequent releases are inadvisable; such a quick release > won't have had much testing so we shouldn't encourage people to > install > it: encouragement is implicit when a major new version comes out like > 1.5.3. People who want to live on the edge can and should be using a > CVS checkout. I thought that 1.5.2 was considered a point release for the 1.5 dev series, for bug fixes along with the potential for added/experimental features. Similarly, 1.6.x releases would be point releases for bug fixes only with all tests passing (no added features since it is a stable release series). I guess one could reason that 1.5.x releases have both bug fixes and new features, while 1.5.x.y releases are simply bug fixes for the 1.5.x branch (no new features). We probably should add something to the FAQ and maybe make a few changes to the 1.5.2 wiki page. I think having a 1.5.2.1 release is feasible as a quick one-off to get Lincoln's fixes in, since you would make them off the 1.5.2 branch anyway (so I guess it could be considered a bug release from that branch). It's probably not something we should make a habit of, but then again I'm not the Pumpkin! 
chris From bix at sendu.me.uk Mon Dec 18 17:50:11 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 22:50:11 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> Message-ID: <45871B23.8070103@sendu.me.uk> Chris Fields wrote: > > On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: > >> For example, the first 'drop' happens for the 101st species which is >> 'Leptospira interrogans serovar Copenhageni'. This is a variation >> (descendant) of species 24: 'Leptospira interrogans'. So when the >> variation is added it becomes a leaf and 'Leptospira interrogans' is no >> longer a leaf, so the overall number of leaves does not increase. > > Makes sense now. I personally would consider this a bug since the > results are unexpected (so the docs need to be modified in order to > clarify). Some say tomato... > > I suppose this is one of the issues one might run into when using NCBI > taxonomy to build trees. No, the tree produced is perfectly fine. The taxonomy2tree.pl script deliberately then does: # simple paths are contracted by removing degree one nodes $tree->contract_linear_paths; Because that is what Gabriel's script originally did. >> Gabriel, please confirm this isn't a bug, or suggest how you propose to >> see your taxa when they are not all leaves of the tree. > > Having the nodes appear internally seems semantically correct to me. Is > there any other way? I suppose if we want to see all the input species output again we have to make contract_linear_paths() aware of nodes we want to keep, even when they are degree one nodes. 
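A toy illustration of both effects, using plain hashes rather than the Bio::Tree API. The lineages are shortened, and the %keep set and the contract() sub are a hypothetical variant, not what contract_linear_paths() currently does. Adding a subspecies under an already-present species leaves the leaf count unchanged, and a contraction that knows about requested names can retain the species even though it has a single child.

```perl
use strict;
use warnings;

# Shortened lineages, root first; only the species names come from the thread.
my @lineages = (
    [qw(root Bacteria Spirochaetes), 'Leptospira interrogans'],
    [qw(root Bacteria Spirochaetes), 'Leptospira interrogans',
        'Leptospira interrogans serovar Copenhageni'],
    [qw(root Bacteria Cyanobacteria), 'Prochlorococcus marinus'],
);

my %children;    # parent name => { child name => 1 }
my %keep;        # names the caller explicitly asked for
for my $lineage (@lineages) {
    $keep{ $lineage->[-1] } = 1;
    for my $i (1 .. $#$lineage) {
        $children{ $lineage->[ $i - 1 ] }{ $lineage->[$i] } = 1;
    }
}

# Leaf names below a node (the node itself if it has no children).
sub leaves {
    my ($node) = @_;
    my @kids = sort keys %{ $children{$node} || {} };
    return $node unless @kids;
    return map { leaves($_) } @kids;
}

# Contract nodes with exactly one child, unless the node is in %keep.
sub contract {
    my ($node) = @_;
    my @kids = map { contract($_) } sort keys %{ $children{$node} || {} };
    return $kids[0] if @kids == 1 && !$keep{$node};
    return { name => $node, kids => \@kids };
}

# All node names in a contracted tree, parent before children.
sub names {
    my ($tree) = @_;
    return ($tree->{name}, map { names($_) } @{ $tree->{kids} });
}

my @leaf_names = leaves('root');
print scalar(@lineages), " species requested, ", scalar(@leaf_names), " leaves\n";
print "kept after contraction: ", join('; ', names(contract('root'))), "\n";
```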
Gabriel, is that what you want to see? From cjfields at uiuc.edu Mon Dec 18 18:14:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 17:14:23 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <45871B23.8070103@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> Message-ID: On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: >>> For example, the first 'drop' happens for the 101st species which is >>> 'Leptospira interrogans serovar Copenhageni'. This is a variation >>> (descendant) of species 24: 'Leptospira interrogans'. So when the >>> variation is added it becomes a leaf and 'Leptospira interrogans' >>> is no >>> longer a leaf, so the overall number of leaves does not increase. >> >> Makes sense now. I personally would consider this a bug since the >> results are unexpected (so the docs need to be modified in order >> to clarify). Some say tomato... >> I suppose this is one of the issues one might run into when using >> NCBI taxonomy to build trees. > > No, the tree produced is perfectly fine. The taxonomy2tree.pl > script deliberately then does: > > # simple paths are contracted by removing degree one nodes > $tree->contract_linear_paths; > > Because that is what Gabriel's script originally did. I think you misunderstood me. The tree is fine; the data used to make the tree (NCBI taxonomy) is the issue. 
One of the clear caveats that NCBI attaches to their taxonomy data is that should not be the 'primary source for taxonomic or phylogenetic information': http://tinyurl.com/y3k624 I think it works as a good guide as long as one takes the above into consideration. That and the fact that not all taxids attached to sequence data will represent leaf nodes. chris From cjfields at uiuc.edu Mon Dec 18 18:15:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 17:15:56 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> Message-ID: <16D6DB51-C2CB-4E89-A597-4672FAA6681B@uiuc.edu> On Dec 18, 2006, at 3:50 PM, Chris Mungall wrote: > > I agree with everything Heikki is saying, I just wanted to highlight > one paragraph: > >> The problem I see with undefined, totally open meta annotation, is >> that if you >> can put anything in there, it is also totally confusing to a user. >> If you can >> put anything in, how do you know what to get get out and know that >> it is >> there? > > One solution is to give your annotation/metadata-model formal > computational semantics and use ontologies to give additional > semantics to your metadata tags. This provides both user information > in the form of documentation, and a means of specifying to the > computer exactly what should be done with the tags. > > This is probably overkill for bioperl; but if the use cases being > proposed do lean in the direction of a new metadata system that is > not necessarily backwards compatible with the existing one, then I'd > recommend checking out what's already out there before re-inventing > the wheel. Perl RDF libraries are getting a little better. > > If anyone is interested in pursuing this sort of thing (probably on a > branch), let me know ... 
I like the idea of of using ontologies (although that's one of my many weak points!). I'll likely start off with simple examples using meta data initially, then progress from there. It is a developer series, after all! Thanks everybody! I think I have an idea on how to at least get started. chris From bix at sendu.me.uk Mon Dec 18 18:27:15 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 23:27:15 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> Message-ID: <458723D3.4010908@sendu.me.uk> Chris Fields wrote: > > On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: >>>> For example, the first 'drop' happens for the 101st species which is >>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation >>>> (descendant) of species 24: 'Leptospira interrogans'. So when the >>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no >>>> longer a leaf, so the overall number of leaves does not increase. >>> >>> Makes sense now. I personally would consider this a bug since the >>> results are unexpected (so the docs need to be modified in order to >>> clarify). Some say tomato... >>> I suppose this is one of the issues one might run into when using >>> NCBI taxonomy to build trees. >> >> No, the tree produced is perfectly fine. The taxonomy2tree.pl script >> deliberately then does: >> >> # simple paths are contracted by removing degree one nodes >> $tree->contract_linear_paths; >> >> Because that is what Gabriel's script originally did. 
> > I think you misunderstood me. The tree is fine; the data used to make > the tree (NCBI taxonomy) is the issue. In what way is it the issue? The database is also fine as far as I can see, in so far as it is not causing any problems in this instance. Gabriel asked for a tree featuring a species and its subspecies. The NCBI taxonomy database provided Bioperl the correct data to build such a tree. Then Gabriel asked to remove the degree one nodes of his tree. His problem was that doing that happened to (correctly) remove the species node. If he wants to see both his species and his subspecies he must either not remove degree one nodes, or alter the method of doing so to keep desired nodes. There is no possible way for NCBI to improve matters here. From bix at sendu.me.uk Mon Dec 18 18:45:59 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 23:45:59 +0000 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <45872837.6050403@sendu.me.uk> Gowthaman Ramasamy wrote: > Hi List, Is there any module in bioperl which can find out the primer > binding sites in a genomic sequence. I am interested in finding > locations with few mismatches along the primer...not just the exact > match (which is a very trivial task) There's no module dedicated to that task, but Bioperl may help you to answer the question. Probably the easiest/reliable/clear thing to do is to do a Blast with appropriate settings for short sequence with few mismatches. You can write a script to only consider hits for your forward primer that are a 'primable' distance from a hit to your reverse primer (and check their orientations are correct as well). Or use some e-pcr tool. 
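For small templates, a brute-force alternative can be sketched in plain Perl: slide the primer along the sequence and count mismatches (Hamming distance) on both strands. The subs find_primer_sites() and revcom() and the $max_mismatch parameter are invented here for illustration and are not part of Bioperl; for genome-sized input, Blast or an e-PCR tool as suggested above will scale far better.

```perl
use strict;
use warnings;

# Reverse-complement a plain ACGT string.
sub revcom {
    my ($seq) = @_;
    my $rc = reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return [start, strand, mismatches] for every window of the template that
# matches the primer with at most $max_mismatch mismatches. start is 1-based
# on the forward strand; strand is +1 or -1.
sub find_primer_sites {
    my ($primer, $template, $max_mismatch) = @_;
    my $len = length $primer;
    my @sites;
    for my $strand (1, -1) {
        # A primer annealing to the reverse strand reads as its reverse
        # complement along the forward strand.
        my $query = uc($strand == 1 ? $primer : revcom($primer));
        for my $i (0 .. length($template) - $len) {
            my $window = uc substr($template, $i, $len);
            # XOR of two equal-length strings is "\0" wherever they agree,
            # so counting the non-NUL bytes counts the mismatches.
            my $mm = (($window ^ $query) =~ tr/\0//c);
            push @sites, [ $i + 1, $strand, $mm ] if $mm <= $max_mismatch;
        }
    }
    return @sites;
}

my @hits = find_primer_sites('ACGTAC', 'TTACGTACTTTTACGAACTTGTACGTTT', 1);
printf "start=%d strand=%+d mismatches=%d\n", @$_ for @hits;
```

Pairing forward-strand hits with reverse-strand hits that lie a 'primable' distance downstream, as described above, is then a matter of filtering the returned list.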
From Kevin.M.Brown at asu.edu Mon Dec 18 18:52:20 2006 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 18 Dec 2006 16:52:20 -0700 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence Message-ID: <1A4207F8295607498283FE9E93B775B40270F3BB@EX02.asurite.ad.asu.edu> A function I use to find the first landing site for a primer. Should be modifiable to handle multiple occurrences:

=head1 C<match>

Match searches for a near alignment between two strings and returns the
position at which the two strings align. Match is based on 80% conformation

match($this, $in_that)

=cut

sub match {
    my ($primer, $gene) = @_;
    my $start   = 0;
    my $pattern = "";
    # Grow the pattern one primer base at a time; a base that breaks the
    # match is replaced by the '.' wildcard.
    for (my $i = 0 ; $i < length($primer) ; $i++) {
        $pattern .= substr($primer, $i, 1);
        pos($gene) = 0;
        if ($gene =~ m/$pattern/gi) {
            $start = pos($gene) - length($pattern) + 1;
        }
        else {
            $start = 0;
            chop($pattern);
            $pattern .= '.';
        }
    }
    if ($pattern =~ /\.$/) {
        if ($gene =~ m/$pattern/gi) {
            $start = pos($gene) - length($pattern) + 1;
        }
    }
    # Accept only if more than 80% of the primer bases matched literally.
    $pattern =~ s/\.//g;
    if ((length($pattern) / length($primer)) > .8) {
        #print $start . "\n";
        return $start;
    }
    return 0;
}

> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Monday, December 18, 2006 4:46 PM > To: Gowthaman Ramasamy > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] module to find out primer binding > sites in a genome sequence > > Gowthaman Ramasamy wrote: > > Hi List, Is there any module in bioperl which can find out > the primer > > binding sites in a genomic sequence. I am interested in finding > > locations with few mismatches along the primer...not just the exact > > match (which is a very trivial task) > > There's no module dedicated to that task, but Bioperl may help you to > answer the question. > > Probably the easiest/reliable/clear thing to do is to do a Blast with > appropriate settings for short sequence with few mismatches.
You can > write a script to only consider hits for your forward primer > that are a > 'primable' distance from a hit to your reverse primer (and check their > orientations are correct as well). > > Or use some e-pcr tool. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From torsten.seemann at infotech.monash.edu.au Mon Dec 18 18:52:58 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 19 Dec 2006 10:52:58 +1100 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <458729DA.9030909@infotech.monash.edu.au> Gowthaman Ramasamy wrote: > Hi List, > Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. > I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) This FAQ question may help: http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F This software may help: http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From sdavis2 at mail.nih.gov Mon Dec 18 21:16:19 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 18 Dec 2006 21:16:19 -0500 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <45874B73.7010600@mail.nih.gov> Gowthaman Ramasamy wrote: > Hi List, > Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. 
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) > See here: http://genome.ucsc.edu/cgi-bin/hgPcr?command=start It is designed for exactly this task, is very fast, is available as an executable or web-based (though watch the usage requirements), and the output can be parsed rather easily. Sean From cjfields at uiuc.edu Mon Dec 18 21:30:04 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 20:30:04 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <458723D3.4010908@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> Message-ID: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> >> I think you misunderstood me. The tree is fine; the data used to >> make >> the tree (NCBI taxonomy) is the issue. > > In what way is it the issue? The database is also fine as far as I can > see, in so far as it is not causing any problems in this instance. I should maybe have clarified a bit more: what I said has nothing to do with the structure of the database itself. I was just pointing out that NCBI Taxonomy isn't the best source of data for building a phylogenetic tree, something NCBI also points out (the link in my last post). Not a big deal, really. > Gabriel asked for a tree featuring a species and its subspecies. The > NCBI taxonomy database provided Bioperl the correct data to build > such a > tree. Then Gabriel asked to remove the degree one nodes of his > tree. His > problem was that doing that happened to (correctly) remove the species > node. 
 If he wants to see both his species and his subspecies he must > either not remove degree one nodes, or alter the method of doing so to > keep desired nodes. There is no possible way for NCBI to improve > matters > here. Actually, there isn't any way they could w/o digging through the literature in many cases. The problem is incomplete taxonomic information for nodes derived from older sequence data, where a genus and species was designated but nothing else (strain, etc) is known. Again, I merely was pointing out what I had mentioned above. I wasn't criticizing you, Gabriel, or the methodology here. Honest! chris From avilella at gmail.com Mon Dec 18 16:43:27 2006 From: avilella at gmail.com (Albert Vilella) Date: Mon, 18 Dec 2006 21:43:27 +0000 Subject: [Bioperl-l] PAML files In-Reply-To: <4586F3A2.4010607@ub.edu> References: <4586BF57.4090002@ub.edu> <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> <4586F3A2.4010607@ub.edu> Message-ID: <358f4d650612181343o5bd51169w7b46cceb34a5c92b@mail.gmail.com> Filipe, if you need to create the ctl file but not run the job, you can use the "prepare" method in Codeml run. Also, there is a tmpdir and save_tempfiles method that will keep the files where you want. About the naming, you can add a ".tree" and ".aln" extension to the tempnames if you want, by altering the $temptreefile and $tempseqfile variables in bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm (cvs head version). If you want, you can also add a couple of getters/setters there: sub alnfilename{ my $self = shift; return $self->{'alnfilename'} = shift if @_; return $self->{'alnfilename'}; } and substitute those $tempseqfile io calls for your $self->{'alnfilename'} io calls. $codeml->alnfilename("/path/name"); $codeml->prepare; ... $codeml->run; What I usually do is to have the aln and tree files in a different place. Codeml will create the tmp files for running somewhere, and then delete all the stuff when done. Cheers, Albert.
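The combined getter/setter idiom suggested above can be tried on its own before touching Codeml.pm. The sketch below uses a made-up My::Runner package, which is not part of bioperl-run, purely to show how such an accessor behaves; wiring it into Codeml would still mean editing the places where $tempseqfile is used, as described above.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; only the accessor pattern is the point here.
package My::Runner;

sub new { return bless {}, shift }

# Combined getter/setter: called with an argument it stores and returns
# the value; called without one it returns whatever was stored before
# (undef if nothing has been set yet).
sub alnfilename {
    my $self = shift;
    return $self->{'alnfilename'} = shift if @_;
    return $self->{'alnfilename'};
}

package main;

my $runner = My::Runner->new;
$runner->alnfilename('/path/name.aln');    # set
print $runner->alnfilename, "\n";          # get
```

One consequence of this style is that a value cannot be reset to undef through the accessor by calling it without arguments; passing undef explicitly does work, since the check is on the argument count rather than on the value.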
On 12/18/06, Filipe Garrett wrote: > > Hi Jason, > > This question is related with the one I made previously today. > I need to run codeml with 3 tree topologies. I looked on codeml module > but it only accepts one tree as input so I thought of using the codeml > module to prepare all the files and then I would just have to run the > codeml with the new tree file in batch. But for that I need to know > which one is the ctl file. > > any idea? > FG > > Jason Stajich wrote: > > They are temporary names so they are deliberately random and there is no > > intention of you needing them after a run since it to be cleaned up on > > the fly. We use an internal method for generating tempfiles that takes > > care of cleanup afterwards. I suppose since we do all the work within a > > temp directory that is cleaned up, one could have a fixed name for the > > tree, alignment, and ctl files but honestly we never expect people to be > > reading these filenames as they are intended to be transient. > > > > What problem are you having that you need the filename? > > > > -jason > > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > > > >> Hi all, > >> > >> does anyone knows how to get the name of the .ctl file created by the > >> PAML module? Inside the tmp directory there are 2 files with random > >> names (tree and ctl). Why do they have random names?? Wouldn't it be > >> easier to assign them a fixed name?? For instance "codeml.ctl" and > >> "tree.nwk"?? 
> >> > >> thanks in adv, > >> FG > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > jason at bioperl.org > > http://jason.open-bio.org/ > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From valiente at lsi.upc.edu Mon Dec 18 23:18:20 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Tue, 19 Dec 2006 13:18:20 +0900 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> Message-ID: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> Thanks a lot for the prompt answer and follow-up discussion. I think this turned out not to be a bug in the merge_lineage() code but a direct consequence of building a phylogenetic tree instead of a taxonomic tree, aka with internal node labels. In order to reconstruct the NCBI taxonomy for the set of species present in a given phylogenetic tree, the only reasonable work-around seems to be a first step of merging lineages and contracting linear paths with the current implementation, followed by a second step of restricting the given phylogenetic tree to the set of species present in the obtained NCBI taxonomy. But this does not affect the code for merge_lineage(). Gabriel >>> I think you misunderstood me. 
The tree is fine; the data used to >>> make >>> the tree (NCBI taxonomy) is the issue. >> >> In what way is it the issue? The database is also fine as far as I >> can >> see, in so far as it is not causing any problems in this instance. > > I should maybe have clarified a bit more: what I said has nothing > to do with the structure of the database itself. I was just > pointing out that NCBI Taxonomy isn't the best source of data for > building a phylogenetic tree, something NCBI also points out (the > link in my last post). Not a big deal, really. > >> Gabriel asked for a tree featuring a species and its subspecies. The >> NCBI taxonomy database provided Bioperl the correct data to build >> such a >> tree. Then Gabriel asked to remove the degree one nodes of his >> tree. His >> problem was that doing that happened to (correctly) remove the >> species >> node. If he wants to see both his species and his subspecies he must >> either not remove degree one nodes, or alter the method of doing >> so to >> keep desired nodes. There is no possible way for NCBI to improve >> matters >> here. > > Actually, there isn't any way they could w/o digging through the > literature in many cases. The problem is incomplete taxonomic > information for nodes derived from older sequence data, where a > genus and species was designated but nothing else (strain, etc) is > known. > > Again, I merely was pointing out what I had mentioned above. I > wasn't criticizing you, Gabriel, or the methodology here. Honest! 
> > chris From cjfields at uiuc.edu Mon Dec 18 23:41:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 22:41:16 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> Message-ID: On Dec 18, 2006, at 10:18 PM, Gabriel Valiente wrote: > Thanks a lot for the prompt answer and follow-up discussion. I > think this turned out not to be a bug in the merge_lineage() code > but a direct consequence of building a phylogenetic tree instead of > a taxonomic tree, aka with internal node labels. > > In order to reconstruct the NCBI taxonomy for the set of species > present in a given phylogenetic tree, the only reasonable work- > around seems to be a first step of merging lineages and contracting > linear paths with the current implementation, followed by a second > step of restricting the given phylogenetic tree to the set of > species present in the obtained NCBI taxonomy. But this does not > affect the code for merge_lineage(). > > Gabriel I did notice one thing, though it's minor: if you use the option to retrieve the data from Entrez, a few species aren't found (even though they show up in a local taxonomy search). I think both were E. coli strains. chris From DGroskreutz at twt.com Tue Dec 19 02:00:40 2006 From: DGroskreutz at twt.com (DGroskreutz at twt.com) Date: Tue, 19 Dec 2006 01:00:40 -0600 Subject: [Bioperl-l] CN=Deb Groskreutz/OU=MSN/O=TWT is out of the office. 
 Message-ID: I will be out of the office starting 12/18/2006 and will not return until 01/02/2007. NOTICE OF CONFIDENTIALITY: The information contained in this communication, including attachments, is intended for the specific delivery to and use by the individual(s) to whom it is addressed. This email includes confidential information that may be attorney-client privileged. Any review, retransmission, dissemination, or unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please reply to the sender immediately and delete the original communication and any copy of it from your computer system, including all attachments. From michael.watson at bbsrc.ac.uk Tue Dec 19 07:20:56 2006 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 19 Dec 2006 12:20:56 -0000 Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs? Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk> Hi I'm using bioperl-1.4. I did do a google search for this but couldn't find anything. If this is fixed in 1.5.2 then forgive me. I'm getting a warning: MSG: No whitespace allowed in FASTA ID [unknown id] When trying to convert from EMBL format to fasta. The offending sequence is CK234114: ID CK234114; SV 1; linear; mRNA; EST; VRT; 244 BP. XX AC CK234114; XX DT 03-MAR-2004 (Rel. 79, Created) DT 03-MAR-2004 (Rel. 79, Last updated, Version 1) XX DE SB010002000A01 JUWNL1 Normalized Zebra Finch Juvenile Telencephalon cDNA DE Library SB01 Taeniopygia guttata cDNA clone SB010002000A01 5', mRNA DE sequence. Etc Any advice? Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful.
 The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From michael.watson at bbsrc.ac.uk Tue Dec 19 07:27:59 2006 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 19 Dec 2006 12:27:59 -0000 Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs? In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E682@iahce2ksrv1.iah.bbsrc.ac.uk> Sorry, problem solved. Mick
From roest216 at student.otago.ac.nz Tue Dec 19 04:15:55 2006 From: roest216 at student.otago.ac.nz (Stephan Roessner) Date: Tue, 19 Dec 2006 22:15:55 +1300 Subject: [Bioperl-l] problems installing bioperl Message-ID: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> Dear support team, I installed bioperl 1.5.2_100 on a Fedora machine to be able to use gbrowse. The installation seems to work (except for the test failures) but the gbrowse installation tells me that Bio::Perl 1.0050021 is installed, but of course it requires 1.52. Is there a chance to find out what went wrong? thanks a lot, Stephan From bix at sendu.me.uk Tue Dec 19 10:12:39 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 15:12:39 +0000 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> Message-ID: <45880167.9010605@sendu.me.uk> Stephan Roessner wrote: > Dear support team, > > I installed bioperl 1.5.2_100 on a ferdora machine to be able to use > gbrowse. > The installation seems to work (except of the test failures) but the > gbrowse installation tells me that BIO::pERL 1.0050021 is installed, but > of course it requires 1.52. > > Is there a chance to find out what went wrong? Nothing went wrong with the Bioperl installation (well, except there shouldn't have been any test failures - can you post those please?). gbrowse simply defined its Bioperl requirement incorrectly.
 If you tell me exactly where you downloaded gbrowse from and how you went about installing it, and provide the exact, complete error message you got from it, I might be able to help the authors fix the problem. Or I'm pretty sure they can figure it out for themselves :) From cjfields at uiuc.edu Tue Dec 19 11:05:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Dec 2006 10:05:01 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] problems installing bioperl In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> Message-ID: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> On Dec 19, 2006, at 9:31 AM, Scott Cain wrote: > I really don't think the BioPerl version detection is wrong. I > actually > don't check Bio::Root::Version::VERSION in Makefile.PL, I check > Bio::Graphics::Panel->api_version. When it doesn't find the correct > api_version, it gives a warning the BioPerl 1.5.2 is not installed. I > have seen this happen when more than one BioPerl instance is installed > and `perl Makefile.PL` finds the wrong one first. My suggestion is to > try reinstalling BioPerl and providing the --uninst 1 argument to > remove > older versions of BioPerl: > > sudo ./Build install --uninst 1 > > Scott Could having two Bioperl instances explain the test failures? I'm not sure (maybe Sendu can answer this), but I would assume Module::Build uses the current working directory for test runs.
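One way to check for the duplicate-installation scenario Scott describes is to look for every copy of a BioPerl module on Perl's search path. This is only a sketch using core modules: it assumes that Bio/Root/Version.pm is a reasonable marker file for an installation, and the helper name find_module_copies is made up.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list every file on the given search path that
# provides the module, in the order Perl would consider the directories
# (the first hit is the one that gets loaded).
sub find_module_copies {
    my ($module, @dirs) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;    # skip code-ref hooks that may live in @INC
        my $file = File::Spec->catfile($dir, split m{/}, $rel);
        push @found, $file if -f $file;
    }
    return @found;
}

my @copies = find_module_copies('Bio::Root::Version', @INC);
print "$_\n" for @copies;
print "More than one copy found; the first listed is the one Perl loads.\n"
    if @copies > 1;
```

If two paths show up, comparing them against the directory that the gbrowse Makefile.PL run actually uses should show whether an older copy is shadowing the new install.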
chris From bix at sendu.me.uk Tue Dec 19 12:02:34 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 17:02:34 +0000 Subject: [Bioperl-l] [Gmod-gbrowse] problems installing bioperl In-Reply-To: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> Message-ID: <45881B2A.8060907@sendu.me.uk> Chris Fields wrote: > > On Dec 19, 2006, at 9:31 AM, Scott Cain wrote: > >> I really don't think the BioPerl version detection is wrong. I actually >> don't check Bio::Root::Version::VERSION in Makefile.PL, I check >> Bio::Graphics::Panel->api_version. When it doesn't find the correct >> api_version, it gives a warning the BioPerl 1.5.2 is not installed. I >> have seen this happen when more than one BioPerl instance is installed >> and `perl Makefile.PL` finds the wrong one first. My suggestion is to >> try reinstalling BioPerl and providing the --uninst 1 argument to remove >> older versions of BioPerl: >> >> sudo ./Build install --uninst 1 >> >> Scott > > Could having two Bioperl instances explain the test failures? I'm not > sure (maybe Sendu can answer this), but I would assume Module::Build > uses the current working directory for test runs. It does, so that shouldn't be an issue for the test failures. From ferraria at gmail.com Tue Dec 19 11:40:05 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Tue, 19 Dec 2006 17:40:05 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy Message-ID: Hi all, I've just installed BioPerl 1.5.2 (devel) on a linux mandrake machine with the cpan shell. I want to use the Bio::DB::EUtilities to retrieve data (id's) from NCBI 'gene' database (first step of my pipeline). But the installation of this package doesn't seem to be correct : The simple example given on the documentation doesn't work. 
 (this one : http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) Here is the error message I got : "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." In the UserAgent package, line 779 is in the private "_need_proxy" subroutine and corresponds to the code : ...if (@{ $self->{'no_proxy'} }) ... If I comment this line in the UserAgent package and the corresponding "}", the example works. But obviously, I prefer to solve the problem in a regular way :) Indeed, my computer accesses the internet via a http proxy and I didn't tell this to BioPerl at any moment. As I read on the BioPerl Wiki site, I tried to configure an $http_proxy environment variable but it still doesn't work. One last maybe important information is that I saw during the installation that the tests 't/EUtilities' were skipped because of an unrevealed reason. So finally I got two questions : 1. Is there somebody who can figure out what is my problem ? 2. At the moment, is the Bio::DB::EUtilities package really efficient or using directly the NCBI eutilities with the LWP::Simple package could be a good alternative ? Many thanks in advance, Best Regards, Anthony Ferrari From bix at sendu.me.uk Tue Dec 19 12:06:03 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 17:06:03 +0000 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> Message-ID: <45881BFB.7020008@sendu.me.uk> Scott Cain wrote: > I really don't think the BioPerl version detection is wrong. I actually > don't check Bio::Root::Version::VERSION in Makefile.PL, I check > Bio::Graphics::Panel->api_version. When it doesn't find the correct > api_version, it gives a warning the BioPerl 1.5.2 is not installed.
I > have seen this happen when more than one BioPerl instance is installed > and `perl Makefile.PL` finds the wrong one first. Yes, I saw that, which is why I thought I must be looking at something different to what the OP had installed. > My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove > older versions of BioPerl: > > sudo ./Build install --uninst 1 My confusion is that he has definitely installed 1.5.2 and this version is being polled for its version number (by something!) and returning the correct '1.0050021', whilst the something expects '1.52'. Anyway, this can only be resolved if Stephan provides the real error message and its context. From cjfields at uiuc.edu Tue Dec 19 12:27:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Dec 2006 11:27:24 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: Message-ID: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote: > Hi all, > > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake > machine with > the cpan shell. > I want to use the Bio::DB::EUtilities to retrieve data (id's) from > NCBI > 'gene' database (first step of my pipeline). > > But the installation of this package doesn't seem to be correct : > The simple example given on the documentation doesn't work. (this > one : > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) > > Here is the error message I got : > "Can't use an undefined value as an ARRAY reference at > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > In the UserAgent package, line 779 is in the private "_need_proxy" > subroutine and corresponds to the code : ...if (@{ $self-> > {'no_proxy'} }) > ... > > If I comment this line in the UserAgent package and the > corresponding "}", > the example works. 
But obviously, I prefer to solve the problem in > a regular > way :) > > Indeed, my computer accesses the internet via a http proxy and I > didn't tell > this to BioPerl at any moment. > As I read on the BioPerl Wiki site, I tried to configure an > $http_proxy > environment variable but it still doesn't work. > > One last maybe important information is that I saw during the > installation > that the tests 't/EUtilities' were skipped because of an unrevealed > reason. > > > So finally I got two questions : > 1. Is there somebody who can figure out what is my problem ? > 2. At the moment, is the Bio::DB::EUtilities package really > efficient or > using directly the NCBI eutilities with the LWP::Simple package > could be an > good alternative ? > > Many thanks in advance, > Best Regards, > Anthony Ferrari First things first: at the moment the BioPerl EUtilities interface is very experimental (as specifically outlined in the POD), so I can't really recommend it for production use until the API is cleaned up. However, I do appreciate any feedback or comments re:EUtilities (the reason it's out there in the 1.5.2 release). You might check out this bug report, which relates directly to your issue: http://bugzilla.open-bio.org/show_bug.cgi?id=2109 After I worked out the proxy issue Torsten got it working. Let me know if this doesn't help or fix the problem. chris From cain at cshl.edu Tue Dec 19 10:31:50 2006 From: cain at cshl.edu (Scott Cain) Date: Tue, 19 Dec 2006 10:31:50 -0500 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <45880167.9010605@sendu.me.uk> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> Message-ID: <1166542310.6981.119.camel@localhost.localdomain> I really don't think the BioPerl version detection is wrong. I actually don't check Bio::Root::Version::VERSION in Makefile.PL, I check Bio::Graphics::Panel->api_version. 
 When it doesn't find the correct api_version, it gives a warning that BioPerl 1.5.2 is not installed. I have seen this happen when more than one BioPerl instance is installed and `perl Makefile.PL` finds the wrong one first. My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove older versions of BioPerl: sudo ./Build install --uninst 1 Scott On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote: > Stephan Roessner wrote: > > Dear support team, > > > > I installed bioperl 1.5.2_100 on a ferdora machine to be able to use > > gbrowse. > > The installation seems to work (except of the test failures) but the > > gbrowse installation tells me that BIO::pERL 1.0050021 is installed, but > > of course it requires 1.52. > > > > Is there a chance to find out what went wrong? > > Nothing went wrong with the Bioperl installation (well, expect there > shouldn't have been any test failures - can you post those please?). > gbrowse simply defined its Bioperl requirement incorrectly. If you tell > me exactly where you downloaded gbrowse from and how you went about > installing it, and provide the exact, complete error message you got > from it, I might be able help the authors fix the problem. > > Or I'm pretty sure they can figure it our for themselves :) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/67132cb3/attachment.bin From ferraria at gmail.com Tue Dec 19 12:06:31 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Tue, 19 Dec 2006 18:06:31 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: Message-ID: Hi all, I've just installed BioPerl 1.5.2 (devel) on a linux mandrake machine with the cpan shell. I want to use the Bio::DB::EUtilities to retrieve data (id's) from NCBI 'gene' database (first step of my pipeline). But the installation of this package doesn't seem to be correct : The simple example given on the documentation doesn't work. (this one : http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) Here is the error message I got : "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." In the UserAgent package, line 779 is in the private "_need_proxy" subroutine and corresponds to the code : ...if (@{ $self->{'no_proxy'} }) ... If I comment this line in the UserAgent package and the corresponding "}", the example works. But obviously, I prefer to solve the problem in a regular way :) Indeed, my computer accesses the internet via a http proxy and I didn't tell this to BioPerl at any moment. As I read on the BioPerl Wiki site, I tried to configure an $http_proxy environment variable but it still doesn't work. One last maybe important information is that I saw during the installation that the tests 't/EUtilities' were skipped because of an unrevealed reason. So finally I got two questions : 1. Is there somebody who can figure out what is my problem ? 2. At the moment, is the Bio::DB::EUtilities package really efficient or using directly the NCBI eutilities with the LWP::Simple package could be an good alternative ? 
Many thanks in advance, Best Regards, Anthony Ferrari From stewarta at nmrc.navy.mil Tue Dec 19 13:49:57 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Tue, 19 Dec 2006 13:49:57 -0500 Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3 Message-ID: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil> I see that Bio::Tools::Glimmer documentation clearly states that this module is intended for use with GlimmerM (eukaryotic version) only. I am wondering if anyone can recall any talk about adopting Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)? I've searched the list history with little luck other than someone else asking a similar question. If not, does anyone have any thoughts on how difficult it might be to implement support for glimmer2/3 result parsing? Perhaps just a matter of editing the _parse_predictions method? -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From rvosa at sfu.ca Tue Dec 19 13:53:47 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 19 Dec 2006 10:53:47 -0800 Subject: [Bioperl-l] problems installing bioperl Message-ID: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca> An embedded and charset-unspecified text was scrubbed... 
Name: not available Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/276348b7/attachment.pl From cjfields at uiuc.edu Tue Dec 19 14:31:17 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Dec 2006 13:31:17 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3 In-Reply-To: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil> References: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil> Message-ID: <71E04575-DFD2-4F5A-B268-493D3246CBFA@uiuc.edu> On Dec 19, 2006, at 12:49 PM, Andrew Stewart wrote: > I see that Bio::Tools::Glimmer documentation clearly states that this > module is intended for use with GlimmerM (eukaryotic version) only. > I am wondering if anyone can recall any talk about adopting > Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)? > I've searched the list history with little luck other than someone > else asking a similar question. There is a thread here: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12546/ focus=12546 > If not, does anyone have any thoughts on how difficult it might be to > implement support for glimmer2/3 result parsing? Perhaps just a > matter of editing the _parse_predictions method? It depends on how different the various Glimmer formats are; I'll have to look at the ones Torsten added in CVS. You could always try modifying Bio::Tools::Glimmer to parse Glimmer2/3 and GlimmerM reports, but based on the mail list thread above it may not be so straightforward. 
chris From MEC at stowers-institute.org Tue Dec 19 14:57:48 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Tue, 19 Dec 2006 13:57:48 -0600 Subject: [Bioperl-l] bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation Message-ID: Lincoln and fellow Bio::DB::SeqFeature travelers, I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with ------------- EXCEPTION ------------- MSG: FBgn0017545 doesn't have a primary id STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo ad.PLS:76 Where FBgn0017545 is the ID of a gene previously loaded. I am unsure how to remedy my situation and welcome any advise on correct or improved approach to my problem. Here's some detail if it helps. I am developing a pipeline to design a microarray probes capable of distinguishing among splice variants in drosophila (using latest Flybase dmel_r5.1 annotation). So I 1) load a filtered selection of Flybase annotation using bp_seqfeature_load. (for testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached). 
This is done as follows: > bp_seqfeature_load.PLS --create FBgn0017545.gff 2) analyze all the genes in the database, and create GFF3 output each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets. Output of this analysis, FBgn0017545_matd.gff, is also attached) 3) load these analysis results into the same database, as follows: > bp_seqfeature_load.PLS FBgn0017545_matd.gff It is at this point that I get the above error. However, I don't get any error and the data loads fine if I load the two files together, as follows: > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff) So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS or else this use case has not yet arisen and must be provided for somehow. I am running against bioperl-live Thanks for your thoughts and assistance, Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri From Kevin.M.Brown at asu.edu Tue Dec 19 16:46:19 2006 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 19 Dec 2006 14:46:19 -0700 Subject: [Bioperl-l] Bio::SimpleAlign Message-ID: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu> I'm working on a script that plays around with alignments of sequences and one of the things I noticed is that the code for the match method does not seem to actually use the start/end information when creating the match between objects in the alignment. Maybe I'm misunderstanding what the alignment is supposed to hold in terms of sequence. The alignment objects I build up are based on the sequence of a gene and the sequences of the primers that amplify that gene. 

$alignments{$gene->id()}->add_seq(
    new Bio::LocatableSeq(
        -seq    => $seq[0]->seq(),
        -id     => $seq[0]->id(),
        -start  => $start,
        -end    => $start + $seq[0]->length() - 1,
        -strand => 1
    )
);
$alignments{$gene->id()}->add_seq(
    new Bio::LocatableSeq(
        -seq    => $seq[1]->seq(),
        -id     => $seq[1]->id(),
        -start  => $stop,
        -end    => $stop + $seq[1]->length() - 1,
        -strand => -1
    )
);

So, you can see I input a start and stop point for the primer, but when I use the match function all it does is match the first character of the gene sequence to the first char of the primer sequences, then the second gene char to the second in each primer, etc... This doesn't seem to fit with the documentation and seems odd that there would be holders for the start/stop points and not use them when doing things like matching of sequences in an alignment.

From bix at sendu.me.uk Tue Dec 19 17:01:22 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 22:01:22 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
References: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
Message-ID: <45886132.7050505@sendu.me.uk>

Rutger Vos wrote:
> Aren't 1.5.2_100 and 1.0050021 supposed to be equivalent in this weird
> version-string-translation way that makes 5.5 and 5.005 equivalent also?

Yes, 1.5.2_100 and 1.0050021 are equivalent. The equivalent of 5.5 is 5.500 however.

From lstein at cshl.edu Tue Dec 19 16:58:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 19 Dec 2006 16:58:24 -0500
Subject: [Bioperl-l] bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation
In-Reply-To:
References:
Message-ID: <6dce9a0b0612191358t4764bfe0g601cd22d09132e55@mail.gmail.com>

Hi Malcolm,

Your second guess was right. The use case of augmenting an existing gene with additional splice forms isn't provided for.
You can get the functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

  # fetch the previously loaded gene; it becomes the parent feature
  my @genes = $db->get_features_by_name('FBgn0017545');
  @genes == 1 or die "Didn't get exactly one gene";
  my $parent = $genes[0];

  my $chr    = $parent->seq_id;
  my $start  = $parent->start;
  my $end    = $parent->end;
  my $strand = $parent->strand;

  # create the new splice form on the parent's sequence and attach it
  my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                         -source      => 'added',
                                         -seq_id      => $chr,
                                         -strand      => $strand,
                                         -start       => $start+10,
                                         -end         => $end,
                                         );
  $parent->add_SeqFeature($new_splice_form);

  # add two exons to the new splice form
  for my $pos ([$start+10,$start+100],[$start+200,$end]) {
    my ($e_start,$e_end) = @$pos;
    my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                        -store       => $db,
                                        -seq_id      => $chr,
                                        -strand      => $strand,
                                        -start       => $e_start,
                                        -end         => $e_end);
    $new_splice_form->add_SeqFeature($exon);
  }

I found a bug in updating the seqfeature database when I wrote this script, so you'll have to get the latest bioperl-live. I think you can use this to write a splice form updating script.

In order to support the idea of adding new splice forms to an existing gene using the GFF3 format, I will have to either modify the loader, or write a separate script (probably better to do the latter). It shouldn't be hard if you'd like to give it a try.
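
Until the loader handles that case, it may help to check a GFF3 file before loading it on its own. The following is only a rough sketch in plain Perl, not part of BioPerl: the helper name unresolved_parents and the two sample lines are made up for illustration. It lists the Parent= IDs in a block of GFF3 text that no ID= attribute in the same text defines, which is the situation that appears to trigger the "doesn't have a primary id" exception when the second file is loaded alone.

```perl
use strict;
use warnings;

# Return the Parent IDs referenced in a block of GFF3 text that are not
# defined by any ID= attribute in the same text. (Hypothetical helper.)
sub unresolved_parents {
    my ($gff3_text) = @_;
    my (%defined, %wanted);
    for my $line (split /\n/, $gff3_text) {
        next if $line =~ /^\s*(#|$)/;      # skip comments, directives, blank lines
        my @col = split /\t/, $line;
        next unless @col == 9;             # feature lines have nine columns
        for my $pair (split /;/, $col[8]) {
            my ($tag, $value) = split /=/, $pair, 2;
            next unless defined $value;
            if ($tag eq 'ID') {
                $defined{$value} = 1;
            }
            elsif ($tag eq 'Parent') {
                $wanted{$_} = 1 for split /,/, $value;
            }
        }
    }
    my @missing = grep { !$defined{$_} } keys %wanted;
    @missing = sort @missing;
    return @missing;
}

# Made-up sample data: a gene line and a derived feature that points at it.
my $gene  = join("\t", '4', 'FlyBase', 'gene', 1000, 2000, '.', '+', '.',
                 'ID=FBgn0017545') . "\n";
my $child = join("\t", '4', 'matd', 'intron', 1100, 1200, '.', '+', '.',
                 'ID=intron1;Parent=FBgn0017545') . "\n";

print "second file alone:   @{[ unresolved_parents($child) ]}\n";
print "both files together: @{[ unresolved_parents($gene . $child) ]}\n";
```

A non-empty list for a file suggests it should either be loaded together with the file that defines those parents, or be handled through direct Bio::DB::SeqFeature::Store calls as above.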
Lincoln On 12/19/06, Cook, Malcolm wrote: > > Lincoln and fellow Bio::DB::SeqFeature travelers, > > I find that using bp_seqfeature_load.PLS to load subfeatures of genes > already loaded using bp_seqfeature_load.PLS fails with > > ------------- EXCEPTION ------------- > MSG: FBgn0017545 doesn't have a primary id > STACK > Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 > STACK toplevel > /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo > ad.PLS:76 > > Where FBgn0017545 is the ID of a gene previously loaded. > > I am unsure how to remedy my situation and welcome any advise on correct > or improved approach to my problem. > > Here's some detail if it helps. I am developing a pipeline to design a > microarray probes capable of distinguishing among splice variants in > drosophila (using latest Flybase dmel_r5.1 annotation). So I > > 1) load a filtered selection of Flybase annotation using > bp_seqfeature_load. (for testing purposes, I am using a single gene's > worth of annotation, FBgn0017545.gff, attached). This is done as > follows: > > > bp_seqfeature_load.PLS --create FBgn0017545.gff > > 2) analyze all the genes in the database, and create GFF3 output each > feature of which has a 'Parent' that is a previously loaded gene (i.e. > FBgn0017545). (These features represent the unique introns, splice > sites, and exonic design targets. 
Output of this analysis, > FBgn0017545_matd.gff, is also attached) > > 3) load these analysis results into the same database, as follows: > > > bp_seqfeature_load.PLS FBgn0017545_matd.gff > > It is at this point that I get the above error. > > However, I don't get any error and the data loads fine if I load the two > files together, as follows: > > > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff > FBgn0017545_matd.gff) > > So, I suspect that either I am misunderstanding when/how to use > bp_seqfeature_load.PLS or else this use case has not yet arisen and must > be provided for somehow. > > I am running against bioperl-live > > Thanks for your thoughts and assistance, > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From rvosa at sfu.ca Tue Dec 19 23:23:20 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 19 Dec 2006 20:23:20 -0800 Subject: [Bioperl-l] suggestions for suitable 'taxon' object Message-ID: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> An embedded and charset-unspecified text was scrubbed... 
Name: not available Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/17ec7ff3/attachment.pl From cjfields at uiuc.edu Wed Dec 20 01:16:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 00:16:47 -0600 Subject: [Bioperl-l] suggestions for suitable 'taxon' object In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> Message-ID: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote: > Hi all, > > I am looking for a bioperl object that can be abused to function as a > suitable 'taxon' object, where I mean 'taxon' as understood by the > NEXUS > file format (i.e. not strictly an entity from a taxonomy, but more > loosely > an OTU). > > The object would primarily function as a way to relate nodes in > trees to > sequences in an alignment (a foreign key that both nodes and > sequences refer > to), and secondarily as the keeper of the canonical name of the > OTU, such > that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node > named 'Homo > sapiens (constrained monophyly)' can still be understood to refer > to the > same thing - the OTU 'Homo sapiens sapiens' (for example). Alignment (SimpleAlign) objects contain Bio::LocatableSeq sequence objects; at the moment LocatableSeqs don't store their own annotation but they could easily be made or subclassed to be AnnotatableI (i.e. they can store annotation collections). I recently made SimpleAlign Annotatable; Jason has also made SimpleAlign implement FeatureHolderI, so alignments can store SeqFeatures as well; he may have his own designs here. There may be a wide variety of ways to go about this. I would probably do the following (bear in mind I'm a microbiologist, not a computer scientist). If one could add trees as annotation to the alignment (i.e. 
if trees could be Annotation objects and kept in the SimpleAlign's annotation collection), and each sequence in the alignment contained reference to a node object of the tree (i.e. if Bio::Taxon/Bio::Species objects could also be Annotation objects, but kept in a LocatableSeq annotation collection), both could refer to the same node object. This may not be exactly what you want, but maybe it's close? SimpleAlign->AnnoColln->Tree->OTU(Nodes) \----->LocSeqs-->AnnoColln-----/ I suppose this could also be done with Seqfeatures... > I was thinking that a (possibly expanded) Bio::Species might work > if there > was some sensible way of appending references to node and sequence > objects > to it (or otherwise associate them with each other), but I am > writing *to > solicit any and all suggestions*. I am looking for something > similar to > Bio::Phylo::Taxa::Taxon. > > Any and all comments and suggestions greatly appreciated! > > Best wishes, > > Rutger Vos Sendu would be the best one to speak about Bio::Taxon and Bio::Species and may have some ideas on the above. The current plan was to deprecate Bio::Species, but who knows? chris From heikki at sanbi.ac.za Wed Dec 20 05:25:08 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 20 Dec 2006 12:25:08 +0200 Subject: [Bioperl-l] Bio::SimpleAlign In-Reply-To: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu> Message-ID: <200612201225.08862.heikki@sanbi.ac.za> Kevin, Sequences that are added to the alignment are supposed to be *aligned*. SimpleAlign does not do it for you. 
It seems to me that you are adding sequences like this:

  nnnnnnnnnnnnnnnnnnnn        1 - 20,  "a short gene"
  nnnnnn                      21 - 26  "a short primer after the gene"

when you should be doing this

  nnnnnnnnnnnnnnnnnnnn        1 - 20,  "a short gene"
  --------------------nnnnnn  21 - 26  "a short primer after the gene"

Note that the default way of displaying names in SimpleAlign is "name/start-end". The name is usually expected to refer to the sequence from which this subsequence is derived. The displayname does not change if you add gaps.

Yours,
-Heikki

On Tuesday 19 December 2006 23:46, Kevin Brown wrote:
> I'm working on a script that plays around with alignments of sequences
> and one of the things I noticed is that the code for the match method
> does not seem to actually use the start/end information when creating
> the match between objects in the alignment. Maybe I'm misunderstanding
> what the alignment is supposed to hold in terms of sequence. The
> alignment objects I build up are based on the sequence of a gene and the
> sequences of the primers that amplify that gene.
>
> $alignments{$gene->id()}->add_seq(
> new Bio::LocatableSeq(
> -seq => $seq[0]->seq(),
> -id => $seq[0]->id(),
> -start => $start,
> -end => $start + $seq[0]->length() - 1,
> -strand => 1
> )
> );

If your sequence does not contain gaps and the numbering starts from one, you can let the object handle start/stop:

  my $a = new Bio::LocatableSeq(
      -seq    => 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
      -id     => 'A00001',
      -strand => 1
  );

> $alignments{$gene->id()}->add_seq(
> new Bio::LocatableSeq(
> -seq => $seq[1]->seq(),
> -id => $seq[1]->id(),
> -start => $stop,
> -end => $stop + $seq[1]->length() - 1,
> -strand => -1
> )
> );
>
> So, you can see I input a start and stop point for the primer, but when
> I use the match function all it does is match the first character of the
> gene sequence to the first char of the primer sequences, then the second
> gene char to the second in each primer, etc...
This doesn't seem to fit > with the documentation and seems odd that there would be holders for the > start/stop points and not use them when doing things like matching of > sequences in an alignment. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From ferraria at gmail.com Wed Dec 20 06:04:16 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 12:04:16 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> Message-ID: On 19/12/06, Chris Fields wrote: > > > On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote: > > > Hi all, > > > > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake > > machine with > > the cpan shell. > > I want to use the Bio::DB::EUtilities to retrieve data (id's) from > > NCBI > > 'gene' database (first step of my pipeline). > > > > But the installation of this package doesn't seem to be correct : > > The simple example given on the documentation doesn't work. (this > > one : > > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) > > > > Here is the error message I got : > > "Can't use an undefined value as an ARRAY reference at > > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." 
> > > > In the UserAgent package, line 779 is in the private "_need_proxy" > > subroutine and corresponds to the code : ...if (@{ $self-> > > {'no_proxy'} }) > > ... > > > > If I comment this line in the UserAgent package and the > > corresponding "}", > > the example works. But obviously, I prefer to solve the problem in > > a regular > > way :) > > > > Indeed, my computer accesses the internet via a http proxy and I > > didn't tell > > this to BioPerl at any moment. > > As I read on the BioPerl Wiki site, I tried to configure an > > $http_proxy > > environment variable but it still doesn't work. > > > > One last maybe important information is that I saw during the > > installation > > that the tests 't/EUtilities' were skipped because of an unrevealed > > reason. > > > > > > So finally I got two questions : > > 1. Is there somebody who can figure out what is my problem ? > > 2. At the moment, is the Bio::DB::EUtilities package really > > efficient or > > using directly the NCBI eutilities with the LWP::Simple package > > could be an > > good alternative ? > > > > Many thanks in advance, > > Best Regards, > > Anthony Ferrari > > First things first: at the moment the BioPerl EUtilities interface is > very experimental (as specifically outlined in the POD), so I can't > really recommend it for production use until the API is cleaned up. > However, I do appreciate any feedback or comments re:EUtilities (the > reason it's out there in the 1.5.2 release). > > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. > > chris > I carefully read this bug but that doesn't help because this has already been modified in the now given GenericWebDBI.pm So my problem does not come from a deep recursion loop. As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. 
-w t/EUtilities.t " to see what's really happening. And actually, all tests are skipped because of the same message error -> "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." *** I tried the same command with the modified LWP::UserAgent package (which means I comment the line 779 and the corresponding '}') and all 453 tests passed. But not always. I made the tests several times and it often failed. And always on a test called "eXXX->cookie->cookie() query key" (ending with query key). In those cases, I got back a html message indicating that the error was thrown by the internal sever of NCBI. So I guess that sometimes it is just NCBI server fault (internal problem), and BioPerl is not implied.. But once more, I comment a line from a basic package so it is a bit hazardous. *** tony - a little bit lost. From smane at vbi.vt.edu Tue Dec 19 14:46:56 2006 From: smane at vbi.vt.edu (Shrinivasrao P. Mane) Date: Tue, 19 Dec 2006 14:46:56 -0500 Subject: [Bioperl-l] Using Muscle parameter within bioperl Message-ID: Hi, I need to run muscle using bioperl. This is how I do it in command line. muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet I used the following in perl script my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format => 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log'); The program runs and produces the result file but it doesn't create a log file nor does it stop sending output to STDOUT (-quiet). Could anybody help me with this? 
Thanks Mane From cjfields at uiuc.edu Wed Dec 20 09:09:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 08:09:56 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> Message-ID: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. > > chris > > > I carefully read this bug but that doesn't help because this has > already been modified in the now given GenericWebDBI.pm > So my problem does not come from a deep recursion loop. > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > EUtilities.t " to see what's really happening. > And actually, all tests are skipped because of the same message error > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > *** > I tried the same command with the modified LWP::UserAgent package > (which means I comment the line 779 and the corresponding '}') and > all 453 tests passed. > But not always. I made the tests several times and it often > failed. And always on a test called "eXXX->cookie->cookie() query > key" (ending with query key). In those cases, I got back a html > message indicating that the error was thrown by the internal sever > of NCBI. So I guess that sometimes it is just NCBI server fault > (internal problem), and BioPerl is not implied.. > But once more, I comment a line from a basic package so it is a bit > hazardous. > *** > > tony - a little bit lost. I'm cc'ing Torsten as he has a bit more experience with proxies. EUtilities is set up to check for an env. 
proxy and also take a set proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy to say this was a bug in LWP, but I think the problem is that something is undefined (i.e. an env. variable), or username/password. From the bug report, Torsten set his proxy variables using the following: -------------------------------------- "Note: I am behind an _authenticating_ proxy. My $http_proxy and $HTTP_PROXY are both set to http://USER:PASS at proxy.monash.edu.au:80/" -------------------------------------- Note the lowercase for $http_proxy, which can make a difference. After the recursion fix, I'm assuming he made no changes to the env. settings, and according to the bug everything was fine (is that correct Tortsen?). Also LWP::UserAgent has this: -------------------------------------- "Load proxy settings from *_proxy environment variables. You might specify proxies like this (sh-syntax): gopher_proxy=http://proxy.my.place/ wais_proxy=http://proxy.my.place/ no_proxy="localhost,my.domain" export gopher_proxy wais_proxy no_proxy csh or tcsh users should use the setenv command to define these environment variables. On systems with case insensitive environment variables there exists a name clash between the CGI environment variables and the HTTP_PROXY environment variable normally picked up by env_proxy(). Because of this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY environment variable can be used instead." -------------------------------------- chris From bix at sendu.me.uk Wed Dec 20 09:08:16 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 14:08:16 +0000 Subject: [Bioperl-l] Using Muscle parameter within bioperl In-Reply-To: References: Message-ID: <458943D0.10400@sendu.me.uk> Shrinivasrao P. Mane wrote: > Hi, > I need to run muscle using bioperl. This is how I do it in command line. 
> > muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet > > I used the following in perl script > > my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format => > 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log'); > > The program runs and produces the result file but it doesn't create a > log file nor does it stop sending output to STDOUT (-quiet). > Could anybody help me with this? The Muscle arguments don't take dashed args. To make switches active you need to set them to some true value. So (-verbose => 1, quiet => 1, log => 'inv.log'). Verbose may not do what you want since it is both a Bioperl option and a Muscle option; if you want the latter try using verbose => 1. From bix at sendu.me.uk Wed Dec 20 09:51:33 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 14:51:33 +0000 Subject: [Bioperl-l] suggestions for suitable 'taxon' object In-Reply-To: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> Message-ID: <45894DF5.1060503@sendu.me.uk> Chris Fields wrote: > On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote: > >> Hi all, >> >> I am looking for a bioperl object that can be abused to function as >> a suitable 'taxon' object, where I mean 'taxon' as understood by >> the NEXUS file format (i.e. not strictly an entity from a taxonomy, >> but more loosely an OTU). >> >> The object would primarily function as a way to relate nodes in >> trees to sequences in an alignment (a foreign key that both nodes >> and sequences refer to), and secondarily as the keeper of the >> canonical name of the OTU, such that a sequence named >> 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo sapiens >> (constrained monophyly)' can still be understood to refer to the >> same thing - the OTU 'Homo sapiens sapiens' (for example). 
I haven't had time to give your suggestions consideration, but I can say that I'm having to do the same thing for a bioperl-run module and my work-around is simply to set a custom name on my Bio::Taxon objects. To explain, I have the benefit that my tree is made up of Bio::Taxon objects, so I call $taxon->name('seq_id', $seq->id). Then when I want to know which of my sequences corresponds to a particular taxon, I work out which of them has the id given by shift @{$taxon->name('seq_id')}. Hardly ideal, but it works for now. >> I was thinking that a (possibly expanded) Bio::Species might work >> if there was some sensible way of appending references to node and >> sequence objects to it (or otherwise associate them with each >> other), but I am writing *to solicit any and all suggestions*. I am >> looking for something similar to Bio::Phylo::Taxa::Taxon. > > Sendu would be the best one to speak about Bio::Taxon and > Bio::Species and may have some ideas on the above. The current plan > was to deprecate Bio::Species, but who knows? Given that we do plan to deprecate Bio::Species, I'd resist the temptation to expand on it. Use Bio::Taxon as a base if it has stuff you need, or base straight from Bio::Tree::Node if not. From ferraria at gmail.com Wed Dec 20 10:40:34 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 16:40:34 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> Message-ID: Defining a "no_proxy" environment variable in my '.bashrc' file solved my problem. I set it to "localhost". It indeed corresponds to the line... [ ...if (@{ $self->{'no_proxy'} }) ... ] (I guess!) I really don't know why we are compelled to do this, but let's say that's the way it is. It works now ! Thanks a lot. 
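
For what it's worth, the error message itself can be reproduced in plain Perl without LWP. This is only a sketch under the assumption (my guess above) that the 'no_proxy' slot of the agent was simply undefined; the $agent hash and the proxy URL are made-up stand-ins, not the real LWP::UserAgent internals.

```perl
use strict;
use warnings;

# Made-up stand-in for an agent object whose 'no_proxy' slot was never set.
my $agent = { proxy => { http => 'http://proxy.example:3128/' } };

# Unguarded dereference, the same shape as the quoted line. Under
# "use strict" an undefined value cannot be used as an array reference.
my $ok = eval { my $count = @{ $agent->{no_proxy} }; 1 };
print $ok ? "unguarded: ok\n" : "unguarded: died: $@";

# Guarded dereference: fall back to an empty list when the slot is undefined.
my @no_proxy = @{ $agent->{no_proxy} || [] };
print "guarded: ", scalar(@no_proxy), " entries\n";

# What setting no_proxy=localhost would amount to: the slot now holds a list.
$agent->{no_proxy} = [ split /\s*,\s*/, 'localhost' ];
print "after setting: @{ $agent->{no_proxy} }\n";
```

If that guess is right, it would explain why defining no_proxy in the environment makes the error go away: the slot is then a (possibly short) list instead of undef.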
Tony On 20/12/06, Chris Fields wrote: > > > On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > > > You might check out this bug report, which relates directly to your > > issue: > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > > > After I worked out the proxy issue Torsten got it working. Let me > > know if this doesn't help or fix the problem. > > > > chris > > > > > > I carefully read this bug but that doesn't help because this has > > already been modified in the now given GenericWebDBI.pm > > So my problem does not come from a deep recursion loop. > > > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > > EUtilities.t " to see what's really happening. > > And actually, all tests are skipped because of the same message error > > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > *** > > I tried the same command with the modified LWP::UserAgent package > > (which means I comment the line 779 and the corresponding '}') and > > all 453 tests passed. > > But not always. I made the tests several times and it often > > failed. And always on a test called "eXXX->cookie->cookie() query > > key" (ending with query key). In those cases, I got back a html > > message indicating that the error was thrown by the internal sever > > of NCBI. So I guess that sometimes it is just NCBI server fault > > (internal problem), and BioPerl is not implied.. > > But once more, I comment a line from a basic package so it is a bit > > hazardous. > > *** > > > > tony - a little bit lost. > > I'm cc'ing Torsten as he has a bit more experience with proxies. > > EUtilities is set up to check for an env. proxy and also take a set > proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy > to say this was a bug in LWP, but I think the problem is that > something is undefined (i.e. an env. variable), or username/password. 
> > From the bug report, Torsten set his proxy variables using the > following: > > -------------------------------------- > "Note: I am behind an _authenticating_ proxy. > My $http_proxy and $HTTP_PROXY are both set to > http://USER:PASS at proxy.monash.edu.au:80/" > -------------------------------------- > > Note the lowercase for $http_proxy, which can make a difference. > After the recursion fix, I'm assuming he made no changes to the env. > settings, and according to the bug everything was fine (is that > correct Tortsen?). > > Also LWP::UserAgent has this: > > -------------------------------------- > "Load proxy settings from *_proxy environment variables. You might > specify proxies like this (sh-syntax): > > gopher_proxy=http://proxy.my.place/ > wais_proxy=http://proxy.my.place/ > no_proxy="localhost,my.domain" > export gopher_proxy wais_proxy no_proxy > > csh or tcsh users should use the setenv command to define these > environment variables. > > On systems with case insensitive environment variables there exists a > name clash between the CGI environment variables and the HTTP_PROXY > environment variable normally picked up by env_proxy(). Because of > this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY > environment variable can be used instead." > -------------------------------------- > > chris > From cjfields at uiuc.edu Wed Dec 20 11:10:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 10:10:48 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: Message-ID: <007901c72451$6ad540a0$15327e82@pyrimidine> Just to clarify: does it work it you don't have any proxy env. settings? chris _____ From: Anthony Ferrari [mailto:ferraria at gmail.com] Sent: Wednesday, December 20, 2006 9:41 AM To: Chris Fields Cc: bioperl-l List; Torsten Seemann Subject: Re: [Bioperl-l] Problem with : EUtilities - Proxy Defining a "no_proxy" environment variable in my '.bashrc' file solved my problem. 
I set it to "localhost". It indeed corresponds to the line... [ ...if (@{ $self->{'no_proxy'} }) ... ] (I guess!) I really don't know why we are compelled to do this, but let's say that's the way it is. It works now ! Thanks a lot. Tony On 20/12/06, Chris Fields wrote: On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. > > chris > > > I carefully read this bug but that doesn't help because this has > already been modified in the now given GenericWebDBI.pm > So my problem does not come from a deep recursion loop. > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > EUtilities.t " to see what's really happening. > And actually, all tests are skipped because of the same message error > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > *** > I tried the same command with the modified LWP::UserAgent package > (which means I comment the line 779 and the corresponding '}') and > all 453 tests passed. > But not always. I made the tests several times and it often > failed. And always on a test called "eXXX->cookie->cookie() query > key" (ending with query key). In those cases, I got back a html > message indicating that the error was thrown by the internal sever > of NCBI. So I guess that sometimes it is just NCBI server fault > (internal problem), and BioPerl is not implied.. > But once more, I comment a line from a basic package so it is a bit > hazardous. > *** > > tony - a little bit lost. I'm cc'ing Torsten as he has a bit more experience with proxies. EUtilities is set up to check for an env. proxy and also take a set proxy with $agent->proxy() (see GenericWebDBI POD). 
It would be easy to say this was a bug in LWP, but I think the problem is that something is undefined ( i.e. an env. variable), or username/password. >From the bug report, Torsten set his proxy variables using the following: -------------------------------------- "Note: I am behind an _authenticating_ proxy. My $http_proxy and $HTTP_PROXY are both set to http://USER:PASS at proxy.monash.edu.au:80/" -------------------------------------- Note the lowercase for $http_proxy, which can make a difference. After the recursion fix, I'm assuming he made no changes to the env. settings, and according to the bug everything was fine (is that correct Tortsen?). Also LWP::UserAgent has this: -------------------------------------- "Load proxy settings from *_proxy environment variables. You might specify proxies like this (sh-syntax): gopher_proxy=http://proxy.my.place/ wais_proxy= http://proxy.my.place/ no_proxy="localhost,my.domain" export gopher_proxy wais_proxy no_proxy csh or tcsh users should use the setenv command to define these environment variables. On systems with case insensitive environment variables there exists a name clash between the CGI environment variables and the HTTP_PROXY environment variable normally picked up by env_proxy(). Because of this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY environment variable can be used instead." -------------------------------------- chris From ferraria at gmail.com Wed Dec 20 11:59:49 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 17:59:49 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <007901c72451$6ad540a0$15327e82@pyrimidine> References: <007901c72451$6ad540a0$15327e82@pyrimidine> Message-ID: First, I got a $http_proxy env. variable automatically defined by the BioPerl installation (I don't define and export it in my .bash_profile). 
So when I'm logging in, $http_proxy=http://ip_adress:port/ Next step : two solutions : 1) defining an $no_proxy env.variable in my .bash_profile. It can be set to 'whatever'. 2) If I do not define '$no_proxy'; to make it work, I must call the no_proxy() method on each Bio::DB::EUtilities object I create before I can call the get_response() method on it. (The bug is in the 'get_response' call) And finally without 1) or 2) it doesn't work. Tony On 20/12/06, Chris Fields wrote: > > Just to clarify: does it work it you don't have any proxy env. settings? > One thing I didn't point out previously is that Bio::DB::GenericWebDBI > inherits LWP::UserAgent. You should be able to use $eutil->no_proxy() > instead of setting it in your env. > chris > > ------------------------------ > *From:* Anthony Ferrari [mailto:ferraria at gmail.com] > *Sent:* Wednesday, December 20, 2006 9:41 AM > *To:* Chris Fields > *Cc:* bioperl-l List; Torsten Seemann > *Subject:* Re: [Bioperl-l] Problem with : EUtilities - Proxy > > Defining a "no_proxy" environment variable in my '.bashrc' file solved my > problem. I set it to "localhost". > > It indeed corresponds to the line... [ ...if (@{ > $self->{'no_proxy'} }) ... ] (I guess!) > > > I really don't know why we are compelled to do this, but let's say that's > the way it is. > > It works now ! > > Thanks a lot. > > Tony > > > > > On 20/12/06, Chris Fields wrote: > > > > > > On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > > > > > You might check out this bug report, which relates directly to your > > > issue: > > > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > > > > > After I worked out the proxy issue Torsten got it working. Let me > > > know if this doesn't help or fix the problem. > > > > > > chris > > > > > > > > > I carefully read this bug but that doesn't help because this has > > > already been modified in the now given GenericWebDBI.pm > > > So my problem does not come from a deep recursion loop. 
> > > > > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > > > EUtilities.t " to see what's really happening. > > > And actually, all tests are skipped because of the same message error > > > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > > > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > > > *** > > > I tried the same command with the modified LWP::UserAgent package > > > (which means I comment the line 779 and the corresponding '}') and > > > all 453 tests passed. > > > But not always. I made the tests several times and it often > > > failed. And always on a test called "eXXX->cookie->cookie() query > > > key" (ending with query key). In those cases, I got back a html > > > message indicating that the error was thrown by the internal sever > > > of NCBI. So I guess that sometimes it is just NCBI server fault > > > (internal problem), and BioPerl is not implied.. > > > But once more, I comment a line from a basic package so it is a bit > > > hazardous. > > > *** > > > > > > tony - a little bit lost. > > > > I'm cc'ing Torsten as he has a bit more experience with proxies. > > > > EUtilities is set up to check for an env. proxy and also take a set > > proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy > > to say this was a bug in LWP, but I think the problem is that > > something is undefined ( i.e. an env. variable), or username/password. > > > > From the bug report, Torsten set his proxy variables using the > > following: > > > > -------------------------------------- > > "Note: I am behind an _authenticating_ proxy. > > My $http_proxy and $HTTP_PROXY are both set to > > http://USER:PASS at proxy.monash.edu.au:80/" > > -------------------------------------- > > > > Note the lowercase for $http_proxy, which can make a difference. > > After the recursion fix, I'm assuming he made no changes to the env. > > settings, and according to the bug everything was fine (is that > > correct Tortsen?). 
> > > > Also LWP::UserAgent has this: > > > > -------------------------------------- > > "Load proxy settings from *_proxy environment variables. You might > > specify proxies like this (sh-syntax): > > > > gopher_proxy=http://proxy.my.place/ > > wais_proxy= http://proxy.my.place/ > > no_proxy="localhost,my.domain" > > export gopher_proxy wais_proxy no_proxy > > > > csh or tcsh users should use the setenv command to define these > > environment variables. > > > > On systems with case insensitive environment variables there exists a > > name clash between the CGI environment variables and the HTTP_PROXY > > environment variable normally picked up by env_proxy(). Because of > > this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY > > environment variable can be used instead." > > -------------------------------------- > > > > chris > > > > From cjfields at uiuc.edu Wed Dec 20 13:28:09 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 12:28:09 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: Message-ID: <000301c72464$9a12a070$15327e82@pyrimidine> > First, I got a $http_proxy env. variable automatically > defined by the BioPerl installation (I don't define and > export it in my .bash_profile). > So when I'm logging in, $http_proxy=http://ip_adress:port/ BioPerl can't permanently set any env. variables out of the box since that would mean modifying your local .bash_profile or the system profile. If you're a user on a system where you're not the sysadmin, then it's more likely the sysadmin has set up user accounts with an already-defined $http_proxy variable in the system .bash_profile (which is passed on to all users). The problem I can see (going by what you have above) is there is no username/password defined, only the address (IP:Port). I am assuming LWP is expecting some form of authentication when a proxy is env. defined w/o username/password included. 
If so, you'll have to supply those yourself, either by redefining $http_proxy to include it in your local .bash_profile,

export http_proxy='http://USER:PASS@proxy.monash.edu.au:80/'

by using $agent->proxy() for including all proxy information, or by using $agent->authentication() so that a proxy can authorize any outgoing/incoming requests. The first may be preferable if you are able to do so since you wouldn't have to authenticate every agent.

Note that this would also explain why you had an LWP problem with an undefined array ref: the LWP agent is likely expecting some form of authentication, probably in the form [username, password], if a proxy env. variable is found.

> Next step : two solutions :
> 1) defining an $no_proxy env.variable in my .bash_profile.
> It can be set to 'whatever'.
>
> 2) If I do not define '$no_proxy'; to make it work, I must call the
> no_proxy() method on each Bio::DB::EUtilities object I create
> before I can call the get_response() method on it.
>
> (The bug is in the 'get_response' call)

If you mean when the request is calling proxy_authorization_basic(), that's not a bug. If we didn't authorize then it likely wouldn't work for properly set up proxies (Torsten's worked). Note that it's supposed to be passing a username/password from $self->authentication(). The fact that you can set $no_proxy to anything suggests there is no proxy in place.

> And finally without 1) or 2) it doesn't work.
>
> Tony

We can't guarantee that defining no_proxy will always work on your system, either. It's possible a proxy was added systemwide but a firewall hasn't been put in place yet; once it goes up and all requests need to be authorized, then you'll run into problems again. Conversely, maybe this was defined at some point systemwide in the .bash_profile but wasn't removed. The only one who would know is the sysadmin.
If you aren't the sysadmin, you can contact them to find out about how to properly set up your proxy, or whether it is even necessary (maybe they neglected to remove the proxy definition from the system .bash_profile). Who knows? chris From bix at sendu.me.uk Wed Dec 20 16:03:03 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 21:03:03 +0000 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine> References: <000301c72464$9a12a070$15327e82@pyrimidine> Message-ID: <4589A507.60106@sendu.me.uk> Chris Fields wrote: >> First, I got a $http_proxy env. variable automatically >> defined by the BioPerl installation (I don't define and >> export it in my .bash_profile). >> So when I'm logging in, $http_proxy=http://ip_adress:port/ > > BioPerl can't permanently set any env. variables out of the box since True, and it doesn't try to set one temporarily either. To clarify some of the other points Chris made, the proxy variable certainly doesn't need username and password to be defined (from LWPs point of view), since not all proxies authenticate. Of course accesses won't work if authentication is actually required and these aren't set. There's no reason that no_proxy should have to be set. It is used to say what domains shouldn't be proxied. Either this is a real LWP bug, or somehow EUtilities or one of its bases is doing something wrong. It should be investigated... 
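For what it's worth, here is a minimal sketch of how the two settings are picked up by a plain LWP::UserAgent, leaving BioPerl out of it entirely. It assumes LWP is installed; the proxy URL is a made-up placeholder, not anyone's real proxy.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical proxy URL, for illustration only.
local $ENV{http_proxy} = 'http://proxy.example.org:8080/';
# Hosts that should bypass the proxy (comma-separated; optional).
local $ENV{no_proxy}   = 'localhost';

# env_proxy => 1 asks the constructor to read *_proxy variables from %ENV.
my $ua = LWP::UserAgent->new(env_proxy => 1);

# proxy($scheme) with no URL returns the proxy currently set for that scheme.
my $http_proxy = $ua->proxy('http');
print "http proxy: ", (defined $http_proxy ? $http_proxy : '(none)'), "\n";
```

If this runs cleanly with http_proxy set and no_proxy removed, that would point away from LWP itself and towards something in the EUtilities class hierarchy.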
It would be very informative if Anthony could log in when he hasn't done anything to his environment variables (and so where the original problem manifests) and give us the results of:

perl -e 'while (($key, $val) = each %ENV) { print "$key => $val\n" }'

From avilella at gmail.com Wed Dec 20 09:07:17 2006
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 20 Dec 2006 14:07:17 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To:
References:
Message-ID: <358f4d650612200607m4324b8f1r91d2d917cd4951bd@mail.gmail.com>

Try something like:

my @params =('verbose'=>0, 'quiet'=>1, 'log'=>'/tmp/inv.log');
my $factory = Bio::Tools::Run::Alignment::Muscle->new(@params);

it works for me with muscle 3.6. The log only gives me a start, commandstring and end. I dunno if that is what muscle is supposed to spit out.

Albert.

On 12/19/06, Shrinivasrao P. Mane wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
>
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
>
> I used the following in perl script
>
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>
> 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log');
>
> The program runs and produces the result file but it doesn't create a
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?
> Thanks
> Mane
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu Wed Dec 20 17:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 16:46:35 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <4589A507.60106@sendu.me.uk>
Message-ID: <000c01c72488$b6a690b0$15327e82@pyrimidine>

> Chris Fields wrote:
> >> First, I got a $http_proxy env.
variable automatically > defined by the > >> BioPerl installation (I don't define and export it in my > >> .bash_profile). > >> So when I'm logging in, > $http_proxy=http://ip_adress:port/ > > > > BioPerl can't permanently set any env. variables out of the > box since > > True, and it doesn't try to set one temporarily either. > > To clarify some of the other points Chris made, the proxy > variable certainly doesn't need username and password to be > defined (from LWPs point of view), since not all proxies > authenticate. Of course accesses won't work if authentication > is actually required and these aren't set. > > There's no reason that no_proxy should have to be set. It is > used to say what domains shouldn't be proxied. Either this is > a real LWP bug, or somehow EUtilities or one of its bases is > doing something wrong. It should be investigated... Actually, after some investigation I repeated the error and committed a fix. If I set (on WinXP) HTTP_PROXY to a dummy variable I get the same error: Can't use an undefined value as an ARRAY reference at C:/Perl/lib/LWP/UserAgent.pm line 787. It's EUtilities-specific as other WebAgents that have proxy settings do not have the same problem, though I haven't checked any WebAgent-based classes. I think this may also partly be an LWP bug as setting env_proxy to TRUE/FALSE doesn't seem to have an effect, but instantiating with it (env_proxy => 1) in the constructor fixes the problem. Anthony, I have committed a fix to CVS to GenericWebDBI and EUtilities. Could you try it out? -chris From cjfields at uiuc.edu Wed Dec 20 18:19:59 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 17:19:59 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine> Message-ID: <000001c7248d$5e7df450$15327e82@pyrimidine> > > First, I got a $http_proxy env. 
variable automatically > defined by the > > BioPerl installation (I don't define and export it in my > > .bash_profile). > > So when I'm logging in, > $http_proxy=http://ip_adress:port/ Anthony, Sorry about the prior long-winded response. I managed to reproduce the error about five minutes after I responded and managed to trace the problem back to GenericWebDBI. The issue seems to be with the LWP::UserAgent env_proxy method not setting correctly post-instantiation; setting to 0 or 1 doesn't seem to do anything. If I add it to the list of args for chained instantiation in the constructor: my $self = $class->SUPER::new(@args, env_proxy => 1); it suddenly works like a charm. Hard to know why it's being fussy... I'm going to try reproducing this on a few platforms and check to see if it has been reported as an LWP bug. I have also committed a fix to CVS if you want to test it out. Chris From jnewcomer at jhu.edu Wed Dec 20 20:56:10 2006 From: jnewcomer at jhu.edu (Joe Newcomer) Date: Wed, 20 Dec 2006 20:56:10 -0500 Subject: [Bioperl-l] a stupid question Message-ID: <002101c724a3$2ff80100$bd59dc80@aap.jhu.edu> Hello Paul Leo, I am with Johns Hopkins University Advanced Academic Programs. I am trying to contact a student named Paul Leo who has registered for Protein Bioinformatics. If this is you please email me. I would like to send you information about the spring course. Respectfully, Joe Newcomer (410) 516-5047 Online Education From anhthu.tieu at gsf.de Thu Dec 21 05:10:47 2006 From: anhthu.tieu at gsf.de (Anh-Thu Tieu) Date: Thu, 21 Dec 2006 11:10:47 +0100 Subject: [Bioperl-l] imagemaps with heterogeneous_segments Message-ID: <458A5DA7.1010802@gsf.de> Hi, I use bioperl 1.5.2 and have been wondering whether it is possible to apply the image_and_map function with the glyph option "heterogenous_segments". Up to now I can successfully create an underlying imagemap for the entire track. 
However, what I want is to create an imagemap for each single segment on my track/glyph. Does anyone know who to realise this? Any help is appreciated. Thanks a lot. Anh Thu From somil.sharma1 at gmail.com Thu Dec 21 01:22:24 2006 From: somil.sharma1 at gmail.com (Somil Sharma) Date: Thu, 21 Dec 2006 14:22:24 +0800 Subject: [Bioperl-l] problem Message-ID: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com> hello *i run this program* *#!/use/bin/perl* *use Bio::DB::GenBank;* *$gb = new Bio::DB::GenBank; $seq1 = $gb->get_Seq_by_id('MUSIGHBA1'); print $seq1; * *and got this error on cmd line--* ---------- *EXCEPTION ------------- MSG: WebDBSeqI Request Error: 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error) Content-Type: text/plain Client-Date: Thu, 21 Dec 2006 06:28:33 GMT Client-Warning: Internal response* *500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)* *STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685 STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491 STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27 STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145 STACK toplevel C:\Perl\a2.pl:5* plz see if u can help me out.
my ppm is also not able to install Bioperl so i did that also manually.

waiting for ur reply

From granjeau at tagc.univ-mrs.fr Thu Dec 21 06:14:25 2006
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Thu, 21 Dec 2006 12:14:25 +0100
Subject: [Bioperl-l] BioFetch: Adding databases
Message-ID: <458A6C91.7090000@tagc.univ-mrs.fr>

Hello!

I needed to query the Unisave database at EBI. To date, the only way to access it is the dbfetch web service (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined in the BioFetch package (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote these few lines to make it work, but I don't think it is good programming practice. Maybe it makes sense to define a method to add databases to FORMATMAP, in order to keep up with changes to the dbfetch service.

Cheers,
--Samuel

use Bio::DB::BioFetch;
$Bio::DB::BioFetch::FORMATMAP{unisave} = {
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
    namespace => 'unisave',
};
my $bf = new Bio::DB::BioFetch(-db=>'unisave');
my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');

print $seq->display_id();
print $seq->seq();

From cain at cshl.edu Thu Dec 21 08:56:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 08:56:21 -0500
Subject: [Bioperl-l] problem
In-Reply-To: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
References: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
Message-ID: <1166709381.3739.47.camel@localhost.localdomain>

Hello,

It looks to me like you have a networking problem that doesn't have anything to do with BioPerl. When I run your script, I get:

Bio::Seq::RichSeq=HASH(0x97013e0)

Fairly quickly, too. Can you get to http://eutils.ncbi.nlm.nih.gov/ in a browser without proxy settings?
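If a browser is not handy, roughly the same check can be made from Perl. This is only a sketch: it assumes LWP::UserAgent is installed, and status_for is a helper name invented for this example, not part of BioPerl.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: request a URL and return the HTTP status line.
# A failed connection comes back from LWP as an internal "500 ..." response.
sub status_for {
    my ($url, %opts) = @_;
    my $ua = LWP::UserAgent->new(timeout => 15, %opts);
    return $ua->get($url)->status_line;
}

my $url = 'http://eutils.ncbi.nlm.nih.gov/';
# Once ignoring any *_proxy environment variables, once honouring them.
print "ignoring env proxy: ", status_for($url, env_proxy => 0), "\n";
print "using env proxy:    ", status_for($url, env_proxy => 1), "\n";
```

If both lines report a "500 Can't connect" status like the one in your error message, the problem is between your machine and NCBI rather than in BioPerl.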
As an aside, you probably don't really want the HASH stuff above, so I modified your script to look like this, with warnings and strict to make future debugging easier:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;

my $gb = new Bio::DB::GenBank;
my $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1->seq;

Scott

On Thu, 2006-12-21 at 14:22 +0800, Somil Sharma wrote:
> hello
>
> *i run this program*
>
> *#!/use/bin/perl*
>
> *use Bio::DB::GenBank;*
>
> *$gb = new Bio::DB::GenBank;
> $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
> print $seq1;
> *
>
> *and got this error on cmd line--*
>
> ---------- *EXCEPTION -------------
> MSG: WebDBSeqI Request Error:
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
> Content-Type: text/plain
> Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
> Client-Warning: Internal response*
>
> *500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)*
>
> *STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
> STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491
> STACK Bio::DB::WebDBSeqI::get_Stream_by_id
> C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145
> STACK toplevel C:\Perl\a2.pl:5*
>
> plz see if u can help me out.
>
> my ppm is also not able to install Bioperl so i did that also manually.
>
> waiting for ur reply
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Thu Dec 21 09:28:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 21 Dec 2006 08:28:07 -0600
Subject: [Bioperl-l] BioFetch: Adding databases
In-Reply-To: <458A6C91.7090000@tagc.univ-mrs.fr>
References: <458A6C91.7090000@tagc.univ-mrs.fr>
Message-ID: <193C6D1C-6374-4A86-9FBD-7FA994D5FDDF@uiuc.edu>

I've added this to the BioFetch FORMATMAP as 'unisave' and committed to CVS. Thanks!

chris

On Dec 21, 2006, at 5:14 AM, Samuel GRANJEAUD - IR/IFR137 wrote:
> Hello!
>
> I needed to query the Unisave database at EBI. Up to date, the only
> way
> to access it is the dbfetch web service
> (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet
> defined
> in the BioFetch package
> (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote
> these few lines to make it work, but I don't think it fits a good
> programming practice. May be it makes sense to defined a method to add
> databases to FORMATMAP, in order to follow the dbfetch service
> evolutions.
>
> Cheers,
> --Samuel
>
> use Bio::DB::BioFetch;
> $Bio::DB::BioFetch::FORMATMAP{unisave} = {
> default => 'swiss',
> swissprot => 'swiss',
> fasta => 'fasta',
> namespace => 'unisave',
> };
> my $bf = new Bio::DB::BioFetch(-db=>'unisave');
> my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');
>
> print $seq->display_id();
> print $seq->seq();
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From anhthu.tieu at gsf.de Thu Dec 21 09:31:45 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 15:31:45 +0100
Subject: [Bioperl-l] multiple glyph elements in one track
Message-ID: <458A9AD1.50907@gsf.de>

Hello,

I use bioperl 1.5.2. Does anyone know how I could create two separate glyph elements on the same track with the Bio::Graphics::Panel module? My aim is to have two (e.g. two different) clickable imagemap elements on the same track. Until now I can merely create two glyph elements (transcript2 or generic options) per track with only one imagemap element (e.g. the same imagemap element is used for the entire track as the entire (=both elements) glyph's coordinates are returned to the image_and_map function as one set of coordinates).

Thank you for your help.

Best regards,

Anh Thu

From cain at cshl.edu Thu Dec 21 09:47:32 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 09:47:32 -0500
Subject: [Bioperl-l] multiple glyph elements in one track
In-Reply-To: <458A9AD1.50907@gsf.de>
References: <458A9AD1.50907@gsf.de>
Message-ID: <1166712453.3739.53.camel@localhost.localdomain>

Hello Anh Thu,

You can provide a callback for the glyph argument that returns different glyphs depending on what you want to do (i.e., how you've coded your callback). This example is from the perldoc for Bio::Graphics::Panel:

$panel->add_track(\@exons,
                  -glyph => sub { my $feature = shift;
                                  $feature->source_tag eq 'curated'
                                      ? 'ellipse' : 'generic'; }
                 );

Scott

On Thu, 2006-12-21 at 15:31 +0100, Anh-Thu Tieu wrote:
> Hello,
>
> I use bioperl 1.5.2. Does anyone know how I could create two seperate
> glyph elements on the same track with the Bio::Graphics::Panel module?
> My aim is to have two (e.g. two different) clickable imagemap elements
> on the same track.
Until now I can merely create two glyph elements > (transcript2 or generic options) per track with only one imagemap > element (e.g. the same imagemap element is used for the entire track as > the entire (=both elements) glyph's coordinates are returned to the > image_and_map function as one set of coordinate). > > Thank you for your help. > > Best regards, > > Anh Thu -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cain.cshl at gmail.com Thu Dec 21 15:03:48 2006 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 21 Dec 2006 15:03:48 -0500 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> <1166604008.4588f6e87cccc@www.studentmail.otago.ac.nz> <1166621113.3739.11.camel@localhost.localdomain> <1166642653.45898dddbd8cf@www.studentmail.otago.ac.nz> <1166643051.3739.28.camel@localhost.localdomain> <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz> Message-ID: <1166731428.3739.71.camel@localhost.localdomain> Hi Stephan, About your bioperl mail: did you cancel it, or did it just disappear? If the latter, I might have accidentally deleted it, sorry :-/ So 'GBrowse is running' means that you can see the sample yeast chr1 database, browse around, etc, right?
I still don't know what is up with the warning but my guess is that everything is working there.

As for your question about writing a callback, the reason it's not working is that the attributes method returns a list (typically but not always with only one element), so what you are really doing in your test is this "number of elements in the list > 1200", which is almost always going to be false. You should change it to this:

my $feature = shift;
my ($score) = $feature->attributes('score');
if ($score > 1200) {
...etc...

Finally, if you really want to test that you are using the correct bioperl, you can put this simple cgi in your cgi-bin directory as test_biographics.pl, set it as world executable and go to http://localhost/cgi-bin/test_biographics.pl (and, yes, I use strict and warnings even when the script is only 10 lines long :-) :

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Graphics::Panel;
use CGI qw/:standard/;

print header(),
      start_html,
      p("Bio::Graphics::Panel api_version is ".Bio::Graphics::Panel->api_version),
      p("It should be 1.654 for BioPerl 1.5.2"),
      end_html;

Scott

On Fri, 2006-12-22 at 08:27 +1300, Stephan Roessner wrote:
> Hi Scott,
>
> responded to group but did get through.
> So I reply back to you.
>
> I installed Class-Base-0.03 using CPAN.
>
> Reinstalling GBrowse gives me still a warning like:
> Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> Writing Makefile for Bio::Graphocs::Browser::CAlign
> Writing Makefile for Generic-Genome-Browser.
>
> GBrowse is running but I cannot access attributes and/or the score column
> of .gff files. Is this related to the warning?
>
> To get an attribute I use
>
> my $feature = shift;
> if ($feature->attributes('score') > 1200) {
> return 'blue';
> } else {
> return 'pink';
> }
> But I retrieve not data using $feature->
>
> Can I somehaow verify what bioperl version GBrowse is using?
> > Stephan, > > > > Quoting Scott Cain : > > > Stephan, > > > > Yes, it is in cpan: > > > > http://search.cpan.org/~abw/Class-Base-0.03/lib/Class/Base.pm > > > > The cpan shell should be able to install it. > > > > Whether or not that works, please respond to the mailing list so that > > the rest of the conversation can be archived. > > > > Scott > > > > > > On Thu, 2006-12-21 at 08:24 +1300, Stephan Roessner wrote: > > > Hi Scott, > > > > > > No I didn't. > > > I had a look and couldn't find it. > > > It is not part of CPAN? > > > > > > Stephan > > > > > > > > > Quoting Scott Cain : > > > > > > > Stephan, > > > > > > > > Did you install Class::Base? It was inadvertantly left out the > > > > install > > > > document, but is required. > > > > > > > > Scott > > > > > > > > > > > > On Wed, 2006-12-20 at 21:40 +1300, Stephan Roessner wrote: > > > > > Hi all, > > > > > > > > > > I did sudo ./Build install --uninst 1 and got the error > > > > > * ERROR: Confiduration was initially created with MOdule::Build > > > > version > > > > > '0.2805', but we are now using '0.2806'. ... > > > > > > > > > > So I ran perl Build.PL and got the message > > > > > Creating new 'Buid' script for 'bioperl' verion '1.0050021'. > > > > > > > > > > I did run sudo ./Build install --uninst 1 again. > > > > > Seems to be fine with no error messages. > > > > > > > > > > When I run perl Makefile.PL for GBrowse 1.66-RC2 it results in > > > > > > > > > > Warning: prerequisite Bio::Perl 1.52 not found. We have > > 1.0050021. > > > > > Warning: prerequisite Class::Base 0 not found. > > > > > Writing Makefile for Bio::Graphocs::Browser::CAlign > > > > > Writing Makefile for Generic-Genome-Browser > > > > > > > > > > GBrowse is running but I have really troubles with aggregators > > trying > > > > to > > > > > use xyplot. It does not plot anything. So I thought the bioperl > > could > > > > be > > > > > the problem. 
> > > > > > > > > > Stephan > > > > > > > > > > > > > > > > > > > > Quoting Scott Cain : > > > > > > > > > > > I really don't think the BioPerl version detection is wrong. > > I > > > > > > actually > > > > > > don't check Bio::Root::Version::VERSION in Makefile.PL, I > > check > > > > > > Bio::Graphics::Panel->api_version. When it doesn't find the > > > > correct > > > > > > api_version, it gives a warning the BioPerl 1.5.2 is not > > installed. > > > > I > > > > > > have seen this happen when more than one BioPerl instance is > > > > installed > > > > > > and `perl Makefile.PL` finds the wrong one first. My > > suggestion is > > > > to > > > > > > try reinstalling BioPerl and providing the --uninst 1 argument > > to > > > > > > remove > > > > > > older versions of BioPerl: > > > > > > > > > > > > sudo ./Build install --uninst 1 > > > > > > > > > > > > Scott > > > > > > > > > > > > > > > > > > On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote: > > > > > > > Stephan Roessner wrote: > > > > > > > > Dear support team, > > > > > > > > > > > > > > > > I installed bioperl 1.5.2_100 on a ferdora machine to be > > able > > > > to > > > > > > use > > > > > > > > gbrowse. > > > > > > > > The installation seems to work (except of the test > > failures) > > > > but > > > > > > the > > > > > > > > gbrowse installation tells me that BIO::pERL 1.0050021 is > > > > > > installed, but > > > > > > > > of course it requires 1.52. > > > > > > > > > > > > > > > > Is there a chance to find out what went wrong? > > > > > > > > > > > > > > Nothing went wrong with the Bioperl installation (well, > > expect > > > > there > > > > > > > shouldn't have been any test failures - can you post those > > > > please?). > > > > > > > gbrowse simply defined its Bioperl requirement incorrectly. 
> > If > > > > you > > > > > > tell > > > > > > > me exactly where you downloaded gbrowse from and how you > > went > > > > about > > > > > > > installing it, and provide the exact, complete error message > > you > > > > got > > > > > > > from it, I might be able help the authors fix the problem. > > > > > > > > > > > > > > Or I'm pretty sure they can figure it our for themselves :) > > > > > > > _______________________________________________ > > > > > > > Bioperl-l mailing list > > > > > > > Bioperl-l at lists.open-bio.org > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > Scott Cain, Ph. D. > > > > > > cain at cshl.edu > > > > > > GMOD Coordinator (http://www.gmod.org/) > > > > > > 216-392-3087 > > > > > > Cold Spring Harbor Laboratory > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > ------------------------------------------------------------------------ > > > > Scott Cain, Ph. D. > > > > cain.cshl at gmail.com > > > > GMOD Coordinator (http://www.gmod.org/) > > > > 216-392-3087 > > > > Cold Spring Harbor Laboratory > > > > > > > > > > > > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. > > cain.cshl at gmail.com > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061221/f8621965/attachment-0001.bin

From rvosa at sfu.ca Sat Dec 23 17:17:37 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sat, 23 Dec 2006 14:17:37 -0800
Subject: [Bioperl-l] [Summary] Re: suggestions for suitable 'taxon' object
In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
Message-ID: <458DAB01.6080200@sfu.ca>

The replies I've received so far indicate I should look into Bio::Taxon.
I will probably come back with further questions/discussions as to how
to link and cross reference taxa, sequences and nodes, but for now I
should first look at the Bio::Taxon api (and unpack my other Christmas
gifts).

Thank you for all comments and suggestions. Happy holidays!

Rutger

Rutger Vos wrote:
> Hi all,
>
> I am looking for a bioperl object that can be abused to function as a
> suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS
> file format (i.e. not strictly an entity from a taxonomy, but more loosely
> an OTU).
>
> The object would primarily function as a way to relate nodes in trees to
> sequences in an alignment (a foreign key that both nodes and sequences refer
> to), and secondarily as the keeper of the canonical name of the OTU, such
> that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo
> sapiens (constrained monophyly)' can still be understood to refer to the
> same thing - the OTU 'Homo sapiens sapiens' (for example).
>
> I was thinking that a (possibly expanded) Bio::Species might work if there
> was some sensible way of appending references to node and sequence objects
> to it (or otherwise associate them with each other), but I am writing *to
> solicit any and all suggestions*. I am looking for something similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger A. Vos
Postdoctoral research fellow
University of British Columbia
Personal site: http://www.sfu.ca/~rvosa
CIPRES: http://www.phylo.org
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From paul.boutros at utoronto.ca Sat Dec 23 22:36:59 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:36:59 -0500
Subject: [Bioperl-l] Bio::Graphics::Glyph::dna
Message-ID: <20061223223659.7sgfofa44mw4okks@webmail.utoronto.ca>

Hi,

I've been trying to get the dna glyph working and have had some
problems. This is ActiveState perl 5.8.8 (build 819) and BioPerl 1.5.2
on WinXP.

I'm starting with a FASTA file, so I've tried:

    $panel->add_track(
        $seq,
        -glyph     => 'dna',
        -do_gc     => 'true',
        -gc_window => 'auto'
    );

where $seq is a Bio::Seq object, and I've tried it using a GFF $segment:

    my $db = Bio::DB::GFF->new(
        -adaptor => 'berkeleydb',
        -create  => 1,
        -dsn     => 'temp'
    );

    $db->load_sequence_string(
        $seq->primary_id(),
        $seq->seq()
    );

    my $segment = Bio::DB::GFF::Segment->new(
        $db,
        $seq->primary_id(),
        $seq->primary_id(),
        1,
        $seq->length()
    );

    $panel->add_track(
        $segment,
        -glyph     => 'dna',
        -do_gc     => 'true',
        -gc_window => 'auto'
    );

From paul.boutros at utoronto.ca Sat Dec 23 22:46:27 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:46:27 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
Message-ID: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>

Hello,

I'm trying to get the dna glyph of Bio::Graphics to work and am having some problems.
I'm starting with a fasta file, and I am running perl 5.8.8 (ActiveState
build 819) on WinXP with BioPerl 1.5.2.

If I try simply using a Bio::Seq object like this:

    $panel->add_track(
        $seq,
        -glyph     => 'dna',
        -do_gc     => 'true',
        -gc_window => 'auto'
    );

I get the error:

Can't locate object method "start" via package "Bio::Seq" at
C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164.

And if I try creating a Bio::DB::GFF::Segment object like this:

    my $db = Bio::DB::GFF->new(
        -adaptor => 'berkeleydb',
        -create  => 1,
        -dsn     => '/usr/local/share/gff/dmel'
    );

    $db->initialize(1);

    $db->load_sequence_string(
        $seq->primary_id(),
        $seq->seq()
    );

    my $segment = Bio::DB::GFF::Segment->new(
        $db,
        $seq->primary_id(),
        $seq->primary_id(),
        1,
        $seq->length()
    );

    $panel->add_track(
        $segment,
        -glyph     => 'dna',
        -do_gc     => 'true',
        -gc_window => 'auto'
    );

I get the error:

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented by package Bio::DB::GFF::Segment.
This is not your fault - author of Bio::DB::GFF::Segment should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522
STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101
STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186
STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388
STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360
STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288
STACK: create_figure.pl:147
----------------------------------------------------------------

I'm really unsure what to try next; any suggestions would be much
appreciated!

Paul

From lstein at cshl.edu Sun Dec 24 12:23:18 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Sun, 24 Dec 2006 12:23:18 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
References: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
Message-ID: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>

Hi,

You need to use either a Bio::SeqFeature::Generic object (with an
attached Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are
not intended to create Bio::DB::GFF::Segment objects directly. e.g.

    my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000);
    my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800);
    $feature->attach_seq($dna);

Best,

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From tgenahmet at gmail.com Wed Dec 27 16:38:43 2006
From: tgenahmet at gmail.com (Ahmet Kurdoglu)
Date: Wed, 27 Dec 2006 14:38:43 -0700
Subject: [Bioperl-l] get mRNA details for a gene
Message-ID: <9d8d0e2a0612271338t7cb15a63v5a08f624888b3f7b@mail.gmail.com>

Hi,

This is my first message to the list. I hope I get it right.

Here is what I'm trying to accomplish:

Get the mRNA details for a given gene (e.g. DNASE2B) from its GenBank
file.

Using the web interface I can search with this query:

DNASE2B [sym] AND homo sapiens [ORGN]

(it returns only one result if you search the 'gene' database), get the
GenBank file by clicking on NC_000001.9, and see the details for its two
mRNAs. (I eventually need to get exon locations for both of its
transcripts.)

However, trying to do this in Perl has proved to be very difficult for
me. I've tried various methods, including get_Seq_by_id, get_Seq_by_gi,
and get_Stream_by_query.

Before I explain in detail what I did, I'd like to hear your ideas on how
to accomplish this.

Thank you.

From sdavis2 at mail.nih.gov Thu Dec 28 16:57:03 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 28 Dec 2006 16:57:03 -0500
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To:
References:
Message-ID: <45943DAF.70100@mail.nih.gov>

Michael Muratet US-Huntsville wrote:
> Sean
>
> Thanks. I did consider the bioconductor package and downloaded your
> write-up after it was recommended by the GEO folks. I've looked at R a
> few times, but I never got proficient at it. I'm a lot better with perl.
>
> I've been looking at MINiML, too. It looked like it might be easier to
> parse the SOFT file since the data is in-line with the attributes and
> I'd have to use a SAX parser (not enough memory for DOM) for MINiML.
>
> NCBI must have parsers to get the data into their databases. Do you know
> what they use?
>

Michael,

You might want to look more specifically at the MINiML format specs.
The data tables are stored as separate tab-delimited files with an
external reference in the XML, so DOM parsing is possible with just a
few kB of memory. Of course, reading all of the data into memory at
once will take a large amount of memory for some datasets. If you are
going to load into a database, I would suggest reading the MINiML using
DOM and then stepping through the data files one at a time, loading as
you go.

As for their parsers, I'm not sure what language they use, but writing
a parser for either SOFT or MINiML isn't at all difficult. GEO uses a
very simplified MAGE schema.

As for R vs. perl, if you are planning on doing analyses of microarray
data, I would highly suggest looking again at the R/bioconductor
project. It will save you reinventing many wheels, such as getting
annotation like gene ontology and pathways, doing stats, plotting,
maintaining MIAME-compliant data structures, converting from multiple
microarray formats, etc.

Sean

From allenday at ucla.edu Thu Dec 28 18:21:07 2006
From: allenday at ucla.edu (Allen Day)
Date: Thu, 28 Dec 2006 15:21:07 -0800
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To: <45943DAF.70100@mail.nih.gov>
References: <45943DAF.70100@mail.nih.gov>
Message-ID: <5c24dcc30612281521o58b9f256sfa36c403f4c30bfa@mail.gmail.com>

> As for R vs. perl, if you are planning on doing analyses of microarray
> data, I would highly suggest looking again at the R/bioconductor
> project. It will save you reinventing many wheels, such as getting
> annotation like gene ontology and pathways, doing stats, plotting,
> maintaining MIAME-compliant data structures, converting from multiple
> microarray formats, etc.

I'll second this statement WRT the data analysis. I'm doing all my
analysis in R; Perl is just not good at dealing with large matrices or
plotting.

OTOH, I have also found that R is particularly weak when it comes to
pipelining data and system interfacing. If your goal is to do ETL to a
local database you're better off using Perl.

I've found they're both about equally clunky for dealing with the
experimental metadata, with a slight preference for Perl. That's more a
reflection of the baroque MAGE model though than the programming
languages themselves.
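
For what it's worth, the "step through the data files one at a time,
loading as you go" idea can be sketched in a few lines of plain Perl.
This is only an illustration under assumptions: the two-column layout
(an ID and a value) and the load_table() helper are made up for the
example, an in-memory string stands in for one external tab-delimited
table, and a hash stands in for the database insert.

```perl
use strict;
use warnings;

# Read a tab-delimited table one row at a time and hand each row to a
# callback, so that only a single line is held in memory at once.
# load_table() is a hypothetical helper, not part of BioPerl.
sub load_table {
    my ($fh, $on_row) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;              # skip blank lines
        my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty columns
        $on_row->(@fields);
        $count++;
    }
    return $count;
}

# Demo: an in-memory "file" stands in for one external data table.
# The (ID, value) column layout is an assumption for illustration.
my $data = "probe_1\t7.25\nprobe_2\t8.5\n\nprobe_3\t6.125\n";
open my $fh, '<', \$data or die "Cannot open in-memory table: $!";

my %value_for;
my $rows = load_table($fh, sub {
    my ($id, $value) = @_;
    $value_for{$id} = $value;   # a real loader would insert into a database here
});
close $fh;

print "Loaded $rows rows\n";
```

Calling load_table() once per data file keeps memory use roughly
constant regardless of how large each table is; only the per-row
callback would need to change for a real database load.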

-Allen

From Paul.Boutros at utoronto.ca Sat Dec 30 02:43:32 2006
From: Paul.Boutros at utoronto.ca (Paul Boutros)
Date: Sat, 30 Dec 2006 02:43:32 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>
Message-ID: <000c01c72be6$34d07e60$ec02a8c0@main>

Hi Lincoln,

Thanks, that worked like a charm!

Can I suggest adding the example/explanation you gave me to the pod for
Bio::Graphics::Glyph::dna? Here's a patch against the 1.5.2 version of
dna.pm to do that.

Paul

266c266,274
< in response to the dna() method.
---
> in response to the dna() method. For example, you can use a
> Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq
> like this:
> my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 );
> my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 );
> $feature->attach_seq($dna);
> $panel->add_track( $feature, -glyph => 'dna' );
>
> A Bio::Graphics::Feature object may also be used.

From er at xs4all.nl Sat Dec 30 19:05:16 2006
From: er at xs4all.nl (Erik)
Date: Sun, 31 Dec 2006 01:05:16 +0100 (CET)
Subject: [Bioperl-l] acquiring a local refseq + index
Message-ID: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>

Hi all,

I downloaded the refseq files (.gbff) and want to index the lot with Bio::DB::Flat.
It turns out that there are many cases where the SOURCE and ORGANISM lines are messed up, sometimes to a degree where the indexing fails on a Bio::SeqIO::genbank error. I'd like to change Bio::SeqIO::genbank to let this parsing go at least so far as to make the indexing of the refseq files possible, and hopefully improving the taxonomic output ($seq->species->binomial is often mutilated at the moment). Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank? Is anyone already working on a rewrite? Because if this is the case I may be better off writing my own indexing scheme? Below is (outline of) my indexing program, which uses Bio::DB::Flat::DBD. If anyone knows of a better way to get a locally searchable refseq flat file index, I would be very interested. Thanks for your help, Erikjan ------------- use Bio::DB::Flat; my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; my $db=Bio::DB::Flat->new( -directory => $refseq_dir, -dbname => 'refseq', -format => 'genbank', -index => 'bdb', -write_flag => 1, ); my @files = getfiles($refseq_dir); for my $f (@files) { db->build_index($f); } From hlapp at gmx.net Sat Dec 30 20:48:33 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 30 Dec 2006 20:48:33 -0500 Subject: [Bioperl-l] acquiring a local refseq + index In-Reply-To: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> Message-ID: Can you send examples and the resulting error messages? Also, I'm assuming you running the 1.5.2 release of Bioperl; if not that's what I would try first. -hilmar On Dec 30, 2006, at 7:05 PM, Erik wrote: > Hi all, > > I downloaded the refseq files (.gbff) and want to index the lot with > Bio::DB::Flat. > > It turns out that there are many cases where the SOURCE and > ORGANISM lines > are messed up, sometimes to a degree where the indexing fails on a > Bio::SeqIO::genbank error. 
> > I'd like to change Bio::SeqIO::genbank to let this parsing go at > least so > far as to make the indexing of the refseq files possible, and > hopefully > improving the taxonomic output ($seq->species->binomial is often > mutilated > at the moment). > > Is it still worthwhile to change parsing modules like > Bio::SeqIO::genbank? > Is anyone already working on a rewrite? Because if this is the > case I may > be better off writing my own indexing scheme? > > Below is (outline of) my indexing program, which uses > Bio::DB::Flat::DBD. > If anyone knows of a better way to get a locally searchable refseq > flat > file index, I would be very interested. > > Thanks for your help, > > Erikjan > > > ------------- > use Bio::DB::Flat; > > my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; > my $db=Bio::DB::Flat->new( > -directory => $refseq_dir, > -dbname => 'refseq', > -format => 'genbank', > -index => 'bdb', > -write_flag => 1, > ); > my @files = getfiles($refseq_dir); > for my $f (@files) { > db->build_index($f); > } > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat Dec 30 21:33:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 30 Dec 2006 20:33:23 -0600 Subject: [Bioperl-l] acquiring a local refseq + index In-Reply-To: References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> Message-ID: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> Agree with Hilmar, in that we need examples. If you are referring to your submitted bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2167 we could add this in as long as it passes (I'll try giving it a workout with my local bacterial seqs tonight or tomorrow). 
However, in the not-too-distant future your patch would likely be rendered obsolete, as any parsing in Bio::SeqIO modules pertaining to Bio::Species-related matters will be deprecated in favor of simple parsing (more foolproof, less uncertainty) and Bio::Taxon (which has optional db lookups using NCBI Taxonomy). Bio::Species and anything related to it are considered marked for deprecation. Fair warning... chris On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote: > Can you send examples and the resulting error messages? Also, I'm > assuming you running the 1.5.2 release of Bioperl; if not that's what > I would try first. > > -hilmar > > On Dec 30, 2006, at 7:05 PM, Erik wrote: > >> Hi all, >> >> I downloaded the refseq files (.gbff) and want to index the lot with >> Bio::DB::Flat. >> >> It turns out that there are many cases where the SOURCE and >> ORGANISM lines >> are messed up, sometimes to a degree where the indexing fails on a >> Bio::SeqIO::genbank error. >> >> I'd like to change Bio::SeqIO::genbank to let this parsing go at >> least so >> far as to make the indexing of the refseq files possible, and >> hopefully >> improving the taxonomic output ($seq->species->binomial is often >> mutilated >> at the moment). >> >> Is it still worthwhile to change parsing modules like >> Bio::SeqIO::genbank? >> Is anyone already working on a rewrite? Because if this is the >> case I may >> be better off writing my own indexing scheme? >> >> Below is (outline of) my indexing program, which uses >> Bio::DB::Flat::DBD. >> If anyone knows of a better way to get a locally searchable refseq >> flat >> file index, I would be very interested. 
>> >> Thanks for your help, >> >> Erikjan >> >> >> ------------- >> use Bio::DB::Flat; >> >> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; >> my $db=Bio::DB::Flat->new( >> -directory => $refseq_dir, >> -dbname => 'refseq', >> -format => 'genbank', >> -index => 'bdb', >> -write_flag => 1, >> ); >> my @files = getfiles($refseq_dir); >> for my $f (@files) { >> db->build_index($f); >> } >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Dec 31 14:36:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 31 Dec 2006 13:36:47 -0600 Subject: [Bioperl-l] acquiring a local refseq + index In-Reply-To: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> Message-ID: <37FB5BDF-25A9-44F0-9E82-964684A73A58@uiuc.edu> As a followup, I have committed the fix Erik had in Bugzilla. I don't know if this helps with the below issue Erik describes (they sound unrelated). chris On Dec 30, 2006, at 8:33 PM, Chris Fields wrote: > Agree with Hilmar, in that we need examples. If you are referring to > your submitted bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2167 > > we could add this in as long as it passes (I'll try giving it a > workout with my local bacterial seqs tonight or tomorrow). 
However, > in the not-too-distant future your patch would likely be rendered > obsolete, as any parsing in Bio::SeqIO modules pertaining to > Bio::Species-related matters will be deprecated in favor of simple > parsing (more foolproof, less uncertainty) and Bio::Taxon (which has > optional db lookups using NCBI Taxonomy). Bio::Species and anything > related to it are considered marked for deprecation. Fair warning... > > chris > > On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote: > >> Can you send examples and the resulting error messages? Also, I'm >> assuming you running the 1.5.2 release of Bioperl; if not that's what >> I would try first. >> >> -hilmar >> >> On Dec 30, 2006, at 7:05 PM, Erik wrote: >> >>> Hi all, >>> >>> I downloaded the refseq files (.gbff) and want to index the lot with >>> Bio::DB::Flat. >>> >>> It turns out that there are many cases where the SOURCE and >>> ORGANISM lines >>> are messed up, sometimes to a degree where the indexing fails on a >>> Bio::SeqIO::genbank error. >>> >>> I'd like to change Bio::SeqIO::genbank to let this parsing go at >>> least so >>> far as to make the indexing of the refseq files possible, and >>> hopefully >>> improving the taxonomic output ($seq->species->binomial is often >>> mutilated >>> at the moment). >>> >>> Is it still worthwhile to change parsing modules like >>> Bio::SeqIO::genbank? >>> Is anyone already working on a rewrite? Because if this is the >>> case I may >>> be better off writing my own indexing scheme? >>> >>> Below is (outline of) my indexing program, which uses >>> Bio::DB::Flat::DBD. >>> If anyone knows of a better way to get a locally searchable refseq >>> flat >>> file index, I would be very interested. 
>>> >>> Thanks for your help, >>> >>> Erikjan >>> >>> >>> ------------- >>> use Bio::DB::Flat; >>> >>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; >>> my $db=Bio::DB::Flat->new( >>> -directory => $refseq_dir, >>> -dbname => 'refseq', >>> -format => 'genbank', >>> -index => 'bdb', >>> -write_flag => 1, >>> ); >>> my @files = getfiles($refseq_dir); >>> for my $f (@files) { >>> $db->build_index($f); >>> } >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Fri Dec 1 02:47:03 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 07:47:03 +0000 Subject: [Bioperl-l] Upgrading my BioPerl RC via ppm? In-Reply-To: <519167.29410.qm@web50804.mail.yahoo.com> References: <519167.29410.qm@web50804.mail.yahoo.com> Message-ID: <456FDDF7.1080403@sheffield.ac.uk> Caitlin wrote: > Hi all. > > I'm currently using BioPerl 1.5.2 RC2 but I've seen multiple references > to 1.5.2 RC5. Can anyone tell me how to upgrade to the latest version?
> The ppm GUI (ActivePerl Build 819) doesn't include any BioPerl packages > among those deemed upgradable. > > Thanks, > > ~Katie > > > Hi Katie, Currently there is not an RC5 PPM package available - we are hoping to have the official 1.5.2 release out pretty soon and there will definitely be a PPM package for that! Are you experiencing any problems with your current version of bioperl? If not, there is no need to worry, once we've released an updated PPM package your PPM GUI should then be able to see it as an upgrade - hopefully! :o) Sendu, I know you were working on automatically generating PPM packages - what is the current situation with regards to this? Nath From bix at sendu.me.uk Fri Dec 1 04:00:18 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:00:18 +0000 Subject: [Bioperl-l] BLASTing with a seqio/seq object... In-Reply-To: <456F27E9.70205@york.ac.uk> References: <01ba01c714a2$b9659c10$15327e82@pyrimidine> <456F27E9.70205@york.ac.uk> Message-ID: <456FEF22.4090004@sendu.me.uk> Samantha Thompson wrote: You missed a step...
> use strict; > use Bio::Perl; > use Bio::Seq; > use Bio::SeqIO; > > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > > #seq bit > > #$seq_obj = Bio::Seq->new(-format => 'fasta'); > > my $seqio_obj = Bio::SeqIO->new(-file => > "/biol/people/mres/st537/MalEfasta.txt", -format => 'fasta'); > > my $seq_obj = $seqio_obj->next_seq; > > > > #blast bit > > my $remote_blast = Bio::Tools::Run::RemoteBlast->new ( > -prog => 'blastp', -db => 'nr', -expect => '1e-15' ); > > my $blast_report = $remote_blast->submit_blast($seq_obj); Go back to the Bptutorial: http://www.bioperl.org/wiki/Bptutorial.pl#Running_BLAST_.28using_RemoteBlast.pm.29 And you'll see that submit_blast doesn't return a SearchIO object. For a complete working example see the synopsis for RemoteBlast: http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/RemoteBlast.html > #new part for SearchIO... > > while( my $result = $blast_report->next_result ) { > while( my $hit = $result->next_hit ) { > while( my $hsp = $hit->next_hsp ) { > if( $hsp->length('total') > 100 ) { > if ( $hsp->percent_identity >= 75 ) { > print "Hit= ", $hit->name, > ",Length=", $hsp->length('total'), > ",Percent_id=", $hsp->percent_identity, "\n"; > } > } > } > } > } From bix at sendu.me.uk Fri Dec 1 04:03:13 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:03:13 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> Message-ID: <456FEFD1.4070704@sendu.me.uk> pelikan at cs.pitt.edu wrote: > Hello all, > > I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, > without Cygwin. The "make test"s have all completed without error. This > is my first time dealing with bioperl, so bear with me. > > I've successfully loaded the most recent taxonomy information using the > biosql-schema scripts. 
After this, I attempted to load the most recent > release of the uniprot flat file dataset with the following command: > > load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass > ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat > > I am subsequently greeted by many of the following errors: > > Could not store Q7N3Q6: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: The supplied lineage does not start near 'Photorhabdus luminescens > subsp. laumondii' In your uniprot_sprot.dat file there'll be some kind of entry with that Photorhabdus species. Can you post that entry (sans sequence if it has one) so I can take a look at it? Maybe post a few that cause problems, and a few that don't. From bix at sendu.me.uk Fri Dec 1 04:19:09 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:19:09 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819 In-Reply-To: <000301c714b4$7846e790$15327e82@pyrimidine> References: <000301c714b4$7846e790$15327e82@pyrimidine> Message-ID: <456FF38D.3070508@sendu.me.uk> Chris Fields wrote: >> Nathan S. Haigh wrote: >>> More updates: >>> >>> After the failed install I updating Module::Build, and re-ran the >>> install, I get: >>> >>> -- snip -- >>> Creating new 'Build' script for 'bioperl' version '1.005002005' >>> Warning: while trying to determine prerequisites for >>> S/SE/SENDU/bioperl-1.5.2_005-RCb.tar.gz wi th the help of >>> Module::Build the following error occurred: 'Failed to re-load >>> 'ModuleBuildBiope >>> rl': Can't locate ModuleBuildBioperl.pm in @INC (@INC contains: >>> _build\lib C:\Perl\site\lib C:\ >>> Perl\lib C:\Documents and Settings\test) at (eval 105) line 1. >>> ' >>> >>> Falling back to META.yml for prerequisites 'YAML' not installed, >>> cannot parse 'C:\Perl\cpan\build\bioperl-1.5.2_005-RC\META.yml' >>> -- snip -- >> I had that problem fleetingly and it drove me crazy because >> later I couldn't reproduce it. 
Is it reproducible on your end? > > During Module::Build installation I see this: > > ... > t\metadata........ok > 8/43 skipped: YAML_support feature is not enabled You were pointing out the YAML issue? I think I'm less concerned with that (solution: install YAML) and much more concerned with why it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The module in question is in the same dir as the Build script, so it should be found automatically. The only thing I can think of is that CPAN doesn't manage to chdir to the directory. Hopefully I'll be able to reproduce this and then I can investigate further. From n.haigh at sheffield.ac.uk Fri Dec 1 04:26:22 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 09:26:22 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <456FF233.6040704@sendu.me.uk> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> Message-ID: <456FF53E.90907@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: >> >> I know that setting up the PPM is a pain, but I have to say it is >> much faster, and all required PPMs are available. Which makes me >> curious: why bother with trying out a CPAN installation process at >> this point, especially when you have to use PPM to install some of >> the prereqs properly anyway? > > Firstly, problems discovered and resulting fixes will help all > platforms, not just Windows. So thanks for trying it out and reporting > back. Secondly, the PPM method, like Bundle::BioPerl, is > all-or-nothing. The CPAN installation method allows an interactive > choice of which optional things to install. > > If what you say about DB_File is true, then that's a great shame! 
> > > So I can do further trouble-shooting of my own, what is the sure-fire > way to completely clean-out an ActivePerl install, including any > modules you might have installed with PPMs or CPAN? > > In addition, using CPAN allows you to run the test suite easily without the need to download it separately and run it after a PPM install. I don't know of a way to clean out ActivePerl - I use VMWare Workstation and have a virtual machine with a fresh install of WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas? Nath From bix at sendu.me.uk Fri Dec 1 04:13:23 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:13:23 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> Message-ID: <456FF233.6040704@sendu.me.uk> Chris Fields wrote: > > I know that setting up the PPM is a pain, but I have to say it is much > faster, and all required PPMs are available. Which makes me curious: > why bother with trying out a CPAN installation process at this point, > especially when you have to use PPM to install some of the prereqs > properly anyway? Firstly, problems discovered and resulting fixes will help all platforms, not just Windows. So thanks for trying it out and reporting back. Secondly, the PPM method, like Bundle::BioPerl, is all-or-nothing. The CPAN installation method allows an interactive choice of which optional things to install. If what you say about DB_File is true, then that's a great shame!
So I can do further trouble-shooting of my own, what is the sure-fire way to completely clean-out an ActivePerl install, including any modules you might have installed with PPMs or CPAN? From cjfields at uiuc.edu Fri Dec 1 09:08:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 08:08:55 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <456FF233.6040704@sendu.me.uk> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> Message-ID: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu> On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote: > Chris Fields wrote: >> I know that setting up the PPM is a pain, but I have to say it is >> much faster, and all required PPMs are available. Which makes me >> curious: why bother with trying out a CPAN installation process at >> this point, especially when you have to use PPM to install some of >> the prereqs properly anyway? > > Firstly, problems discovered and resulting fixes will help all > platforms, not just Windows. So thanks for trying it out and > reporting back. Secondly, the PPM method, like Bundle::BioPerl, is > all-or-nothing. The CPAN installation method allows an interactive > choice of which optional things to install. Yes, I understand that. My point is, you are generally forced to use PPM anyway due to several modules not installing properly (all the 'trouble' distributions, like DB_File, are available via PPM). I can see using CPAN as an alternative way of installing Bioperl for a distribution, or as the primary method via CVS or manually, but not for distributions. At least not until the kinks are worked out for Windows users. What are the significant issues for a bioperl PPM installation, based on the last PPM Nathan set up? 
If there is a redirection problem, could we just modify the installation docs to address that ('due to problem X, you must install the following modules prior to installing BioPerl 1.5.2...'). > If what you say about DB_File is true, then that's a great shame! We need to go through the various prereqs to see which ones need PPM vs CPAN. In general, anything that requires C code compilation (and thus needs a recent VC++) will likely be an issue. > So I can do further trouble-shooting of my own, what is the sure- > fire way to completely clean-out an ActivePerl install, including > any modules you might have installed with PPMs or CPAN? Not sure, beyond uninstalling and cleaning out the Perl directory (I think you might be able to delete the site/ directory, but I haven't tried it). ActivePerl comes preloaded with a number of non-core modules which makes it tricky to uninstall them one-by-one. chris From cjfields at uiuc.edu Fri Dec 1 09:10:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 08:10:34 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819 In-Reply-To: <456FF38D.3070508@sendu.me.uk> References: <000301c714b4$7846e790$15327e82@pyrimidine> <456FF38D.3070508@sendu.me.uk> Message-ID: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu> On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote: > You were pointing out the YAML issue? I think I'm less concerned > with that (solution: install YAML) and much more concerned with why > it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The > module in question is in the same dir as the Build script, so it > should be found automatically. > > The only thing I can think of is that CPAN doesn't manage to chdir > to the directory. Hopefully I'll be able to reproduce this and then > I can investigate further. My thought was the two were related in some way. I'm not sure to tell the truth. 
-chris From bix at sendu.me.uk Fri Dec 1 09:17:41 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 14:17:41 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu> Message-ID: <45703985.5050203@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> I know that setting up the PPM is a pain, but I have to say it is >>> much faster, and all required PPMs are available. Which makes me >>> curious: why bother with trying out a CPAN installation process at >>> this point, especially when you have to use PPM to install some of >>> the prereqs properly anyway? >> >> Firstly, problems discovered and resulting fixes will help all >> platforms, not just Windows. So thanks for trying it out and reporting >> back. Secondly, the PPM method, like Bundle::BioPerl, is >> all-or-nothing. The CPAN installation method allows an interactive >> choice of which optional things to install. > > Yes, I understand that. My point is, you are generally forced to use > PPM anyway due to several modules not installing properly (all the > 'trouble' distributions, like DB_File, are available via PPM). I can > see using CPAN as an alternative way of installing Bioperl for a > distribution, or as the primary method via CVS or manually, but not for > distributions. At least not until the kinks are worked out for Windows > users. CPAN isn't being suggested as the primary or preferred installation method for Windows. That will still be PPM. I'm mentioning CPAN / manual installation in the Windows INSTALL docs for the benefit of anyone who wants a simple install and test environment when checking out from CVS. 
> What are the significant issues for a bioperl PPM installation None that I'm aware of - I just need to find the time to start looking into generating an appropriate PPD. Hopefully Nathan's wiki page on the subject will be all I need. From bix at sendu.me.uk Fri Dec 1 09:18:43 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 14:18:43 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819 In-Reply-To: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu> References: <000301c714b4$7846e790$15327e82@pyrimidine> <456FF38D.3070508@sendu.me.uk> <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu> Message-ID: <457039C3.30907@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote: > >> You were pointing out the YAML issue? I think I'm less concerned with >> that (solution: install YAML) and much more concerned with why it >> can't reload ModuleBuildBioperl (claiming it isn't in @INC). The >> module in question is in the same dir as the Build script, so it >> should be found automatically. >> >> The only thing I can think of is that CPAN doesn't manage to chdir to >> the directory. Hopefully I'll be able to reproduce this and then I can >> investigate further. > > My thought was the two were related in some way. I'm not sure to tell > the truth. They weren't; using YAML is the fall-back position in case of an earlier failure. I've fixed it now in any case.
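For anyone wanting to test the chdir hypothesis from this thread by hand: perl resolves a relative @INC entry (such as the '_build\lib' shown in the error) against the current working directory, so a module sitting next to the Build script is only visible when the process is actually in that directory. The following is a minimal sketch using core modules only. locate_module is a hypothetical helper written for illustration; it is not part of Module::Build, CPAN or BioPerl, and it only approximates what require does.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Return every path at which $module's .pm file exists under the given
# search directories. Relative directories are resolved against the
# current working directory, mirroring how perl treats relative @INC entries.
sub locate_module {
    my ($module, @search_path) = @_;
    (my $rel_file = $module) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar
    $rel_file .= '.pm';
    my @found;
    for my $dir (@search_path) {
        next if ref $dir;                     # skip code-ref hooks in @INC
        my $abs_dir   = File::Spec->rel2abs($dir, getcwd());
        my $candidate = File::Spec->catfile($abs_dir, $rel_file);
        push @found, $candidate if -f $candidate;
    }
    return @found;
}

# Module name defaults to the one from the error message in this thread.
my $module = shift(@ARGV) || 'ModuleBuildBioperl';
print "cwd: ", getcwd(), "\n";
my @hits = locate_module($module, @INC);
print @hits
    ? "found $module at: $hits[0]\n"
    : "$module not found via \@INC from this directory\n";
```

Running it once from the unpacked distribution directory and once from somewhere else should show whether the module is only reachable from the build directory, which would be consistent with (but not proof of) the missed-chdir explanation.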
From gwu at molbio.mgh.harvard.edu Fri Dec 1 10:19:42 2006 From: gwu at molbio.mgh.harvard.edu (gang wu) Date: Fri, 01 Dec 2006 10:19:42 -0500 Subject: [Bioperl-l] One more load_seqdatabase.pl question In-Reply-To: <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net> References: <4a9ad8800611270907x64a4a4c0jad92bff6641e300@mail.gmail.com> <53C6D534-6E36-4061-B955-E74537839265@gmx.net> <456CA667.6010609@molbio.mgh.harvard.edu> <456F5648.6070207@molbio.mgh.harvard.edu> <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net> Message-ID: <4570480E.1020701@molbio.mgh.harvard.edu> Thanks Hilmar. I did include the -lookup switch on the command line. The warning messages show a failed "INSERT" rather than an "UPDATE", which suggests that no match was found. But I was just loading the same Genbank file for the second time. To test whether it actually updated the records, I made a minor modification to one of the COMMENT features. Unfortunately, it was not updated. By the way, the test genbank file has four "COMMENT" features but they are different. Any idea what's happening there? I wonder if it's a bad idea to "UPDATE" a sequence. Say I got a new sequence version with 5 features removed, 5 features modified and 5 features new. If only --lookup is included, according to the POD, the 5 new features will be inserted, the 5 modified features will be updated and the 5 removed features will be left in the database untouched. This leaves the sequence record as a mixture of old and new versions. I do not see a reason anyone would want a sequence like this. Either include -remove to replace the old version if only one version is needed, or put the new version under a different namespace if multiple versions are needed. Do I have the correct understanding of these issues? I deeply appreciate your help. Gang Hilmar Lapp wrote: > Right.
You need to tell it to look up sequences first if you know that > you are loading sequences which may be in the database already (see > the POD of load_seqdatabase.pl, switch --lookup; there are several > other command line options that control what will happen if a sequence > entry is already present in the database). > > The messages in your report are warnings, not errors. It looks like > some of the comments are duplicated for a sequence, which doesn't look > like a reason for concern. It would be more worrying if errors were thrown. > > -hilmar > > On Nov 30, 2006, at 5:08 PM, gang wu wrote: > >> Thanks Hilmar. Do you mean the NVL() clause will make >> load_seqdatabase.pl not work when updating? >> >> I have a problem with updating. Seems load_seqdatabase.pl only tries to >> insert instead of update. I used one of the test genbank files that come >> with bioperl-db. Please take a look at the attached output. >> >> Thanks. >> >> Gang >> >> ========================================= >> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle >> -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank >> -namespace test >> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb >> Loading >> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb >> ... >> >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, >> values were ("This sequence was reannotated via the Ensembl system. >> Please visit the Ensembl web site, http://www.ensembl.org/ for more >> information.
","1") FKs (389109) >> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated >> (DBD ERROR: OCIStmtExecute) >> --------------------------------------------------- >> >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, >> values were ("The /gene indicates a unique id for a gene, /cds a >> unique id for a translation and a /exon a unique id for an exon. >> These ids are maintained wherever possible between versions. For more >> information on how to interpret the feature table, please visit >> http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109) >> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated >> (DBD ERROR: OCIStmtExecute) >> --------------------------------------------------- >> ... >> ... >> ========================================================== >> Hilmar Lapp wrote: >>> These are the protein translations stored in the feature table as >>> tags of features, right? You can change the type of the column >>> (although there may be some issues when you update the column >>> because the NVL() clause won't work if I recall that correctly), but >>> doing so will deprive you of any 'normal' searches against that >>> column. (You can still use functions >from the DBMS_LOB package, but >>> they will be much slower and are completely non-standard.) It is up >>> to you whether that is too big of a price to pay for having some >>> redundant protein translations (translating the feature's DNA >>> sequence should give you the same) in the database. I always trimmed >>> those feature tags off (using a custom SeqProcessor). An alternative >>> is to convert these feature tags into actual bioentries (i.e., >>> Bio::Seq objects; again, a custom SeqProcessor will allow you to do >>> that). -hilmar On Nov 28, 2006, at 4:13 PM, gang wu wrote: >>>> Hi everyone, I'm using load_seqdatabase.pl to upload some Genbank >>>> genome sequences to my Oracle BioSQL database. 
I saw some >>>> errors(See attached warning message) related to >>>> seqfeature_qualifier_value (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE >>>> column), which has Varchar2 data type of maximum 4000 bytes. Did >>>> anybody mention this issue before? Should I just modify the column >>>> to a type being able store more data such as LONG or CLOB? Thanks. >>>> Gang Log information: ============================================ >>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc >>>> -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace >>>> genbank /genomeseq/arabidopsis//NC_003070.gbk Loading >>>> /genomeseq/arabidopsis//NC_003070.gbk ... -------------------- >>>> WARNING --------------------- MSG: SimpleValueAdaptor::add_assoc: >>>> unexpected failure of statement execution: ORA-01461: can bind a >>>> LONG value only for insert into a LONG column (DBD ERROR: error >>>> possibly near <*> indicator at char 12 in 'INSERT INTO >>>> <*>seqfeature_qualifier_value (fea_oid, trm_oid, value, rank) >>>> VALUES (:p1, :p2, :p3, :p4)') name: INSERT ASSOC [2] >>>> Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue values: >>>> FK[Bio::SeqFeature::Generic]:14898, >>>> FK[Bio::Annotation::SimpleValue]:800, >>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSGFV >>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASADR >>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSSEI >>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNSFP >>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFERA >>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYTRY >>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRTEA >>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIVDI >>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENIDW >>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRFAL >>>> 
DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGSDN >>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPTRY >>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYLYT >>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLRGL >>>> VQATYQASA! >>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGGKV >>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDEDAY >>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSDFV >>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSDSE >>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGTLG >>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQEV >>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLGQL >>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVPTL >>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRTVT >>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEMRL >>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRILAV >>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGIDEY >>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDGVD >>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKLLR >>>> VKLDFNFM! 
>>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNLQS >>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAVSN >>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVYCL >>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSGWD >>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADDSE >>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEASV >>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLDGL >>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQSS >>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDEDDF >>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAKGL >>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKGVA >>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNGCL >>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSMRN >>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPKVE >>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSKPL >>>> WLSVGADAS! 
>>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRPVY >>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWITND >>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQES >>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNMTS >>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPFSV >>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSESW >>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKNIV >>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILYMS >>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDLRV >>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSMAM >>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGVDI >>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEGAK >>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSLLR >>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNIMG >>>> QRKFIPAK! 
>>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAIDQ >>>> AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS", >>>> rank:"1" -------------------------------------------------- >>>> ============================================= >>>> _______________________________________________ Bioperl-l mailing >>>> list Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > --=========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From bosborne11 at verizon.net Fri Dec 1 09:55:18 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 01 Dec 2006 09:55:18 -0500 Subject: [Bioperl-l] An announcement Message-ID: bioperl-l, I would like to call your attention to a job posting and in doing so I realize that I'm probably breaking a rule of this list. I apologize and acknowledge that I've transgressed. The reason I do this is because this is an interesting job that is relevant to a lot of what we do in this mailing list, and some of you might want to consider it. The posting is here: http://www.nescent.org/main/employment.html#gmodhelpdesk I encourage you to pass this on to anyone who you think might be interested. Thanks again, Brian O. From cjfields at uiuc.edu Fri Dec 1 11:49:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 10:49:32 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <456FF53E.90907@sheffield.ac.uk> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> <456FF53E.90907@sheffield.ac.uk> Message-ID: On Dec 1, 2006, at 3:26 AM, Nathan S. Haigh wrote: ...
> In addition, using CPAN allows you to run the test suite easily
> without the need to download it separately and run it after a PPM
> install.

A PPM, by design, is supposed to imply that the distribution passes tests for the specified platform, at that point in time, after all prereqs are installed and any additional postinstall operations (install C libraries, modify config files, etc) are complete. The ActiveState automated PPM building process dictates that; if it fails any test, it will not be made into a PPM. It's sort of a 'stamp of approval' that all tests pass, so you don't need to run them.

However, a test may fail (and a PPM may not get generated) for pretty superficial reasons, such as the makefile not specifying that a module is needed, server issues, etc, so the automated process isn't foolproof. That's why Kobes and the other repositories are available, where the PPM/PPD is manually generated and made to work specifically for Windows (or whatever other platform).

Saying that, it is completely up to the person packaging the distribution to follow those rules if one were to make a PPM manually. You don't even have to run tests prior to using 'nmake ppd'. We can currently state, though, that all tests pass when all prereqs are installed for this distribution. At least at this point in time!

> I don't know of a way to clean out ActivePerl - I use VMWare
> Workstation and have a virtual machine with a fresh install of
> WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas?

I haven't tried it that way. I have Parallels on Mac OS X (I run a SigmaPlot/Excel combo off it). My tests were using a native WinXP installation (i.e. not virtually) on my old Dell. It shouldn't make a difference; VMWare, Parallels, and the like should all run ActivePerl for WinXP since it's a virtual machine. Windows Vista, on the other hand...

I think with PPM4 you can install to a custom directory.
It may be possible to install all new modules to that directory, then you would at least have an idea of what was there (though I don't think you can delete it directly w/o screwing up the PPM database). chris From bix at sendu.me.uk Fri Dec 1 12:12:49 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 17:12:49 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> Message-ID: <45706291.80201@sendu.me.uk> pelikan at cs.pitt.edu wrote: > Hello all, > > I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, > without Cygwin. The "make test"s have all completed without error. This > is my first time dealing with bioperl, so bear with me. > > I've successfully loaded the most recent taxonomy information using the > biosql-schema scripts. After this, I attempted to load the most recent > release of the uniprot flat file dataset with the following command: > > load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass > ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat > > I am subsequently greeted by many of the following errors: > > Could not store Q7N3Q6: I extracted just Q7N3Q6 from ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz and was able to load it in using load_seqdatabase.pl under linux with no errors. If you make a file with just that sequence do you still get the error? Is anyone else able to reproduce the problem? From cjfields at uiuc.edu Fri Dec 1 12:57:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 11:57:18 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <45703985.5050203@sendu.me.uk> Message-ID: <006301c71572$24be8830$15327e82@pyrimidine> > Chris Fields wrote: > PPM). 
I can > > see using CPAN as an alternative way of installing Bioperl for a > > distribution, or as the primary method via CVS or manually, but not > > for distributions. At least not until the kinks are worked out for > > Windows users. > > CPAN isn't being suggested as the primary or preferred > installation method for Windows. That will still be PPM. I'm > mentioning CPAN / manual installation in the Windows INSTALL > docs for the benefit of anyone who wants a simple install and > test environment when checking out from CVS. That's fine by me. I think the focus is making sure the PPM works, but that shouldn't hold up the final 1.5.2 release. The PPM for previous releases was never released concurrently with the distribution (if at all); it generally followed by a few weeks to a few months past a final release. > > What are the significant issues for a bioperl PPM installation > > None that I'm aware of - I just need to find the time to > start looking into generating an appropriate PPD. Hopefully > Nathan's wiki page on the subject will be all I need. I'll try testing it out today and next week (the more people we have looking into the issue the better). I'm sure that Module::Build hasn't updated to using PPM4 XML formatting, but the tags are similar enough. I can always create a local PPM database using a similar directory structure to bioperl.org/DIST and test an installation from it. chris From n.haigh at sheffield.ac.uk Fri Dec 1 13:52:55 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 18:52:55 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine> References: <006301c71572$24be8830$15327e82@pyrimidine> Message-ID: <45707A07.7000106@sheffield.ac.uk> Chris Fields wrote: >> Chris Fields wrote: >> PPM). 
I can >> >>> see using CPAN as an alternative way of installing Bioperl for a >>> distribution, or as the primary method via CVS or manually, but not >>> for distributions. At least not until the kinks are worked out for >>> Windows users. >>> >> CPAN isn't being suggested as the primary or preferred >> installation method for Windows. That will still be PPM. I'm >> mentioning CPAN / manual installation in the Windows INSTALL >> docs for the benefit of anyone who wants a simple install and >> test environment when checking out from CVS. >> > > That's fine by me. I think the focus is making sure the PPM works, but that > shouldn't hold up the final 1.5.2 release. The PPM for previous releases > was never released concurrently with the distribution (if at all); it > generally followed by a few weeks to a few months past a final release. > > >>> What are the significant issues for a bioperl PPM installation >>> >> None that I'm aware of - I just need to find the time to >> start looking into generating an appropriate PPD. Hopefully >> Nathan's wiki page on the subject will be all I need. >> > > I'll try testing it out today and next week (the more people we have looking > into the issue the better). I'm sure that Module::Build hasn't updated to > using PPM4 XML formatting, but the tags are similar enough. I can always > create a local PPM database using a similar directory structure to > bioperl.org/DIST and test an installation from it. > > chris > To clarify a few things about PPM4 XML and to highlight the main differences: 1) The use of PROVIDE and REQUIRE tags 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules. 
3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma separated tuples like PPM3 XML
4) the VERSION in PROVIDE and REQUIRE are used internally to do version comparisons for upgrades and solving prereqs etc
5) Module names should all contain '::', either natively according to their namespace or, if a name doesn't have one natively, appended to the end, e.g. "GD::"
6) the VERSION in the SOFTPKG key is for human readability only
7) the NAME in SOFTPKG is used to identify which packages are actually the same.

Nath

From bix at sendu.me.uk Fri Dec 1 13:52:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 18:52:44 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk>
Message-ID: <457079FC.7010209@sendu.me.uk>

Sendu Bala wrote:
> pelikan at cs.pitt.edu wrote:
[snip]
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

In fact, if I just try and load it again I reproduce the problem. The situation is similar to http://bugzilla.bioperl.org/show_bug.cgi?id=2092

And I have a tentative fix that extends Brian's fix there. Committed to HEAD only atm.
I don't know anything about bioperl-db and don't have the faintest clue why this is happening, nor the time to figure it out. Can someone please have a proper look at this and decide if my fix is sane? All I can say is that the test suites for bioperl-live and bioperl-db continue to pass, but that isn't really saying much.

PS. having used load_seqdatabase.pl to load a sequence, how do I remove it afterwards?

From cjfields at uiuc.edu Fri Dec 1 14:00:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:00:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk>
Message-ID:

On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error.
>> This is my first time dealing with bioperl, so bear with me.
>>
>> I've successfully loaded the most recent taxonomy information using
>> the biosql-schema scripts. After this, I attempted to load the most
>> recent release of the uniprot flat file dataset with the following
>> command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?
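
A minimal sketch of the single-entry test suggested in the quoted text, for anyone wanting to build such a file. It assumes a POSIX shell with awk and the usual Swiss-Prot flat-file layout (accessions listed on lines starting with "AC", each entry terminated by a line containing only "//"). The function name extract_entry and the file names are hypothetical and not part of BioPerl:

```shell
# Print the one entry whose AC line lists the given accession.
# Assumed layout: "AC   ACC1; ACC2;" lines, entries terminated by "//".
extract_entry() {
    awk -v acc="$1" '
        { buf = buf $0 "\n" }          # collect the lines of the current entry
        /^AC / {                       # accession line; may list several accessions
            n = split($0, f, /[ ;]+/)
            for (i = 2; i <= n; i++) if (f[i] == acc) hit = 1
        }
        /^\/\/$/ {                     # end of entry: emit it if it matched
            if (hit) printf "%s", buf
            buf = ""; hit = 0
        }
    ' "$2"
}

# Hypothetical usage (file names are placeholders):
#   extract_entry Q7N3Q6 uniprot_sprot.dat > Q7N3Q6.dat
```

The resulting file can then be passed to load_seqdatabase.pl with the same options as in the original command, which keeps the test case down to the one failing record.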
I can reproduce on both WinXP and Mac OS X using the latest bioperl- db/bioperl-live and a BioSQL database preloaded with taxonomy. Notably the bug doesn't show up with a database lacking taxonomy, where no lookup is used (I guess). Here's some overly verbose debugging (apologies): Loading saved.flat ... attempting to load adaptor class for Bio::Seq::RichSeq attempting to load module Bio::DB::BioSQL::RichSeqAdaptor attempting to load adaptor class for Bio::Seq attempting to load module Bio::DB::BioSQL::SeqAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor attempting to load adaptor class for Bio::Species attempting to load module Bio::DB::BioSQL::SpeciesAdaptor instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor attempting to load adaptor class for Bio::Tree::Tree attempting to load module Bio::DB::BioSQL::TreeAdaptor attempting to load adaptor class for Bio::Root::Root attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::Root::RootI attempting to load module Bio::DB::BioSQL::RootIAdaptor attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::Tree::TreeI attempting to load module Bio::DB::BioSQL::TreeIAdaptor attempting to load module Bio::DB::BioSQL::TreeAdaptor attempting to load adaptor class for Bio::Tree::NodeI attempting to load module Bio::DB::BioSQL::NodeIAdaptor attempting to load module Bio::DB::BioSQL::NodeAdaptor attempting to load adaptor class for Bio::Tree::TreeFunctionsI attempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor attempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor no adaptor found for class Bio::Tree::Tree attempting to load adaptor class for Bio::DB::Taxonomy::list attempting to load module Bio::DB::BioSQL::listAdaptor attempting to load adaptor class for Bio::DB::Taxonomy attempting to load module Bio::DB::BioSQL::TaxonomyAdaptor no adaptor found for class Bio::DB::Taxonomy::list attempting to load adaptor 
class for Bio::Annotation::Collection attempting to load module Bio::DB::BioSQL::CollectionAdaptor attempting to load adaptor class for Bio::AnnotationCollectionI attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor attempting to load adaptor class for Bio::Annotation::TypeManager attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for Bio::Annotation::SimpleValue attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor attempting to load adaptor class for Bio::Annotation::Reference attempting to load module Bio::DB::BioSQL::ReferenceAdaptor instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor attempting to load adaptor class for Bio::Annotation::Comment attempting to load module Bio::DB::BioSQL::CommentAdaptor instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor attempting to load adaptor class for Bio::Annotation::DBLink attempting to load module Bio::DB::BioSQL::DBLinkAdaptor instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor attempting to load adaptor class for Bio::PrimarySeq attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor attempting to load adaptor class for Bio::SeqFeature::Generic attempting to load module Bio::DB::BioSQL::GenericAdaptor attempting to load adaptor class for Bio::SeqFeatureI attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor attempting to load adaptor class for Bio::Location::Simple attempting to load module Bio::DB::BioSQL::SimpleAdaptor attempting to load adaptor class for 
Bio::Location::Atomic attempting to load module Bio::DB::BioSQL::AtomicAdaptor attempting to load adaptor class for Bio::LocationI attempting to load module Bio::DB::BioSQL::LocationIAdaptor attempting to load module Bio::DB::BioSQL::LocationAdaptor instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Tree::Tree no adaptor found for class Bio::DB::Taxonomy::list no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for BioNamespace attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor no adaptor found for class Bio::Tree::Tree no adaptor found for class Bio::DB::Taxonomy::list no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ? BioNamespaceAdaptor: binding UK column 1 to "Swiss-Prot" (namespace) preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?) 
BioNamespaceAdaptor::insert: binding column 1 to "Swiss- Prot" (namespace) BioNamespaceAdaptor::insert: binding column 2 to "" (authority) no adaptor found for class Bio::Tree::Tree no adaptor found for class Bio::DB::Taxonomy::list attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ? SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) SpeciesAdaptor: binding UK column 2 to "141679" (ncbi_taxid) prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value BETWEEN node.left_value AND node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor Could not store Q7N3Q6: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: The supplied lineage does not start near 'Photorhabdus luminescens subsp. 
laumondii'
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
STACK: Bio::Species::classification /Users/cjfields/src/bioperl-live/Bio/Species.pm:166
STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:552
STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /Library/Perl/5.8.6/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK: load_seqdatabase.pl:620
-----------------------------------------------------------
at load_seqdatabase.pl line 633

chris

From cjfields at uiuc.edu Fri Dec 1 14:01:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:01:59 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819
In-Reply-To: <45707A07.7000106@sheffield.ac.uk>
References: <006301c71572$24be8830$15327e82@pyrimidine> <45707A07.7000106@sheffield.ac.uk>
Message-ID:

On Dec 1, 2006, at 12:52 PM, Nathan S.
Haigh wrote: > Chris Fields wrote: >>> Chris Fields wrote: >>> PPM). I can >>>> see using CPAN as an alternative way of installing Bioperl for a >>>> distribution, or as the primary method via CVS or manually, but >>>> not for distributions. At least not until the kinks are worked >>>> out for Windows users. >>>> >>> CPAN isn't being suggested as the primary or preferred >>> installation method for Windows. That will still be PPM. I'm >>> mentioning CPAN / manual installation in the Windows INSTALL docs >>> for the benefit of anyone who wants a simple install and test >>> environment when checking out from CVS. >>> >> >> That's fine by me. I think the focus is making sure the PPM >> works, but that >> shouldn't hold up the final 1.5.2 release. The PPM for previous >> releases >> was never released concurrently with the distribution (if at all); it >> generally followed by a few weeks to a few months past a final >> release. >> >> >>>> What are the significant issues for a bioperl PPM installation >>>> >>> None that I'm aware of - I just need to find the time to start >>> looking into generating an appropriate PPD. Hopefully Nathan's >>> wiki page on the subject will be all I need. >>> >> >> I'll try testing it out today and next week (the more people we >> have looking >> into the issue the better). I'm sure that Module::Build hasn't >> updated to >> using PPM4 XML formatting, but the tags are similar enough. I can >> always >> create a local PPM database using a similar directory structure to >> bioperl.org/DIST and test an installation from it. >> >> chris >> > > To clarify a few things about PPM4 XML and to highlight the main > differences: > > 1) The use of PROVIDE and REQUIRE tags > 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules. 
> 3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma > separated tuples like PPM3 XML > 4) the VERSION in PROVIDE and REQUIRE are used internally to do > version comparisons for upgrades and solving prereqs etc > 5) Module names should all contain '::' either natively according > their namespace, if it doesn't have one natively, then one is > appended to the end e.g. "GD::" > 6) the VERSION in the SOFTPKG key is for human readability only > 7) the NAME in SOFTPKG is used to identify which packages are > actually the same. > > Nath Okay. Maybe place this in the wiki (PPM page) for future reference? Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Fri Dec 1 14:05:38 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 19:05:38 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine> References: <006301c71572$24be8830$15327e82@pyrimidine> Message-ID: <45707D02.9070504@sheffield.ac.uk> Chris Fields wrote: >> Chris Fields wrote: >> PPM). I can >> >>> see using CPAN as an alternative way of installing Bioperl for a >>> distribution, or as the primary method via CVS or manually, but not >>> for distributions. At least not until the kinks are worked out for >>> Windows users. >>> >> CPAN isn't being suggested as the primary or preferred >> installation method for Windows. That will still be PPM. I'm >> mentioning CPAN / manual installation in the Windows INSTALL >> docs for the benefit of anyone who wants a simple install and >> test environment when checking out from CVS. >> > > That's fine by me. I think the focus is making sure the PPM works, but that > shouldn't hold up the final 1.5.2 release. 
The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed by a few weeks to a few months past a final release.
>
>

Forgot to say, one really annoying thing about PPM is that it seems to display all the versions of Bioperl defined in the XML file. In addition, I think a bug in PPM4 means that if a package is available in ActiveState's repo, PPM4 always wants to install it rather than a more recent version in a different repo (this includes upgrades). This results in this annoying behaviour:

1) If activestate and bioperl repos are active, searching for bioperl lists several versions
2) If you are using PPM4 GUI, and have installed a non activestate version, then it says you can upgrade to the version in activestate's repo (even if it's actually a downgrade).
3) Using ppm-shell, if you choose "install bioperl" or "upgrade bioperl" it will always install the version in the activestate repo.
4) I'm sure there are also some other annoyances.

In the end, it means the best way to install and upgrade bioperl is to search for bioperl packages and install the latest version by eye rather than relying on the "upgrade feature" (at least for the time being). You may also need to remove an old version of bioperl before installing a more recent version.

NOTE: using "upgrade" runs the risk of installing bioperl 1.2.3 from activestate and not the latest version in any other repo!

I'll update the wiki when I have time.

Nath

>>> What are the significant issues for a bioperl PPM installation
>>>
>> None that I'm aware of - I just need to find the time to
>> start looking into generating an appropriate PPD. Hopefully
>> Nathan's wiki page on the subject will be all I need.
>>
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better). I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.
I can always
> create a local PPM database using a similar directory structure to
> bioperl.org/DIST and test an installation from it.
>
> chris
>

From cjfields at uiuc.edu Fri Dec 1 14:06:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:06:53 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk>
Message-ID: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>

On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error.
>> This is my first time dealing with bioperl, so bear with me.
>>
>> I've successfully loaded the most recent taxonomy information using the
>> biosql-schema scripts.
After this, I attempted to load the most recent
>> release of the uniprot flat file dataset with the following command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

Okay, just updated to get your latest CVS fixes for bioperl-live and it passes now for both Mac OS X and WinXP.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Fri Dec 1 14:09:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:09:15 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID:

On Dec 1, 2006, at 12:52 PM, Sendu Bala wrote:

>
> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

There's not much documentation on it, but it is demonstrated several times in the test suite.

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Dec 1 14:39:17 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 19:39:17 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> Message-ID: <457084E5.2050300@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote: > >> pelikan at cs.pitt.edu wrote: >>> Hello all, >>> >>> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, >>> without Cygwin. The "make test"s have all completed without error. This >>> is my first time dealing with bioperl, so bear with me. >>> >>> I've successfully loaded the most recent taxonomy information >>> using the >>> biosql-schema scripts. After this, I attempted to load the most recent >>> release of the uniprot flat file dataset with the following command: >>> >>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass >>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >>> >>> I am subsequently greeted by many of the following errors: >>> >>> Could not store Q7N3Q6: >> >> I extracted just Q7N3Q6 from >> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz >> >> and was able to load it in using load_seqdatabase.pl under linux with no >> errors. If you make a file with just that sequence do you still get the >> error? >> >> Is anyone else able to reproduce the problem? > > Okay, just updated to get your latest CVS fixes for bioperl-live and it > passes now for both Mac OS X and WinXP. Can you confirm if it is actually working correctly though? Like, having stored a previously-problem sequence, can you get it back out from the database and is its ->species() correct? 
From cjfields at uiuc.edu Fri Dec 1 14:52:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:52:13 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <457084E5.2050300@sendu.me.uk> Message-ID: <000001c71582$329d4d50$15327e82@pyrimidine> > > > > Okay, just updated to get your latest CVS fixes for > bioperl-live and > > it passes now for both Mac OS X and WinXP. > > Can you confirm if it is actually working correctly though? > Like, having stored a previously-problem sequence, can you > get it back out from the database and is its ->species() correct? I would assume so, if we can trust the species tests. I will have to try it again over the weekend. I planned on loading a ton of protein sequences in anyway, most of which are bacterial; if anything breaks it will probably be with those. I think Jason and Hilmar were going to get together about the BioSQL paper at the hackathon. That may be a good place to bring some of the species issues, if they persist. 
chris

From hlapp at gmx.net Fri Dec 1 20:42:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Dec 2006 20:42:05 -0500
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <8414723F-BA02-4936-8F53-781276C3B526@gmx.net>

Either using SQL:

    -- theoretically you should convince yourself first that there
    -- is only one such record (the UK is over acc,version,namespace)
    SQL> DELETE FROM bioentry WHERE accession = 'Q7N3Q6';

or through bioperl-db (see the delete test for examples):

    my $db = Bio::DB::BioDB->new(....);
    my $seq = Bio::PrimarySeq->new(-accession_number=>'Q7N3Q6',
                                   -namespace=>'whatever you used when loading');
    my $adp = $db->get_persistence_adaptor($seq);
    my $pseq = $adp->find_by_unique_key($seq);
    $pseq->remove();
    $pseq->commit();

    -hilmar

On Dec 1, 2006, at 1:52 PM, Sendu Bala wrote:

> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From chhalling at verizon.net Sun Dec 3 20:56:51 2006
From: chhalling at verizon.net (Conrad Halling)
Date: Sun, 03 Dec 2006 20:56:51 -0500
Subject: [Bioperl-l] BioPerl Wiki is down
Message-ID: <45738063.1070504@verizon.net>

When I attempted to navigate to http://www.bioperl.org/, I got the
following message:

A database query syntax error has occurred. This may indicate a bug in
the software. The last attempted database query was:

(SQL query hidden)

from within function "MediaWikiBagOStuff::_doquery". MySQL returned
error "1205: Lock wait timeout exceeded; try restarting transaction
(localhost)".
-- Conrad Halling chhalling at verizon.net From rbirnie at totalise.co.uk Sun Dec 3 16:38:02 2006 From: rbirnie at totalise.co.uk (richard) Date: Sun, 3 Dec 2006 21:38:02 +0000 Subject: [Bioperl-l] confused by Bio::Graphics Message-ID: <200612032138.02522.rbirnie@totalise.co.uk> Hi all, I'm having a little trouble getting Bio::Graphics to give me the correct output and I'm looking for some help. I am trying to extend from example 5 of the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. Eventually I intend the script to follow example 6 but I thought I'd try the simpler version first. The basic aim of the script is that it takes as input a file containing a list of GenBank IDs plus some other info for alternative transcripts of a gene. This information is stored in a hash and the GenBank IDs are used to retrieve the appropriate entries from GenBank. I then want to use Bio::Graphics to generate a figure from the feature tables showing the CDSs from the alternative transcripts. So far I have managed to retrieve the GenBank entries extract the feature tables and store a reference to these in the hash mentioned above. I've also got Bio::Graphics to draw a basic image but some of the details aren't right and I don't understand why. I have attached the code I have so far, the input file and the output image to this mail. I didn't want to display it all in the main message but I'm not actually sure which bit is causing the problem. The code is very rough and in need of polishing but I need to get it to work correctly first. These are the problems: 1) As I understand it this: my $wholeseq = Bio::SeqFeature::Generic->new ( -start => 1, -end => $refseq->length, -display_name =>$refseq->display_name ); should display the name of the gene (CD133/Prominin1) near the top of image. It doesn't, am I misunderstanding or is there an error in the code? 2) In the quoted example the CDS is broken up into smaller regions which are then linked together in example 6. 
This isn't happening in my code and I think it should be, I get one solid block for the CDS. I don't understand why this is because I'm not clear which parts of the feature table are used to define where the CDS should be split. I think this is the relevant bit of code: foreach my $alt_trans (keys %main) { foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) { my $feature = $main{$alt_trans}{'features'}{$tag}; $panel->add_track($feature, -glyph => 'generic', -bgcolor => $colors[$idx++ % @colors], -fgcolor => 'black', -font2color => 'black', -key => $alt_trans, -bump => +1, -height => 8, -label => 1, -description => 1, ) if ($tag eq 'CDS'); } } Can anyone tell me what I am doing wrong? RefSeq entry for the gene of interest is here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386 If I understand correctly the example file used in the HOWTO is this gene: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=116805320 Final question, does bioperl come with example scripts and is so where whould they normally be found on a Linux system? If anyone is still reading this thanks for your patience. Any clarification will be appreciated. regards, Richard -------------- next part -------------- A non-text attachment was scrubbed... Name: CD133_graphic_code Type: application/x-perl Size: 2702 bytes Desc: not available URL: -------------- next part -------------- sequence_ID Exon_Boundary Assay_location Amplicon_length NM_006017 9 - 10 1118 106 AF027208.1 9 - 10 1118 106 AK027420.1 9 - 10 1312 106 AK027422.1 9 - 10 1334 106 BC012089.1 9 - 10 1289 106 AY449689.1 8 - 9 1054 106 AY449690.1 8 - 9 1054 106 AY449691.1 8 - 9 1054 106 AY449692.1 9 - 10 1081 106 AY449693.1 9 - 10 1081 106 AF507034.1 8 - 9 1091 106 AK075411.1 9 - 10 1289 106 AF117225.1 9 - 10 1334 106 AK226033.1 - 1312 106 DQ895452.1 - 1054 106 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: CD133.png Type: image/png Size: 4322 bytes Desc: not available URL:

From cjfields at uiuc.edu Sun Dec 3 22:35:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Dec 2006 21:35:17 -0600
Subject: [Bioperl-l] BioPerl Wiki is down
In-Reply-To: <45738063.1070504@verizon.net>
References: <45738063.1070504@verizon.net>
Message-ID: <41422FC7-B579-4B45-B8CC-341B8F462BCB@uiuc.edu>

On Dec 3, 2006, at 7:56 PM, Conrad Halling wrote:

> When I attempted to navigate to http://www.bioperl.org/, I got the
> following message:
>
> A database query syntax error has occurred. This may indicate a bug in
> the software. The last attempted database query was:
>
> (SQL query hidden)
>
> from within function "MediaWikiBagOStuff::_doquery". MySQL returned
> error "1205: Lock wait timeout exceeded; try restarting transaction
> (localhost)".
>
> -- Conrad Halling
> chhalling at verizon.net

This has been an ongoing problem with the server; I have reported it
previously to open-bio support. There have been a few attempts to fix
it which seem to work short-term but something else must be wrong.
Jason? Chris D?

For my part, Googling found the following link, which indicates that
this error may be due to heavy server load:

http://tibia.erig.net/TibiaWiki:Bug_reports

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Derek.Fairley at bll.n-i.nhs.uk Mon Dec 4 05:18:37 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Mon, 4 Dec 2006 10:18:37 -0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID:

Richard,

You can find instructions for installing the example scripts directory
here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_SCRIPTS

or you can get individual scripts from here:

http://www.bioperl.org/wiki/Bioperl_scripts11

Derek.
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of richard Sent: 03 December 2006 21:38 To: Bioperl list Subject: [Bioperl-l] confused by Bio::Graphics Hi all, I'm having a little trouble getting Bio::Graphics to give me the correct output and I'm looking for some help. I am trying to extend from example 5 of the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. Eventually I intend the script to follow example 6 but I thought I'd try the simpler version first. The basic aim of the script is that it takes as input a file containing a list of GenBank IDs plus some other info for alternative transcripts of a gene. This information is stored in a hash and the GenBank IDs are used to retrieve the appropriate entries from GenBank. I then want to use Bio::Graphics to generate a figure from the feature tables showing the CDSs from the alternative transcripts. So far I have managed to retrieve the GenBank entries extract the feature tables and store a reference to these in the hash mentioned above. I've also got Bio::Graphics to draw a basic image but some of the details aren't right and I don't understand why. I have attached the code I have so far, the input file and the output image to this mail. I didn't want to display it all in the main message but I'm not actually sure which bit is causing the problem. The code is very rough and in need of polishing but I need to get it to work correctly first. These are the problems: 1) As I understand it this: my $wholeseq = Bio::SeqFeature::Generic->new ( -start => 1, -end => $refseq->length, -display_name =>$refseq->display_name ); should display the name of the gene (CD133/Prominin1) near the top of image. It doesn't, am I misunderstanding or is there an error in the code? 2) In the quoted example the CDS is broken up into smaller regions which are then linked together in example 6. 
This isn't happening in my code and I think it should be, I get one solid block for the CDS. I don't understand why this is because I'm not clear which parts of the feature table are used to define where the CDS should be split. I think this is the relevant bit of code: foreach my $alt_trans (keys %main) { foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) { my $feature = $main{$alt_trans}{'features'}{$tag}; $panel->add_track($feature, -glyph => 'generic', -bgcolor => $colors[$idx++ % @colors], -fgcolor => 'black', -font2color => 'black', -key => $alt_trans, -bump => +1, -height => 8, -label => 1, -description => 1, ) if ($tag eq 'CDS'); } } Can anyone tell me what I am doing wrong? RefSeq entry for the gene of interest is here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386 If I understand correctly the example file used in the HOWTO is this gene: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=1168053 20 Final question, does bioperl come with example scripts and is so where whould they normally be found on a Linux system? If anyone is still reading this thanks for your patience. Any clarification will be appreciated. regards, Richard From rbirnie at totalise.co.uk Mon Dec 4 04:30:36 2006 From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk) Date: 04 Dec 2006 09:30:36 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From bix at sendu.me.uk Mon Dec 4 09:37:16 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Dec 2006 14:37:16 +0000 Subject: [Bioperl-l] BLASTing with a seqio/seq object... 
In-Reply-To: <45706671.9000201@york.ac.uk> References: <01ba01c714a2$b9659c10$15327e82@pyrimidine> <456F27E9.70205@york.ac.uk> <456FEF22.4090004@sendu.me.uk> <45706671.9000201@york.ac.uk> Message-ID: <4574329C.2030905@sendu.me.uk> Samantha Thompson wrote: > Hi, > Thanks for all your help so far, I am still trying to understand a > couple of things... You should make sure your replies are sent to the list, as you're likely to get a faster response. [where $blast_report is the value returned by Bio::Tools::Run::RemoteBlast->submit_blast($seq_object)] > when I run this line.. > > $searchio = Bio::SearchIO->new(-format => 'blast', > -file => $blast_report); > > between submitting the blast search and trying to to process the searchio object like I was attempting before I get the following errors back: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Could not open 1: No such file or directory [snip] > Does this mean that my BLAST is failing when I submit it? No, the -file option of SearchIO->new() takes, unsurprisingly, a filename. I'd tell you to pay careful attention to the docs, but sadly the RemoteBlast docs are currently wrong. submit_blast() claims to return 'Blast report object' (which in any case certainly wouldn't be a filename) when in fact it returns, as you discovered, a (for our purposes) meaningless number. As I suggested before, you need to look at the synopsis for Bio::Tools::Run::RemoteBlast instead. (having called submit_blast you must do the each_rid loop) Does anyone care to go through the POD for RemoteBlast and update it to an accurate state? 
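For readers following the thread, the each_rid loop mentioned above can be sketched roughly as follows. This is a minimal sketch, not the module's own code: it assumes the method names used in the RemoteBlast synopsis (each_rid, retrieve_blast, remove_rid) and assumes that retrieve_blast returns a report object once the search has finished, 0 while it is still running, and a negative number on failure. The helper name collect_blast_reports is made up for illustration. The commented usage passes the database as -data, which is what the synopsis uses as far as I recall; the script in this thread passes -db.

```perl
use strict;
use warnings;

# Hypothetical helper: poll a RemoteBlast-style factory until every
# submitted request ID (RID) has either produced a report or failed.
# $factory must provide each_rid, retrieve_blast and remove_rid;
# $wait is the pause in seconds between polls (default 5).
sub collect_blast_reports {
    my ($factory, $wait) = @_;
    $wait = 5 unless defined $wait;
    my @reports;
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( ref $rc ) {
                # A reference means the report is ready to be parsed.
                push @reports, $rc;
                $factory->remove_rid($rid);
            }
            elsif ( $rc < 0 ) {
                # Negative return value: the request failed, stop polling it.
                warn "BLAST request $rid failed\n";
                $factory->remove_rid($rid);
            }
            else {
                # Not ready yet: pause before asking again.
                sleep $wait if $wait;
            }
        }
    }
    return @reports;
}

# Intended use (needs BioPerl and network access, so left commented out):
#   my $factory = Bio::Tools::Run::RemoteBlast->new(
#       -prog => 'blastp', -data => 'nr', -expect => '1e-15',
#       -readmethod => 'SearchIO' );
#   $factory->submit_blast($seq_obj);   # return value is not a report
#   for my $report ( collect_blast_reports($factory) ) {
#       while ( my $result = $report->next_result ) {
#           # ... next_hit / next_hsp loops as in the original script
#       }
#   }
```

The point of the sketch is that the object with next_result comes from retrieve_blast, not from submit_blast, so there is no need to build a separate Bio::SearchIO object with -file.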
From bix at sendu.me.uk Mon Dec 4 09:40:27 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Dec 2006 14:40:27 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: References: Message-ID: <4574335B.805@sendu.me.uk> rbirnie at totalise.co.uk wrote: > Hi all, > > I've just seen my previous mail come through on the digest and I noticed > that the code I attached has been scrubbed which means that the message > won't make much sense. If I've contravened list rules by posting > attachments then apologies, I did look for a posting guide but couldn't > see one on the wiki. I deliberatley didn't put the whole code in the > main message because it's quite long. I'm not sure which part is wrong > so I don't know which part to post I'm just not seeing the output I > would expect from the example. What is the best thing for me to do? I saw a few attachments on your post (including your code example), so I think what you did was fine. From cjfields at uiuc.edu Mon Dec 4 10:40:20 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 09:40:20 -0600 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <4574335B.805@sendu.me.uk> Message-ID: <002001c717ba$823c1500$15327e82@pyrimidine> > rbirnie at totalise.co.uk wrote: > > Hi all, > > > > I've just seen my previous mail come through on the digest and I > > noticed that the code I attached has been scrubbed which means that > > the message won't make much sense. If I've contravened list > rules by > > posting attachments then apologies, I did look for a > posting guide but > > couldn't see one on the wiki. I deliberatley didn't put the > whole code > > in the main message because it's quite long. I'm not sure > which part > > is wrong so I don't know which part to post I'm just not seeing the > > output I would expect from the example. What is the best > thing for me to do? > > I saw a few attachments on your post (including your code > example), so I think what you did was fine. Same here. 
I received a PNG file and two text files (a script and a data file). chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From rbirnie at totalise.co.uk Mon Dec 4 11:06:51 2006 From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk) Date: 04 Dec 2006 16:06:51 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <002001c717ba$823c1500$15327e82@pyrimidine> References: <002001c717ba$823c1500$15327e82@pyrimidine> Message-ID: An HTML attachment was scrubbed... URL: From dmessina at wustl.edu Mon Dec 4 11:46:16 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 4 Dec 2006 10:46:16 -0600 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk> References: <200612032138.02522.rbirnie@totalise.co.uk> Message-ID: Hi Richard, > [richard] > > These are the problems: > 1) As I understand it this: > > my $wholeseq = Bio::SeqFeature::Generic->new ( > -start => 1, > -end => $refseq->length, > -display_name =>$refseq->display_name > ); > > should display the name of the gene (CD133/Prominin1) near the top > of image. > It doesn't, am I misunderstanding or is there an error in the code? The contents of a sequence object's display_name varies depending on the type of sequence record; for a sequence object created from a Genbank record, it's the value of the LOCUS field on the first line of the record. If you want the gene name, you'll have to dig it out of the feature table. If you look at the Genbank record for your first sequence, you'll see that under both the gene and CDS primary features, the HUGO gene abbreviation is stored under the "gene" secondary tag, and various synonyms are under the "note" and "product" secondary tags. LOCUS NM_006017 3794 bp mRNA linear PRI 17-NOV-2006 DEFINITION Homo sapiens prominin 1 (PROM1), mRNA. ACCESSION NM_006017 VERSION NM_006017.1 GI:5174386 [...skipping irrelevant part of the Genbank record...] 
FEATURES             Location/Qualifiers
     source          1..3794
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="4"
                     /map="4p15.32"
     gene            1..3794
                     /gene="PROM1"
                     /note="prominin 1; synonyms: AC133, CD133, PROML1,
                     MSTP061"
                     /db_xref="GeneID:8842"
                     /db_xref="HGNC:9454"
                     /db_xref="HPRD:HPRD_05079"
                     /db_xref="MIM:604365"
     CDS             38..2635
                     /gene="PROM1"
                     /go_component="integral to plasma membrane [pmid
                     9389720]; membrane"
                     /go_process="response to stimulus; visual perception"
                     /note="hProminin; prominin (mouse)-like 1; hematopoietic
                     stem cell antigen"
                     /codon_start=1
                     /product="prominin 1"
                     /protein_id="NP_006008.1"
                     /db_xref="GI:5174387"
                     /db_xref="GeneID:8842"
                     /db_xref="HGNC:9454"
                     /db_xref="HPRD:HPRD_05079"
                     /db_xref="MIM:604365"
[....more...]

In your script, you grab the primary features between lines 34-60. You
can grab the secondary feature you want with something like:

[cribbed from the Feature-Annotation HOWTO]

    for my $feat_object ($seq_object->get_SeqFeatures) {
        push @ids, $feat_object->get_tag_values("gene")
            if ($feat_object->has_tag("gene"));
    }

> 2) In the quoted example the CDS is broken up into smaller regions which are
> then linked together in example 6. This isn't happening in my code and I
> think it should be, I get one solid block for the CDS. I don't understand why
> this is because I'm not clear which parts of the feature table are used to
> define where the CDS should be split. I think this is the relevant bit of
> code:
>
> foreach my $alt_trans (keys %main) {
>     foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {
>
>         my $feature = $main{$alt_trans}{'features'}{$tag};
>
>         $panel->add_track($feature,
>             -glyph => 'generic',
>             -bgcolor => $colors[$idx++ % @colors],
>             -fgcolor => 'black',
>             -font2color => 'black',
>             -key => $alt_trans,
>             -bump => +1,
>             -height => 8,
>             -label => 1,
>             -description => 1,
>         ) if ($tag eq 'CDS');
>
>     }
> }

The problem here is that RefSeq mRNA records don't contain intron-exon
boundary information.
I think you'll have to get that from an assembly record. From the
Entrez gene page for PROM1, I obtained a Genbank record for the PROM1
genomic locus:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000004.10&from=15578955&to=15686664&strand=2&dopt=gb

Saving that as 'PROM1.gb' (the suffix is important), and running the
bp_embl2picture.pl script on it, I got an image similar to Figure 6
(attached).

Hope this helps,
Dave

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PROM1.png Type: image/png Size: 8646 bytes Desc: not available URL:

From bix at sendu.me.uk Mon Dec 4 14:37:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 19:37:13 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <000001c717db$3ca7b910$15327e82@pyrimidine>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
Message-ID: <457478E9.3060405@sendu.me.uk>

Chris Fields wrote:
> Sendu,
>
> Are current plans to still try getting the final 1.5.2 release out
> before the hackathon next week?

Yes, I seriously hope so. I was kind of hoping to see test results from
you and Nathan on the wiki though...

> There are a few commits I want to make, but I may wait until after
> 1.5.2 is out before I add them.

But don't let the release stop you. As long as you don't commit to the
1.5.2 branch it will be fine.

From cjfields at uiuc.edu Mon Dec 4 14:34:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 13:34:34 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
Message-ID: <000001c717db$3ca7b910$15327e82@pyrimidine>

Sendu,

Are current plans to still try getting the final 1.5.2 release out
before the hackathon next week?

There are a few commits I want to make, but I may wait until after
1.5.2 is out before I add them.

chris

Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Dec 4 15:23:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 14:23:45 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
Message-ID: <000001c717e2$19d18e00$15327e82@pyrimidine>

> Chris Fields wrote:
> > Sendu,
> >
> > Are current plans to still try getting the final 1.5.2 release out
> > before the hackathon next week?
>
> Yes, I seriously hope so. I was kind of hoping to see test
> results from you and Nathan on the wiki though...

Ah, forgot to post those! Working on that now...

> > There are a few commits I want to make, but I may wait until after
> > 1.5.2 is out before I add them.
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.

There are a few things I plan on adding over the next few weeks,
including some things for Bio::Location::SplitLocation. However I'm
sure some of the latter will break tests, so I'll be adding it in a bit
at a time. It all depends when I can squeeze time in to work on them!

chris

From pelikan at cs.pitt.edu Mon Dec 4 17:34:59 2006
From: pelikan at cs.pitt.edu (pelikan at cs.pitt.edu)
Date: Mon, 4 Dec 2006 17:34:59 -0500 (EST)
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
Message-ID: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>

Hello,

My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the
latest MySQL under Windows, ActivePerl, without Cygwin. I have 4 GB
memory. The "make test"s passed fine.

The problem is that I'm not getting similar numbers of anything when I
load datasets using load_seqdatabase.pl.
For instance, if I want to load only proteins from Homo sapiens, I go
to UniProt, use the database search function, do a text search for
Homo sapiens (returns 70914 hits), export the hits to flat file format
(--format swiss) using the data set manager, and load the file using
load_seqdatabase.pl.

Running "select count(*) from bioentry;" afterwards returns only 1003
entries. Moreover it seems like the entries don't go past the B's in
the alphabet - I can't find bioentry.descriptions like '%cytochrome%'
or '%myoglobin%', but I can find apolipoproteins, for example.

I know this is an annoying question, but if someone has more experience
in dealing with this issue, I would be grateful for any assistance. I
don't get any error messages, so it's difficult for me to tell what's
going on.

-Richard

From n.haigh at sheffield.ac.uk Tue Dec 5 01:53:34 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 05 Dec 2006 06:53:34 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk>
Message-ID: <4575176E.3020906@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>
>> Sendu,
>>
>> Are current plans to still try getting the final 1.5.2 release out
>> before the hackathon next week?
>>
>
> Yes, I seriously hope so. I was kind of hoping to see test results from
> you and Nathan on the wiki though...
>
>
>

OK, I'll get onto this today.

>> There are a few commits I want to make, but I may wait until after
>> 1.5.2 is out before I add them.
>>
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> -- > A: Yes. >> Q: Are you sure? >> >>> A: Because it reverses the logical flow of conversation.
>>> >>>> Q: Why is top posting frowned upon? >>>> Get Thunderbird From n.haigh at sheffield.ac.uk Tue Dec 5 06:43:16 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 05 Dec 2006 11:43:16 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> Message-ID: <45755B54.7080902@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: > >> Sendu, >> >> Are current plans to still try getting the final 1.5.2 release out >> before the hackathon next week? >> > > Yes, I seriously hope so. I was kind of hoping to see test results from > you and Nathan on the wiki though... > > > I've added my test results for Debian to the wiki. Nath >> There are a few commits I want to make, but I may wait until after >> 1.5.2 is out before I add them. >> > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > A: Yes. >> Q: Are you sure? >> >>> A: Because it reverses the logical flow of conversation. >>> >>>> Q: Why is top posting frowned upon? >>>> Get Thunderbird From bix at sendu.me.uk Tue Dec 5 06:47:06 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 05 Dec 2006 11:47:06 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <45755B54.7080902@sheffield.ac.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> <45755B54.7080902@sheffield.ac.uk> Message-ID: <45755C3A.9050903@sendu.me.uk> Nathan S. Haigh wrote: > Sendu Bala wrote: >> Chris Fields wrote: >> >>> Sendu, >>> >>> Are current plans to still try getting the final 1.5.2 release out >>> before the hackathon next week? >>> >> Yes, I seriously hope so. 
I was kind of hoping to see test results from >> you and Nathan on the wiki though... > > I've added my test results for Debian to the wiki. Thanks (and to Chris as well). I can't tell you how much I loath and despise TCoffee and Tmhmm now ;) From cjfields at uiuc.edu Tue Dec 5 11:04:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 5 Dec 2006 10:04:38 -0600 Subject: [Bioperl-l] Build.PL changes Message-ID: <001b01c71887$10be3160$15327e82@pyrimidine> Sendu, I think the Build.PL commits which force installation of XML::SAX::Expat should be rolled back. XML::Simple works with any XML::SAX backend, not just XML::SAX::Expat, which hasn't been actively maintained since 2003 and is deprecated in favor of XML::SAX::ExpatXS. In fact, forcing XML::SAX::Expat to install as the default XML::SAX backend currently breaks blastxml parsing. Note that forcing this also forces one to install the Expat library (now at v 2), which now has some compatibility problems with XML::SAX::Expat (but not ExpatXS). Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From qetzal at tutopia.com.br Wed Dec 6 10:21:20 2006 From: qetzal at tutopia.com.br (giovani) Date: Wed, 06 Dec 2006 10:21:20 -0500 Subject: [Bioperl-l] Biodiversity graphic Message-ID: An HTML attachment was scrubbed... URL: From benoit at ebi.ac.uk Wed Dec 6 12:30:12 2006 From: benoit at ebi.ac.uk (Benoit Ballester) Date: Wed, 06 Dec 2006 17:30:12 +0000 Subject: [Bioperl-l] Biodiversity graphic In-Reply-To: References: Message-ID: <4576FE24.1030807@ebi.ac.uk> giovani wrote: > > Hello there. I'm trying to write a programa to set a graphic with two > axis and two data sets to each axis. Anyone know some tool similar to > the GD module to set this graphic, because with GD I'm having troubles. > here is an example of what I want to do: > http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm > using with GD module. 
It looks to me that the graph you pointing too has been made by gnuplot. Why don't you use gnuplot or R instead ? Ben > > #!/usr/bin/perl -w > > use GD::Graph::mixed; > @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 3, 4, 14, 30, 12, 8, 7, 20, 15], > [ 2, 8, 2, 5, 3, 1, 3, 4, 1], > [ 5, 12, 24, 33, 19, 8, 6, 15, 21], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > ); > > $my_graph = new GD::Graph::mixed( ); > $my_graph->set( > x_label => 'X Label', > y1_label => 'Y1 label', > y2_label => 'Y2 label', > title => 'Using two axes', > y1_max_value => 40, > y2_max_value => 8, > y_tick_number => 8, > y_label_skip => 2, > long_ticks => 1, > two_axes => 1, > use_axis => [1,2,1,2], > legend_placement => 'BR', > x_labels_vertical => 1, > x_label_position => 1/2, > ); > > $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY'); > my $gd = $my_graph->plot(\@data) or die $my_graph->error; > open(IMG, '>graphTest.gif') or die "N o posso abrir arquivo$!\n"; > binmode IMG; > print IMG $gd->gif; > close IMG; > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gwu at molbio.mgh.harvard.edu Wed Dec 6 16:12:57 2006 From: gwu at molbio.mgh.harvard.edu (gang wu) Date: Wed, 06 Dec 2006 16:12:57 -0500 Subject: [Bioperl-l] Biodiversity graphic In-Reply-To: References: Message-ID: <45773259.3010405@molbio.mgh.harvard.edu> Do you mean the GD code can not run or it does not generate image as you wanted? Gang giovani wrote: > > > Hello there. I'm trying to write a programa to set a graphic with two > axis and two data sets to each axis. Anyone know some tool similar to > the GD module to set this graphic, because with GD I'm having > troubles. here is an example of what I want to do: > http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm > using with GD module. 
> > #!/usr/bin/perl -w > > use GD::Graph::mixed; > @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 3, 4, 14, 30, 12, 8, 7, 20, 15], > [ 2, 8, 2, 5, 3, 1, 3, 4, 1], > [ 5, 12, 24, 33, 19, 8, 6, 15, 21], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > ); > > $my_graph = new GD::Graph::mixed( ); > $my_graph->set( > x_label => 'X Label', > y1_label => 'Y1 label', > y2_label => 'Y2 label', > title => 'Using two axes', > y1_max_value => 40, > y2_max_value => 8, > y_tick_number => 8, > y_label_skip => 2, > long_ticks => 1, > two_axes => 1, > use_axis => [1,2,1,2], > legend_placement => 'BR', > x_labels_vertical => 1, > x_label_position => 1/2, > ); > > $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY'); > my $gd = $my_graph->plot(\@data) or die $my_graph->error; > open(IMG, '>graphTest.gif') or die "N o posso abrir arquivo$!\n"; > binmode IMG; > print IMG $gd->gif; > close IMG; > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Wed Dec 6 17:39:49 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 06 Dec 2006 22:39:49 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 Release Message-ID: <457746B5.2020006@sendu.me.uk> I am proud to announce the final release of Bioperl 1.5.2. 

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl
Package Manager - see the wiki for details)

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree
and bioperl-pipeline) did not see a unified release for 1.5.2.

This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of
Bioperl and believe it to be suitable for most people. It is marked
'developer' or even 'unstable' because its API may change on short
notice. It will also not be maintained or supported beyond the next
bioperl release.

1.5.2 introduces the following new (core) features:

 * Taxonomy (Bio::Species) overhaul
 * Bio::Map improvements
 * Bio::SearchIO speedup
 * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields
(bug fixing, testing, documentation, discussion), Nathan Haigh
(Windows & pre-requisite issues, testing) and Mauricio Herrera Cuadra
(testing, documentation, support). Feedback and ideas provided by Hilmar
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and
elsewhere proved invaluable. None of this would have been possible
without the behind-the-scenes work of the open-bio support team. I'd
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and
especially those that took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/
help. If you'd like to experience the fame and fortune that naturally
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.

From cjfields at uiuc.edu Wed Dec 6 21:30:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Dec 2006 20:30:44 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c719a7$b48beb90$15327e82@pyrimidine>

Great job Sendu!

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed w/o a hitch following normal instructions using PPM4 (GUI and
command line shell) using clean ActiveState installations. Looks like all
the correct prereqs were installed with shell (only XML::SAX::ExpatXS was
left out in the GUI installation for reasons outlined before).

I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris

> I am proud to announce the final release of Bioperl 1.5.2.
> [...]

From hlapp at gmx.net Wed Dec 6 22:20:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:20:14 -0500
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
In-Reply-To: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
References: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
Message-ID: <8E15592D-6475-4A4D-BA6D-BD669C4233C3@gmx.net>

I seriously doubt that load_seqdatabase.pl would have deliberately
stopped loading the file. Either there was an error in loading an
entry (which you should see, and you can also ask the script to just
keep going by providing the --safe option), or the file only
contained 1003 entries.

Note that you can get progress logging by using the --logchunk
option, which will also give you a final count of the number of
sequences loaded.
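
To cross-check the loader's final count against the input file itself, the entries in the file can be counted independently. What follows is only a minimal sketch, assuming a Swiss-Prot style flat file in which every entry begins with a line starting "ID "; the sub name count_entries and the file name in the usage comment are made up for illustration. Keeping it in a script file also avoids shell quoting differences between platforms.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count entries in a Swiss-Prot style flat file.
# Assumption: every entry begins with a line that starts with "ID ".
# count_entries is a hypothetical helper name, not part of BioPerl.
sub count_entries {
    my ($fh) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /^ID /;
    }
    return $n;
}

# Usage: perl count_entries.pl my_export.txt   (file name is a placeholder)
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    printf "%d entries\n", count_entries($fh);
    close $fh;
}
```

If the number printed here matches the row count in the bioentry table, the loader did its job and the shortfall lies in the export step.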

I'm not sure how you ran your search and your download on Uniprot. If
I try what you describe I get 70491 hits, and if I try to export them
using the data set manager I get the message:

    This download mechanism only supports 1000 proteins. The first 1000
    proteins have been added from the selected.

Which perfectly explains what you see. Did you convince yourself that
the file contains 70491 entries? If you don't have grep and wc on
your windows machine, you can use perl one-liners directly, e.g.,

perl -n -e '/^ID / && ++$n; END {print "$n entries\n";}'

	-hilmar

On Dec 4, 2006, at 5:34 PM, pelikan at cs.pitt.edu wrote:

> Hello,
>
> My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the
> latest mySQL under Windows, Activeperl, without Cygwin. I have 4 GB
> memory. "make test"s passed fine.
>
> The problem is that I'm not getting similar numbers of anything when I
> load datasets using load_seqdatabase.pl. For instance, if I want to load
> only proteins from Homo Sapiens,
> I go to UniProt,
> use the database search function,
> do a text search for Homo Sapiens (returns 70914 hits),
> export the hits to flat file format (--format swiss) using the data set
> manager,
> and load it using load_seqdatabase.pl.
>
> The result of "select count(*) from bioentry;" results in only
> 1003 entries.
> Moreover it seems like the entries don't go past the B's in the alphabet -
> I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%',
> but I can find apolipoproteins, for example.
>
> I know this is an annoying question, but if someone has more experience in
> dealing with this issue, I would be grateful for any assistance. I don't
> get any error messages, so it's difficult for me to tell what's going on.
>
> -Richard
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From lzhtom at hotmail.com Wed Dec 6 22:13:47 2006
From: lzhtom at hotmail.com (zhihua li)
Date: Thu, 07 Dec 2006 03:13:47 +0000
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
Message-ID:

Hi netters,

Recently I found this:

For constructing a new SeqI object, I had to write:
$seq_obj=Bio::SeqIO->new(
    -file => '/home/myfile',
    -format => 'Fasta'); #Note the dash before the two arguments.

If I omitted the dash:
$seq_obj=Bio::SeqIO->new(
    file => '/home/myfile',
    format => 'Fasta');
I'd get error:
MSG: Unknown format given or could not determine it []
STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377

So it seems to me that the dashes before the arguments are essential.
However, when I tried to build a factory for StandaloneBlast, I found the
other way around.

If the script had the dash:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
    -program => 'blastn',
    -database => '/home/mydatabase');

I'd get the error message:
MSG: Unallowed parameter: - !
STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400

If I left out the dash by saying:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
    program => 'blastn',
    database => '/home/mydatabase');

Everything is fine.

Now I'm confused. Why do I sometimes have to add the dash, while
sometimes I'm not allowed to?

Thanks in advance!

From hlapp at gmx.net Wed Dec 6 22:56:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:56:44 -0500
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID:

Congrats! Great work, Sendu! Don't forget to celebrate.

	-hilmar

On Dec 6, 2006, at 5:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.
> [...]

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From arareko at campus.iztacala.unam.mx Wed Dec 6 22:53:21 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 06 Dec 2006 21:53:21 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <45779031.3050202@campus.iztacala.unam.mx>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

Sendu Bala wrote:
> I am proud to announce the final release of Bioperl 1.5.2.
> [...]

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From jason at bioperl.org Thu Dec 7 00:06:36 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Dec 2006 21:06:36 -0800
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <41A863C9-1B69-4C7B-9271-C577EDD011BB@bioperl.org>

hear! hear!

Excellent work. Thanks for leading the effort on this release and all
of the behind the scenes work, attention to detail, and cat herding
work it took to make this possible.

-jason

On Dec 6, 2006, at 2:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.
> [...]

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From n.haigh at sheffield.ac.uk Thu Dec 7 02:23:47 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 07 Dec 2006 07:23:47 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <4577C183.7010501@sheffield.ac.uk>

I know I'm very new to Bioperl development and don't know very much yet,
so I'm probably not the best person to express the views of the Bioperl
developers or users. However, I'm sure I'm safe in saying that on behalf
of everyone associated with Bioperl a *huge* thank you must go out to
Sendu for the gargantuan effort he has put into this release. Just
looking over some of the e-mails he's sent over the past few weeks
alone, it's clear that he has devoted a huge amount of time to the
effort and in some cases with little sleep.

Since there is very little (or should I say no) monetary recognition in
such an important and time consuming role as "Release Pumpkin", I hope
Sendu has a warm glow, safe in the knowledge that his efforts have
helped enormously and are clearly recognised and fully appreciated by
the Bioperl community.

Therefore, I'd just like to reiterate what others have already
said.....Well done, excellent work!!!

Nath

From valiente at lsi.upc.edu Thu Dec 7 03:25:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 09:25:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To:
References:
Message-ID: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>

The following popped out when I input more than 110 species to the
taxonomy2tree script version 1.4:

(in cleanup)
------------- EXCEPTION -------------
MSG: Must supply a Bio::Taxon
STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
STACK (eval) taxonomy2tree.pl:0
STACK toplevel taxonomy2tree.pl:0

Any clues? Thanks,

Gabriel

From bix at sendu.me.uk Thu Dec 7 04:24:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:24:39 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DDD7.7060208@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
>
> (in cleanup)
> ------------- EXCEPTION -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
>
> Any clues? Thanks,

Are you able to narrow the problem down? What was your command line,
what species were you using? Does it work with the first 110 species you
tried? Is there anything special about the 111th?

Do I understand correctly that this was a problem during cleanup only,
and didn't affect the correctness and completeness of the result?

From bix at sendu.me.uk Thu Dec 7 04:33:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:33:18 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DFDE.6000500@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
>
> (in cleanup)
> ------------- EXCEPTION -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
>
> Any clues? Thanks,

Oh, does it work with option -e? Or does it work if you delete your old
indexes of the nodes and names files and let it re-create them?

From valiente at lsi.upc.edu Thu Dec 7 04:38:03 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 10:38:03 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4577DDD7.7060208@sendu.me.uk>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu> <4577DDD7.7060208@sendu.me.uk>
Message-ID:

Hi,

If you run the attached shell script you should be able to reproduce
the problem. It is not about any species in particular, but about the
total number of species: it crashes with more than 120 species. The
resulting tree is not correct, I'm checking it further now. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
URL:
-------------- next part --------------

On Dec 7, 2006, at 10:24 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>> The following popped out when I input more than 110 species to the
>> taxonomy2tree script version 1.4:
>> (in cleanup)
>> ------------- EXCEPTION -------------
>> MSG: Must supply a Bio::Taxon
>> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
>> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
>> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
>> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
>> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
>> STACK (eval) taxonomy2tree.pl:0
>> STACK toplevel taxonomy2tree.pl:0
>> Any clues? Thanks,
>
> Are you able to narrow the problem down? What was your command
> line, what species were you using? Does it work with the first 110
> species you tried? Is there anything special about the 111th?
>
> Do I understand correctly that this was a problem during cleanup
> only, and didn't affect the correctness and completeness of the
> result?

From cjfields at uiuc.edu Thu Dec 7 10:22:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 09:22:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110species
In-Reply-To:
Message-ID: <000001c71a13$8feec840$15327e82@pyrimidine>

> Hi,
>
> If you run the attached shell script you should be able to
> reproduce the problem. It is not about any species in
> particular, but about the total number of species: it crashes
> with more than 120 species. The resulting tree is not
> correct, I'm checking it further now. Thanks,
>
> Gabriel

Gabriel,

My guess is this may have to do with using an old taxonomy dump file. I got
this to work on winXP using the latest NCBI taxonomy. I had to modify
taxonomy2tree and your shell script to get it to play nice with Windows, but
I didn't get the error and I did get a tree (abbreviated for brevity):

(((((("Agrobacterium tumefaciens str. C58","Sinorhizobium meliloti")Rhizobiaceae,...

chris

From cjfields at uiuc.edu Thu Dec 7 13:44:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 12:44:32 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
In-Reply-To:
References:
Message-ID: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>

On Dec 6, 2006, at 9:13 PM, zhihua li wrote:

> Hi netters,
>
> Recently I found this:
>
> For constructing a new SeqI object, I had to write:
> $seq_obj=Bio::SeqIO->new(
>     -file => '/home/myfile',
>     -format => 'Fasta'); #Note the dash before the two arguments.
>
> If I omitted the dash:
> $seq_obj=Bio::SeqIO->new(
>     file => '/home/myfile',
>     format => 'Fasta');
> I'd get error:
> MSG: Unknown format given or could not determine it []
> STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377
>
> So it seems to me that the dashes before the arguments are
> essential. However, when I tried to build a factory for
> StandaloneBlast, I found the other way around.
>
> If the script had the dash:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>     -program => 'blastn',
>     -database => '/home/mydatabase');
>
> I'd get the error message:
> MSG: Unallowed parameter: - !
> STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
> STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400
>
> If I left out the dash by saying:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>     program => 'blastn',
>     database => '/home/mydatabase');
>
> Everything is fine.
>
> Now I'm confused. Why do I sometimes have to add the dash, while
> sometimes I'm not allowed to?
>
> Thanks in advance!

I agree that this should be more consistent. Does anyone know the
reasoning for this?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bosborne11 at verizon.net Thu Dec 7 14:32:21 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Dec 2006 14:32:21 -0500
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
In-Reply-To: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>
Message-ID:

Chris,

The latest StandAloneBlast takes "dashed parameters", as in:

@params = (-database => 'swissprot',-outfile => 'blast1.out');
$factory = Bio::Tools::Run::StandAloneBlast->new(@params);

Or

my $factory = Bio::Tools::Run::StandAloneBlast->new(-program => "wublastp",
                                                    -database => "swissprot",
                                                    -e => 1e-20);

So that's why I asked "what version?"

Someone made the change to allow dashes in @params a few months ago and I
believe that that someone was you!

Brian O.
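
For readers puzzled by the dashed versus undashed behaviour discussed in this thread, here is one way a constructor can be made tolerant of both spellings. This is only a sketch in plain Perl, not BioPerl's actual implementation; normalize_params is a hypothetical helper name and the parameter names are taken from the examples above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): accept named parameters
# written either as "-name" or "name", in any letter case, and return
# a hash keyed by the lowercase name without the leading dash.
sub normalize_params {
    my (@args) = @_;
    die "Odd number of arguments\n" if @args % 2;
    my %params;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = lc $key) =~ s/^-//;   # drop one leading dash, if any
        $params{$name} = $value;
    }
    return %params;
}

# Both call styles end up with the same keys.
my %dashed   = normalize_params(-program => 'blastn', -database => '/home/mydatabase');
my %undashed = normalize_params( program => 'blastn',  database => '/home/mydatabase');

print "dashed:   $dashed{program} $dashed{database}\n";
print "undashed: $undashed{program} $undashed{database}\n";
```

A module that instead looks up a literal key such as '-format' will not see an undashed 'format', which is one plausible reason the two styles are not interchangeable in every class.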

On 12/7/06 1:44 PM, "Chris Fields" wrote:

> On Dec 6, 2006, at 9:13 PM, zhihua li wrote:
> [...]

From cjfields at uiuc.edu Thu Dec 7 14:44:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 13:44:19 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
In-Reply-To:
References:
Message-ID:

On Dec 7, 2006, at 1:32 PM, Brian Osborne wrote:

> Chris,
>
> The latest StandAloneBlast takes "dashed parameters", as in:
>
> @params = (-database => 'swissprot',-outfile => 'blast1.out');
> $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
>
> Or
>
> my $factory = Bio::Tools::Run::StandAloneBlast->new(-program => "wublastp",
>                                                     -database => "swissprot",
>                                                     -e => 1e-20);
>
> So that's why I asked "what version?"
>
> Someone made the change to allow dashes in @params a few months ago
> and I believe that that someone was you!
>
> Brian O.

Nope, I plead innocent (at least to this!). I haven't made any
commits to StandAloneBlast. These were added in by Torsten (see
commits 1.59, 1.60), so you'll need to blame/thank him...

http://tinyurl.com/y7ym9g

So they're now a bit more consistent. That's not to say
StandAloneBlast doesn't need some major revisions....

BTW, I didn't see a post from you asking about the version.

Chris

From akarger at CGR.Harvard.edu Thu Dec 7 16:32:51 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 7 Dec 2006 16:32:51 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID:

I need to know how to get the frame information in exon features
(created by Bio::Tools::GFF) into a whole-gene feature that will be
translated into a protein.

I'm reading in some fungal GFFs generated by Jason Stajich.
I

- Use Bio::Tools::GFF to create a feature for each exon in a gene
- Create a Bio::Location::Split object containing each feature's location
- Create a Bio::SeqFeature::Generic object whose location is the above
  BL::Split
- Attach my contig Bio::Seq to the feature
- Get the protein with feature->spliced_seq->translate->seq

(Code below)

Unfortunately, I get the wrong result when the GFF features have frame
!= 0. This happens for only a few percent of the exons, but when it
does, I end up translating in the wrong frame.

If I read the docs correctly, Location objects don't have a frame. So
how do I get the correct spliced_seq, which skips one or two bp at the
beginning of certain exons?

I suspect the answer to this is that I'm going about this in completely
the wrong way, in which case, please tell me how I ought to be doing it.

Thanks,

- Amir Karger
Research Computing
Life Sciences Division
Harvard University

P.S. In case you want to see actual code, here it is. After using
Bio::Tools::GFF to create a sorted list of features for each exon
(basically stolen from the module POD), I:

    # Create a new object representing the exons' gene
    my $coding_loc_obj = Bio::Location::Split->new;
    foreach my $exon (@sorted_exons) {
        $coding_loc_obj->add_sub_Location($exon->location);
    }

    # Build a spliced feature representing the whole gene
    my $spliced_feat = Bio::SeqFeature::Generic->new(
        -start   => $coding_loc_obj->start,
        -end     => $coding_loc_obj->end,
        -strand  => $strand_num,
        -primary => "splicedGene",
    );
    $spliced_feat->location($coding_loc_obj);

    # Attach a contig object containing the sequence
    $spliced_feat->attach_seq($contig_obj->bioperl_object);

    # Get the spliced seq and translate to protein:
    my $coding_seq = $spliced_feat->spliced_seq->seq;
    my $protein = $spliced_feat->spliced_seq->translate->seq;

From bix at sendu.me.uk Thu Dec 7 17:45:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 7 Dec 2006 15:45:32 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
Message-ID: <000001c71a51$671a79d0$6400a8c0@CodonSolutions.local>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl
Package Manager - see the wiki for details)

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree
and bioperl-pipeline) did not see a unified release for 1.5.2.

This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of
Bioperl and believe it to be suitable for most people. It is marked
'developer' or even 'unstable' because its API may change on short
notice. It will also not be maintained or supported beyond the next
bioperl release.

1.5.2 introduces the following new (core) features:

* Taxonomy (Bio::Species) overhaul
* Bio::Map improvements
* Bio::SearchIO speedup
* Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/

Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields
(bug fixing, testing, documentation, discussion), Nathan Haigh
(Windows & pre-requisite issues, testing) and Mauricio Herrera Cuadra
(testing, documentation, support). Feedback and ideas provided by Hilmar
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and
elsewhere proved invaluable. None of this would have been possible
without the behind-the-scenes work of the open-bio support team. I'd
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and
especially those that took the time to file bug reports or report problems.

Remember, Bioperl can only go from strength to strength with /your/
help. If you'd like to experience the fame and fortune that naturally
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.
_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l

From cjfields at uiuc.edu Thu Dec 7 18:00:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 16:00:43 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a53$85cb4f10$6400a8c0@CodonSolutions.local>

Great job Sendu!

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed w/o a hitch following normal instructions using PPM4 (GUI and
command line shell) using clean ActiveState installations. Looks like all
the correct prereqs were installed with the shell (only XML::SAX::ExpatXS
was left out in the GUI installation for reasons outlined before).
I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris

From kaboroev at sfu.ca Thu Dec 7 17:26:35 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Thu, 07 Dec 2006 14:26:35 -0800
Subject: [Bioperl-l] Bio::Graphics xyplot
Message-ID: <4578951B.5050206@sfu.ca>

Hi everyone,

I'm attempting to add an xyplot of the phred quality scores to a
Bio::Graphics image, and cannot get it to work. I have the panel with a
track for both the scale and the DNA displaying properly.
When I attempt to add the xyplot, I just get a garbled track of what look
like tiny xyplots, one for each data point. I am running the CVS version
of bioperl-live (updated today).

I think what I am missing is the creation of a "Sequence Feature Group"
to hold the individual points of the plot. However, I cannot seem to
find such an object.

This is what I attempted:

-------BEGIN---CODE-----------
# start panel
my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
                                      -width     => $f_seqlen*10,
                                      -pad_left  => 10,
                                      -pad_right => 10,
                                      -grid      => 1 );

# add scale
$panel->add_track(arrow => Bio::SeqFeature::Generic->new(-start => 1,
                                                         -end   => $f_seqlen),
                  -double  => 1,
                  -tick    => 2,
                  -fgcolor => 'black');

# add DNA ($feature is of type Bio::SeqFeature::Annotated)
$panel->add_track(dna => $feature);

# get list of quality scores from database
my ($pqs_value) = $dbh->selectrow_array($sql);
my @pqs_value = split(/\s/,$pqs_value);

# create track
my $track = $panel->add_track(-glyph        => 'xyplot',
                              -graph_type   => 'points',
                              -point_symbol => 'point',
                              -max_score    => 100,
                              -min_score    => 0,
                              -scale        => 'none');

# add "subfeatures" to the track
for (my $i=0;$i<$f_seqlen;$i++) {
    $track->add_feature(Bio::SeqFeature::Generic->new(-start => $i,
                                                      -end   => $i,
                                                      -score => $pqs_value[$i]));
}

print $panel->png();
$panel->finished;
------END---CODE----------

I also attempted to create an array of the point features and passed
that by reference to the panel's "add_track", as described in the xyplot
documentation, but that resulted in the exact same image.
keith

--
><)))?> -cGRASP- <
Keith Anthony Boroevich
Davidson Lab
Dept of Molecular Biology
Simon Fraser University
Tel: 604-268-7276

From arareko at campus.iztacala.unam.mx Thu Dec 7 18:15:53 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 7 Dec 2006 16:15:53 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a55$a479da60$6400a8c0@CodonSolutions.local>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cain at cshl.edu Thu Dec 7 17:46:09 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 07 Dec 2006 17:46:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
In-Reply-To:
References:
Message-ID: <1165531569.2569.49.camel@localhost.localdomain>

Amir,

I don't know for sure what the problem is, but here is one possibility:
the number in column 8 of a GFF file is not the frame, it is the phase.
See the GFF3 spec for a description of what the phase is:

http://www.sequenceontology.org/gff3.shtml

(It doesn't matter if you are using GFF3 or GFF2, as the phase is the
same in both).

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Thu Dec 7 21:52:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 20:52:47 -0600
Subject: [Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq
In-Reply-To: <1165531569.2569.49.camel@localhost.localdomain>
Message-ID: <002d01c71a73$f16ecc40$15327e82@pyrimidine>

Another issue is the splittype() is not defined, though I don't think that
would kill anything as currently implemented. However, one thing we have
passingly discussed is having Bio::Location::Split objects possibly exhibit
different (but expected) behaviors based upon the splittype() (order, join,
or bond). It's one of the things I want to work out for the next release.

If Scott's fix doesn't work and the problem persists, you should file a bug
report with some sample data for us to test out.

chris

From jason at bioperl.org Thu Dec 7 21:01:33 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 7 Dec 2006 18:01:33 -0800
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
In-Reply-To:
References:
Message-ID: <866F6CEE-62BB-4880-9B13-6DDE29EAF94E@bioperl.org>

This was a problem in the gene prediction output I suspect, more
recent versions of the program should have fixed this. I do not
currently have free time to deal with the errors in the small number
of ORFs where this has happened.

I think you just need to do
start -= start- (frame*strand)
for 1st exons.

You can also probably provide the 1st exon's frame to the translate
function as another possibility but you should try and get the CDS
correct first depending on your downstream analyses.

-jason

From neetisomaiya at gmail.com Fri Dec 8 05:21:50 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 8 Dec 2006 15:51:50 +0530
Subject: [Bioperl-l] need help with phrap parser
Message-ID: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>

Can anyone point me to a Phrap parser which parses the ace file to extract
what reads make up each contig (e.g. read_a and read_b make contig1; read_d,
read_e and read_z make contig2), and other information about the reads (like
whether the read is complemented or not with respect to the contig, what
region of the contig each read contributes, etc.), basically the AF and BS
lines of the ACE output.
--
-Neeti
Even my blood says, B positive

From pmiguel at purdue.edu Fri Dec 8 09:17:02 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 08 Dec 2006 09:17:02 -0500
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
References: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
Message-ID: <457973DE.6050900@purdue.edu>

neeti somaiya wrote:
> Can anyone point me to a Phrap parser which parses the ace file to extract
> what reads make up each contig (eg. read_a and read_b make contig1; read_d
> read_e and read_z make contig2, and other information of the reads (like
> whether the read is complemented or not with respect to the contig, what
> region of the contig does each read contribute etc), basically the AF and BS
> lines of the ACE output.
>
>
neeti,

To find the reads that went into each contig, you do *not* want the BS
tagged records. My understanding is that BS is just what consed uses to
populate its consensus line from the ace file. I write this because of
an email sent to me by David Gordon in 2001, included here without his
permission:

> > Phrap writes BS lines which
> > indicate, for each consensus position, which read phrap uses at that
> > position to become the consensus. These BS ("base segments") are
> > manipulated by Consed when there are changes to the assembly, such as
> > joins, tears, removing reads, or changing the consensus.
>

The simplest way is:

egrep '^(CO|AF|RD) ' acefilename

if you are on a unix system. Or with perl

while (<>) {
    print if /^(?:CO|AF|RD) /;
}

(The alternation is grouped so that the ^ anchor applies to all three
tags; without the parentheses, AF and RD would match anywhere in a line.)

But then you would need to parse the fields of interest. You get the
position/strand in the contig from AF, then you get the length of the
read from RD.

It does look like there is a part of bioperl that was meant to perform
this task--including Bio::Assembly::IO::ace--but it looks like it was
started and never completed.
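
The egrep/perl filter above only selects lines; the parsing step it leaves open can be sketched as follows. This is plain Perl rather than Bio::Assembly code, `parse_ace` is a hypothetical helper written for this illustration, and it assumes the usual ACE field order (CO name bases reads segments U|C; AF read U|C padded-start; RD read padded-length ...). The sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return { contig => { read => { complemented, start, length } } }.
sub parse_ace {
    my ($fh) = @_;
    my (%contig, $current);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {
            # CO line opens a new contig
            $current = $1;
            $contig{$current} = {};
        }
        elsif ($line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/ && defined $current) {
            # AF line: read name, orientation, padded start in the consensus
            $contig{$current}{$1} = { complemented => ($2 eq 'C' ? 1 : 0), start => $3 };
        }
        elsif ($line =~ /^RD\s+(\S+)\s+(\d+)/ && defined $current) {
            # RD line: read name and padded read length
            $contig{$current}{$1}{length} = $2;
        }
    }
    return \%contig;
}

# Made-up ACE fragment, trimmed to the record types of interest.
my $ace = <<'ACE';
AS 2 5

CO contig1 18 2 1 U
ACGTACGTACGTACGTAC

AF read_a U 1
AF read_b C 9
BS 1 18 read_a

RD read_a 12 0 0
ACGTACGTACGT

RD read_b 10 0 0
GTACGTACGT

CO contig2 14 3 1 U
ACGTACGTACGTAC

AF read_d U 1
AF read_e C 4
AF read_z U 7

RD read_d 8 0 0
ACGTACGT

RD read_e 8 0 0
TACGTACG

RD read_z 8 0 0
GTACGTAC
ACE

open my $fh, '<', \$ace or die "Cannot open in-memory file: $!";
my $contigs = parse_ace($fh);
close $fh;

# One line per read: contig, read, orientation, padded start-end in the contig.
for my $name (sort keys %$contigs) {
    for my $read (sort keys %{ $contigs->{$name} }) {
        my $r   = $contigs->{$name}{$read};
        my $end = $r->{start} + $r->{length} - 1;
        printf "%s\t%s\t%s\t%d-%d\n", $name, $read,
            ($r->{complemented} ? 'C' : 'U'), $r->{start}, $end;
    }
}
```

The positions reported this way are padded consensus coordinates, since both the AF start and the RD length count pad characters.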
From cjfields at uiuc.edu Fri Dec 8 10:17:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:17:31 -0600
Subject: [Bioperl-l] NAR Database Issue Papers
Message-ID: <000601c71adb$fdd60490$15327e82@pyrimidine>

For those interested, the Nucleic Acids Research Database issue papers have
been popping up in the Advance Access section of the NAR website:

http://nar.oxfordjournals.org/papbyrecent.dtl

Ensembl, UCSC Browser, Entrez Gene, and a number of others of possible
interest are represented. Of particular note are a few mentions of
formatting changes to UniProt, EMBL, and other records, which should be
taken care of in the latest BioPerl release (fingers crossed!).

chris

From cjfields at uiuc.edu Fri Dec 8 10:31:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:31:19 -0600
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <457973DE.6050900@purdue.edu>
Message-ID: <000001c71add$ec7147d0$15327e82@pyrimidine>

...
> But then you would need to parse the fields of interest. You get the
> position/strand in the contig from AF, then you get the length of the
> read from RD.
>
> There does look like there is a part of bioperl that meant to perform
> this task--including Bio::Assembly::IO::ace but it looks like it was
> started, but never completed.

...and if anyone wants to chip in and work on it, let us know! The various
Bio::Assembly modules are one of many areas that need some updating.

chris

From akarger at CGR.Harvard.edu Fri Dec 8 13:25:47 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:25:47 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID:

> This was a problem in the gene prediction output I suspect, more
> recent versions of the program should have fixed this. I do not
> currently have free time to deal with the errors in the small number
> of ORFs where this has happened.
>
> I think you just need to do
> start -= start- (frame*strand)
> for 1st exons.

I used

if (strand==1) {start += exon->frame} else {end -= exon->frame}

This took me from 90 translations that had * within the sequence to
just 9, out of 5500 CDS in S. bayanus.

> You can also probably provide the 1st exon's frame to the translate
> function as another possibility but you should try and get the CDS
> correct first depending on your downstream analyses.

Yes, I think.

Scott Cain pointed out that GFF column 8 is the "phase", which I had
never heard of before. My current, very limited, understanding is that
sometimes you'll have an exon with, say, 31 bp, followed by an exon with
29 bp. When the intron gets spliced out, you eventually get an mRNA of
60 bp, which translates to a protein of 20 aa. But the second exon has a
phase of 2, not 0: its first 2 bp complete the codon left unfinished at
the end of the first exon, so you can't just start translating at the
first bp of the second exon and expect to get nice amino acids.

By the way, whether or not phase is the same thing as frame, when I call
the frame() method on the features created by Bio::Tools::GFF, I get the
phase info. I assume that's a feature (no pun intended), not a bug?

I'm still confused as to why you would have a phase in the first exon,
though. Why not just say the CDS starts 1 or 2 bp later? (This is
probably a bio question, not a bioperl question, but a quick Google
didn't get me an answer. "Phase" isn't a very good search term.)

I guess the real question here, which Jason alludes to, is whether
SeqFeature->spliced_seq ought to take into account the phase information
of the first exon. Right now, it doesn't, so when you call
SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
where you would want spliced_seq to include the first bp or two? Should
there be an option to spliced_seq for whether you want to take phase
information into account?

I can't submit a bug report until we confirm it's a bug.
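
The phase arithmetic discussed in this thread can be sketched in plain Perl. This is an illustration under the GFF3 definition of phase (the number of bases to skip at the start of a CDS segment to reach the first base of the next codon), not BioPerl code; `exon_phases` and `trim_first_exon` are hypothetical helper names, and the coordinates are made up.

```perl
use strict;
use warnings;

# Phase of each exon of a CDS, given exon lengths in transcript order
# (5' to 3' on the coding strand). Assumes the first exon starts on a
# complete codon, i.e. has phase 0.
sub exon_phases {
    my @lengths = @_;
    my $coding_so_far = 0;
    my @phases;
    for my $len (@lengths) {
        # Bases needed to finish the codon left open by the previous exons.
        push @phases, (3 - $coding_so_far % 3) % 3;
        $coding_so_far += $len;
    }
    return @phases;
}

# Drop the leading partial codon from a first exon with non-zero phase, as in
# the fix quoted above: move start forward on the plus strand, end backward
# on the minus strand.
sub trim_first_exon {
    my ($start, $end, $strand, $phase) = @_;
    return $strand >= 0 ? ($start + $phase, $end) : ($start, $end - $phase);
}

print join(',', exon_phases(31, 29)), "\n";              # prints 0,2
print join('-', trim_first_exon(100, 200,  1, 2)), "\n"; # prints 102-200
print join('-', trim_first_exon(100, 200, -1, 1)), "\n"; # prints 100-199
```

The trimming only makes sense for the first exon of a CDS; for internal exons the "extra" leading bases belong to a codon that began in the previous exon and must stay in the spliced sequence.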
Thanks, -Amir Karger > -jason > On Dec 7, 2006, at 1:32 PM, Amir Karger wrote: > > > I need to know how to get the frame information in exon features > > (created by Bio::Tools::GFF) into a whole-gene feature that will be > > translated into a protein. > > > > I'm reading in some fungal GFFs generated by Jason Stajich. I > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > - Create a Bio::Location::Split object containing each feature's > > location > > - Create a Bio::SeqFeature::Generic object whose location > is the above > > BL::Split > > - Attach my contig Bio::Seq to the feature > > - get the protein with feature->spliced_seq->translate->seq > > > > (Code below) > > > > Unfortunately, I get the wrong result when the GFF features > have frame > > != 0. This happens for only a few percent of the exons, but when it > > does, I end up translating in the wrong frame. > > > > If I read the docs correctly, Location objects don't have a > frame. So > > how do I get the correct spliced_seq, which skips one or > two bp at the > > beginning of certain exons? > > > > I suspect the answer to this is that I'm going about this in > > completely > > the wrong way, in which case, please tell me how I ought to be > > doing it. > > > > Thanks, > > - Amir Karger > > Research Computing > > Life Sciences Division > > Harvard University > > > > P.S. In case you want to see actual code, here it is. 
After using > > Bio::Tools::GFF to create a sorted list of features for each exon > > (basically stolen from the module POD), I: > > # Create a new object representing the exons' gene > > my $coding_loc_obj = new Bio::Location::Split; > > foreach my $exon (@sorted_exons) { > > $coding_loc_obj->add_sub_Location($exon->location); > > } > > > > # Build a spliced feature representing the whole gene > > my $spliced_feat = new Bio::SeqFeature::Generic( > > -start => $coding_loc_obj->start, > > -end => $coding_loc_obj->end, > > -strand => $strand_num, > > -primary=> "splicedGene", > > ); > > $spliced_feat->location($coding_loc_obj); > > > > # Attach a contig object containing the sequence > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > # Get the spliced seq and translate to protein: > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > my $protein = $spliced_feat->spliced_seq->translate->seq; > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From akarger at CGR.Harvard.edu Fri Dec 8 13:33:09 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 8 Dec 2006 13:33:09 -0500 Subject: [Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq Message-ID: > Another issue is the splittype() is not defined, though I > don't think that > would kill anything as currently implemented. However, one > thing we have > passingly discussed is having Bio::Location::Split objects > possibly exhibit > different (but expected) behaviors based upon the splittype() > (order, join, > or bond). It's one of the things I want to work out for the > next release. Should I be writing -splittype => "JOIN" or some such in my new()? -Amir Karger > > chris > > > Amir, > > > > I don't know for sure what the problem is, but here is one > > possibility: > > the number in column 8 of a GFF file is not the frame, it is > > the phase. 
> > See the GFF3 spec for a description of what the phase is: > > > > http://www.sequenceontology.org/gff3.shtml > > > > (It doesn't matter if you are using GFF3 or GFF2, as the > > phase is the same in both). > > > > Scott > > > > > > On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote: > > > I need to know how to get the frame information in exon features > > > (created by Bio::Tools::GFF) into a whole-gene feature > that will be > > > translated into a protein. > > > > > > I'm reading in some fungal GFFs generated by Jason Stajich. I > > > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > > - Create a Bio::Location::Split object containing each feature's > > > location > > > - Create a Bio::SeqFeature::Generic object whose location > > is the above > > > BL::Split > > > - Attach my contig Bio::Seq to the feature > > > - get the protein with feature->spliced_seq->translate->seq > > > > > > (Code below) > > > > > > Unfortunately, I get the wrong result when the GFF features > > have frame > > > != 0. This happens for only a few percent of the exons, > but when it > > > does, I end up translating in the wrong frame. > > > > > > If I read the docs correctly, Location objects don't have a > > frame. So > > > how do I get the correct spliced_seq, which skips one or > > two bp at the > > > beginning of certain exons? > > > > > > I suspect the answer to this is that I'm going about this in > > > completely the wrong way, in which case, please tell me how > > I ought to be doing it. > > > > > > Thanks, > > > - Amir Karger > > > Research Computing > > > Life Sciences Division > > > Harvard University > > > > > > P.S. In case you want to see actual code, here it is. 
After using > > > Bio::Tools::GFF to create a sorted list of features for each exon > > > (basically stolen from the module POD), I: > > > # Create a new object representing the exons' gene > > > my $coding_loc_obj = new Bio::Location::Split; > > > foreach my $exon (@sorted_exons) { > > > $coding_loc_obj->add_sub_Location($exon->location); > > > } > > > > > > # Build a spliced feature representing the whole gene > > > my $spliced_feat = new Bio::SeqFeature::Generic( > > > -start => $coding_loc_obj->start, > > > -end => $coding_loc_obj->end, > > > -strand => $strand_num, > > > -primary=> "splicedGene", > > > ); > > > $spliced_feat->location($coding_loc_obj); > > > > > > # Attach a contig object containing the sequence > > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > > > # Get the spliced seq and translate to protein: > > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > > my $protein = $spliced_feat->spliced_seq->translate->seq; > > > > From cjfields at uiuc.edu Fri Dec 8 14:04:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 13:04:55 -0600 Subject: [Bioperl-l] Using frame info from GFF ingettinga Seq->spliced_seq In-Reply-To: Message-ID: <000901c71afb$bf504210$15327e82@pyrimidine> > > Another issue is the splittype() is not defined, though I > don't think > > that would kill anything as currently implemented. > However, one thing > > we have passingly discussed is having Bio::Location::Split objects > > possibly exhibit different (but expected) behaviors based upon the > > splittype() (order, join, or bond). It's one of the things > I want to > > work out for the next release. > > Should I be writing -splittype => "JOIN" or some such in my new()? > > -Amir Karger I missed the fact that 'JOIN' is the default splittype() from looking at the constructor in Location::Split, so you actually don't have to explicitly set it; apologies for that. 
If we make any changes that affect how Location::Split behaves we'll likely leave the default splittype() as 'JOIN' as it's by far the most common join operator. chris From cjfields at uiuc.edu Fri Dec 8 15:03:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 14:03:16 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: Message-ID: <000001c71b03$e6741e90$15327e82@pyrimidine> > Yes, I think. Scott Cain pointed out that GFF column 8 is the > "phase", which I had never heard of before. My current, very > limited, understanding is that sometimes you'll have an exon > with, say, 31 bp, followed by an exon with 29 bp. When the > intron gets spliced out, you eventually get an mRNA of 60 bp, > which translates to a protein of 20 aa. > But the second exon has a phase of 1, not 0, because you > can't just start translating at the first bp of the second > exon and expect to get nice amino acids. I think the use of 'frame' here is meant relative to the DNA sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e. translation, three frames). At least I think that's what is meant! > By the way, whether or not phase is the same thing as frame, > when I call the frame() method on the features created by > Bio::Tools::GFF, I get the phase info. I assume that's a > feature (no pun intended), not a bug? > > I'm still confused as to why you would have a phase in the > first exon, though. Why not just say the CDS starts 1 or 2 bp > later? (This is probably a bio question, not a bioperl > question, but a quick Google didn't get me an answer. "Phase" > isn't a very good search term.) It could be b/c the location coordinates delineate the exon coding boundary. It's conceivable the first exon in a sequence record is not the first exon of the mRNA (i.e. there may be one or more exons prior to or past the exon of interest that are in 'remote' sequence records). 
Like this admittedly extreme example (GB acc AF130134):

join(AF130124.1:2563..2964,AF130125.1:21..157,AF130126.1:12..174,
AF130127.1:21..112,AF130128.1:21..162,AF130128.1:281..595,
AF130128.1:661..842,AF130128.1:916..1030,AF130129.1:21..115,
AF130130.1:21..165,AF130131.1:21..125,AF130132.1:21..428,
AF130132.1:492..746,AF130133.1:21..168,AF130133.1:232..401,
AF130133.1:475..906,AF130133.1:970..1107,AF130133.1:1176..1367,21..128)

Also, the ends of the location may be uncertain ('fuzzy'):

join(complement(1009..>1260),complement(AF081827.1:<1..177))

> I guess the real question here, which Jason alludes to, is whether
> SeqFeature->spliced_seq ought to take into account the phase
> information of the first exon. Right now, it doesn't, so when you call
> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
> where you would want spliced_seq to include the first bp or
> two? Should there be an option to spliced_seq for whether you
> want to take phase information into account?
>
> I can't submit a bug report until we confirm it's a bug.
>
> Thanks,
> -Amir Karger

You can already pass the frame or an offset to PrimarySeqI::translate(). Here are the args:

 Args : -terminator    - character for terminator         default is *
        -unknown       - character for unknown            default is X
        -frame         - frame                            default is 0
        -codontable_id - codon table id                   default is 1
        -complete      - complete CDS expected            default is 0
        -throw         - throw exception if not complete  default is 0
        -orf           - find 1st ORF                     default is 0
        -start         - alternative initiation codon
        -codontable    - Bio::Tools::CodonTable object
        -offset        - offset for fuzzy locations       default is 0

The offset comes from some GenBank seqfeatures which have a '/codon_start' tag indicating which nucleotide to start translation from (1,2,3). This is essentially just the phase+1. We could add a '-phase' argument for convenience which accepts 0,1,2.
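As a rough illustration of the phase and codon_start relationship described above, here is a small pure-Perl sketch. The helper names and the example sequence are made up for illustration and are not BioPerl functions; the BioPerl call in the trailing comment simply uses the -frame argument from the Args list above and assumes $seq_obj is a sequence object.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first $phase bases so that the
# sequence starts on a codon boundary.
sub trim_to_codon_start {
    my ($dna, $phase) = @_;
    die "phase must be 0, 1 or 2\n"
        unless defined $phase && $phase =~ /^[012]$/;
    return substr($dna, $phase);
}

# GenBank's /codon_start is 1-based (1,2,3); phase is 0-based (0,1,2).
sub phase_to_codon_start {
    my ($phase) = @_;
    return $phase + 1;
}

my $spliced = 'GATGGCCTAA';    # made-up sequence with one leading extra base
my $phase   = 1;

my $in_frame = trim_to_codon_start($spliced, $phase);
print "codon_start=", phase_to_codon_start($phase),
      " in_frame=$in_frame\n";
# prints: codon_start=2 in_frame=ATGGCCTAA

# With BioPerl installed, the equivalent step on a sequence object would
# be along the lines of (assumption, based on the Args list above):
#   my $protein = $seq_obj->translate(-frame => $phase)->seq;
```

Trimming before translation keeps the full spliced sequence available for other uses, while the translated product starts on the first complete codon.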
chris From bobfreemanma at speakeasy.net Fri Dec 8 15:47:15 2006 From: bobfreemanma at speakeasy.net (Bob Freeman) Date: Fri, 8 Dec 2006 15:47:15 -0500 Subject: [Bioperl-l] writing blastxml In-Reply-To: <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com> <000301c6f846$d6227760$15327e82@pyrimidine> <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> Message-ID: Can't seem to find a good post on this to answer my question: Does anyone know a good way to (re)write BLAST reports in XML format? I've got about 30,000 reports I need to rewrite for a (good!) piece of java software that will only import xml formatted BLAST reports. Right now, all mine are plain text. I don't think bioperl can do this yet, correct? If not, any suggestions, besides reblasting all 30,000? I'd like to save a few trees and lumps of coal. TIA, Bob -- ----------------------------------------------------- Bob Freeman, Ph.D. Bioinformatics consultant 51 Downer Avenue, #2 Dorchester, MA 02125 617/699.7057, vox If brains were taxed, he'd get a refund. -- Anonymous From camp_boot at hotmail.com Sun Dec 10 05:00:55 2006 From: camp_boot at hotmail.com (synapse) Date: Sun, 10 Dec 2006 10:00:55 +0000 (UTC) Subject: [Bioperl-l] Driver program for PestFind.pm Message-ID: Dear All, I apologize in advance for my almost total lack of knowledge of perl as a programming language. I need to use PestFind program, part of the biop_run package of bioperl. My understanding is that I will need a simple wrapper program that will read arguments from the command line, and pass them to that module. - Is there such program available that I can just use? - Does anyone know if pestfind can work on multiple sequence files (in fasta format), or does it only process single sequence files? Thanks a lot for the feedback. 
From cjfields at uiuc.edu Sun Dec 10 13:45:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 12:45:26 -0600 Subject: [Bioperl-l] writing blastxml In-Reply-To: References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com> <000301c6f846$d6227760$15327e82@pyrimidine> <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> Message-ID: <7FB4EBB9-BEDC-4250-BE2F-3F695D36F350@uiuc.edu> On Dec 8, 2006, at 2:47 PM, Bob Freeman wrote: > Can't seem to find a good post on this to answer my question: > > Does anyone know a good way to (re)write BLAST reports in XML format? > I've got about 30,000 reports I need to rewrite for a (good!) piece > of java software that will only import xml formatted BLAST reports. > Right now, all mine are plain text. > > I don't think bioperl can do this yet, correct? If not, any > suggestions, besides reblasting all 30,000? I'd like to save a few > trees and lumps of coal. > > TIA, > Bob The only BioPerl writers for BLAST reports are in BSML and HTML, not BLAST XML. I don't think there there have been any requests for it, and no one has really stepped forward to submit one. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Dec 10 13:55:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 12:55:16 -0600 Subject: [Bioperl-l] Driver program for PestFind.pm In-Reply-To: References: Message-ID: <32B0F15D-4144-43B6-AA81-5ED9BA848F45@uiuc.edu> On Dec 10, 2006, at 4:00 AM, synapse wrote: > Dear All, > > I apologize in advance for my almost total lack of knowledge of > perl as a > programming language. > > I need to use PestFind program, part of the biop_run package of > bioperl. My > understanding is that I will need a simple wrapper program that > will read > arguments from the command line, and pass them to that module. 
PestFind is part of the EMBOSS suite of programs: http://emboss.sourceforge.net/ The PestFind module in bioperl-run is actually used via Pise. > - Is there such program available that I can just use? See above > - Does anyone know if pestfind can work on multiple sequence > files (in fasta > format), or does it only process single sequence files? > > Thanks a lot for the feedback. No idea there, but the EMBOSS docs should tell you. chris From cjfields at uiuc.edu Mon Dec 11 00:38:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 23:38:32 -0600 Subject: [Bioperl-l] bioperl-run parameter question Message-ID: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> I am writing up a few bioperl-run modules and have a simple question, though I don't know if anyone knows the answer. I was curious as to why parameters for most (all?) bioperl-run modules lack the '-' preceding them. This came up re: StandAloneBlast last week (something Torsten fixed), but I noticed just about every bioperl-run module uses the dashless parameters. chris From n.haigh at sheffield.ac.uk Mon Dec 11 01:44:25 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Mon, 11 Dec 2006 06:44:25 +0000 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> Message-ID: <457CFE49.5010201@sheffield.ac.uk> Chris Fields wrote: > I am writing up a few bioperl-run modules and have a simple question, > though I don't know if anyone knows the answer. I was curious as to > why parameters for most (all?) bioperl-run modules lack the '-' > preceding them. This came up re: StandAloneBlast last week > (something Torsten fixed), but I noticed just about every bioperl-run > module uses the dashless parameters. 
>
> chris
>

No idea!

Is there any reason for/against using dashed/dashless parameters? I suppose dashed parameters allow you to easily see which tokens on the command line are parameters and which are values. Should modules be able to accept both? Should dashed be preferred?

Nath

From cjfields at uiuc.edu Mon Dec 11 08:06:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 07:06:32 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457CFE49.5010201@sheffield.ac.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457CFE49.5010201@sheffield.ac.uk>
Message-ID:

On Dec 11, 2006, at 12:44 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I am writing up a few bioperl-run modules and have a simple question,
>> though I don't know if anyone knows the answer. I was curious as to
>> why parameters for most (all?) bioperl-run modules lack the '-'
>> preceding them. This came up re: StandAloneBlast last week
>> (something Torsten fixed), but I noticed just about every bioperl-run
>> module uses the dashless parameters.
>>
>> chris
>
> No idea!
>
> Is there any reason for/against using dashed/dashless parameters? I
> suppose dashed parameters allow you to easily see which tokens on the
> command line are parameters and which are values. Should modules be
> able to accept both? Should dashed be preferred?
>
> Nath

I'm thinking about it from the point of consistency.
When using a mix of core and run modules it can be a bit confusing, particularly when (as pointed out in the previous thread on StandAloneBlast) you can use only dashed parameters with core modules, while most (all?) run modules only accept dashless ones (in most cases some exception is thrown). Torsten fixed this in StandAloneBlast so it accepts both, but shouldn't this rule also apply to all run modules?

Much of this is probably due to the donated nature of much of the bioperl-run code and Jason's 'cat-herding', and I understand that it would be a lot of work to change this for all run modules. However, we could at least try to start enforcing some loose rules with new bioperl-run wrappers (e.g. implement WrapperBase, use core-like parameters, etc).

chris

From akarger at CGR.Harvard.edu Mon Dec 11 11:20:03 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 11 Dec 2006 11:20:03 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq
Message-ID:

Chris Fields wrote:
>
> > Yes, I think. Scott Cain pointed out that GFF column 8 is the
> > "phase", which I had never heard of before. My current, very
> > limited, understanding is that sometimes you'll have an exon
> > with, say, 31 bp, followed by an exon with 29 bp. When the
> > intron gets spliced out, you eventually get an mRNA of 60 bp,
> > which translates to a protein of 20 aa.
> > But the second exon has a phase of 1, not 0, because you
> > can't just start translating at the first bp of the second
> > exon and expect to get nice amino acids.
>
> I think the use of 'frame' here is meant relative to the DNA
> sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative
> to the mRNA (i.e. translation, three frames). At least I think
> that's what is meant!

I agree. By the way, I'd love a reference to a simple bio-explanation of what's happening here. Google searches for "coding sequence phase" are not all that relevant.
> > I'm still confused as to why you would have a phase in the > > first exon, though. Why not just say the CDS starts 1 or 2 bp > > later? (This is probably a bio question, not a bioperl > > question, but a quick Google didn't get me an answer. "Phase" > > isn't a very good search term.) > > It could be b/c the location coordinates delineate the exon > coding boundary. > It's conceivable the first exon in a sequence record is not > the first exon > of the mRNA (i.e. there may be one or more exons prior to or > past the exon > of interest that are in 'remote' sequence records). That's certainly not the case here, because the files have the entire genomes in them. > Also, the ends of the lcoation may be uncertain ('fuzzy'): > > join(complement(1009..>1260),complement(AF081827.1:<1..177)) Also not the case here. These locations aren't listed as fuzzy. Any other thoughts? > > I guess the real question here, which Jason alludes to, is whether > > SeqFeature->spliced_seq ought to take into account the phase > > information > > of the first exon. Right now, it doesn't, so when you call > > SeqFeature->spliced_seq->translate, you get gibberish. Are > there cases > > where you would want spliced_seq to include the first bp or > > two? Should there be an option to spliced_seq for whether you > > want to take phase information into account? > > You can already pass the frame or an offset to > PrimarySeqI::translate(). > We could add a '-phase' argument for > convenience which accepts 0,1,2. But as Jason pointed out, you should find the problem earlier. What if I want to get the RNA sequence that will become the protein? then having a phase arg to translate() doesn't help. Should there be a phase arg to spliced_seq? Which raises another bio question: at what point are the first 1 or 2 bp dropped when you have a phase of 1 or 2? Do they appear in the mRNA? 
-Amir Karger From bix at sendu.me.uk Mon Dec 11 13:21:42 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 11 Dec 2006 13:21:42 -0500 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> Message-ID: <457DA1B6.1060706@sendu.me.uk> Chris Fields wrote: > I am writing up a few bioperl-run modules and have a simple question, > though I don't know if anyone knows the answer. I was curious as to > why parameters for most (all?) bioperl-run modules lack the '-' > preceding them. This came up re: StandAloneBlast last week > (something Torsten fixed), but I noticed just about every bioperl-run > module uses the dashless parameters. I didn't follow that particular thread, but from my experience there is a useful distinction between bioperl options using the - as normal for full consistency with core (eg. -verbose), whilst the options that belong to the program the run module is a wrapper for do not take dashes. Again, this seems consistent within the run package. I'd suggest sticking to the current pattern. Cheers, Sendu. From cjfields at uiuc.edu Mon Dec 11 15:07:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 14:07:16 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457DA1B6.1060706@sendu.me.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> Message-ID: On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote: > Chris Fields wrote: >> I am writing up a few bioperl-run modules and have a simple >> question, though I don't know if anyone knows the answer. I was >> curious as to why parameters for most (all?) bioperl-run modules >> lack the '-' preceding them. This came up re: StandAloneBlast >> last week (something Torsten fixed), but I noticed just about >> every bioperl-run module uses the dashless parameters. 
>
> I didn't follow that particular thread, but from my experience
> there is a useful distinction between bioperl options using the -
> as normal for full consistency with core (eg. -verbose), whilst the
> options that belong to the program the run module is a wrapper for
> do not take dashes. Again, this seems consistent within the run
> package.

I respectfully disagree that this is a 'useful' distinction. My main point is consistency. To me, it's counterintuitive to have two Bioperl classes, both of which inherit from Bio::Root::Root, use two different syntaxes for any parameters passed to the constructor, even if some are 'program' parameters. It's also not consistent with StandAloneBlast or RemoteBlast, both of which are considered bioperl-run modules even though they are in core, and both of which use dashed parameters (StandAloneBlast actually allows both). In fact, it isn't consistent within bioperl-run itself. Bio::Tools::Run::EMBOSSApplication uses dashes for parameters in a hashref!

Okay, judging by the previous examples, 'consistency' isn't a word I would use to describe bioperl-run as a whole (back to Jason's 'cat-herding' analogy). It would be easier to let it slide for now, especially since changing them would be a serious pain, not to mention an API issue. But shouldn't there be some consistency? And what about new modules? Do we follow the historical (possibly confusing) 'dashless' route, or use the core-like dashed approach (thus breaking from the other run modules)?

> I'd suggest sticking to the current pattern.
>
>
> Cheers,
> Sendu.

I'll allow for both, a la StandAloneBlast. Doesn't hurt to be safe. ;>

Have fun at the hackathon!
chris From bix at sendu.me.uk Mon Dec 11 16:19:55 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 11 Dec 2006 16:19:55 -0500 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> Message-ID: <457DCB7B.8050500@sendu.me.uk> Chris Fields wrote: > > On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> I am writing up a few bioperl-run modules and have a simple >>> question, though I don't know if anyone knows the answer. I was >>> curious as to why parameters for most (all?) bioperl-run modules >>> lack the '-' preceding them. This came up re: StandAloneBlast last >>> week (something Torsten fixed), but I noticed just about every >>> bioperl-run module uses the dashless parameters. >> >> I didn't follow that particular thread, but from my experience there >> is a useful distinction between bioperl options using the - as normal >> for full consistency with core (eg. -verbose), whilst the options that >> belong to the program the run module is a wrapper for do not take >> dashes. Again, this seems consistent within the run package. > > I respectfully disagree that this is a 'useful' distinction. My main > point is consistency. [snip] We're on the same page in terms of what we think would be a Good Thing, and allowing both ways (dashed and dashless) sounds reasonable. I was just suggesting why bioperl-run might be the way it was. Further to that, there is the practical aspect that it is a lot simpler to figure out which are the program options so they can be farmed out to the AUTOLOAD methods - again something that isn't done in core. If you come up with some generic way of dealing with options and farming to AUTOLOAD, perhaps there's scope for applying it to all the run wrappers (ideally via one of their base classes), so they all instantly gain dashed-mode capability. 
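One way the "accept both dashed and dashless" idea could look is sketched below in plain Perl. This is only an illustration under stated assumptions: `normalize_params` is a hypothetical helper, not an existing BioPerl or bioperl-run method, and the parameter names in the usage lines are examples only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): accept a flat list of
# key/value pairs whose keys may or may not carry a leading dash, and
# return a hash keyed by the dashless names.
sub normalize_params {
    my (@args) = @_;
    die "odd number of arguments\n" if @args % 2;
    my %params;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        # Strip leading dashes only; case is left alone because some
        # wrapped programs have case-sensitive options.
        (my $name = $key) =~ s/^-+//;
        $params{$name} = $value;
    }
    return %params;
}

# Dashed and dashless keys can be mixed freely.
my %p = normalize_params(-program => 'blastp', database => 'nr', -verbose => 1);
print join(', ', map { "$_=$p{$_}" } sort keys %p), "\n";
# prints: database=nr, program=blastp, verbose=1
```

If something like this lived in a shared base class, each wrapper could call it at the top of its constructor before handing the program options on, which is roughly the "instant dashed-mode capability" suggested above.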
From cjfields at uiuc.edu Mon Dec 11 17:05:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 16:05:56 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457DCB7B.8050500@sendu.me.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> <457DCB7B.8050500@sendu.me.uk> Message-ID: On Dec 11, 2006, at 3:19 PM, Sendu Bala wrote: ... >> >> I respectfully disagree that this is a 'useful' distinction. My main >> point is consistency. > [snip] > > We're on the same page in terms of what we think would be a Good > Thing, > and allowing both ways (dashed and dashless) sounds reasonable. I was > just suggesting why bioperl-run might be the way it was. Further to > that, there is the practical aspect that it is a lot simpler to figure > out which are the program options so they can be farmed out to the > AUTOLOAD methods - again something that isn't done in core. Maybe b/c AUTOLOAD is frowned upon for a number of reasons, mainly code maintenance. I'm somewhat neutral on the idea of using AUTOLOAD as a short-term solution, though using heredoc and an eval{} block works well for me (and shows up when using $self->can('method') or when checking for methods via Class::Inspector). > If you come up with some generic way of dealing with options and > farming > to AUTOLOAD, perhaps there's scope for applying it to all the run > wrappers (ideally via one of their base classes), so they all > instantly > gain dashed-mode capability. I think that's the crux of the problem; they do not all have the same base class (except Bio::Root::Root). Most use WrapperBase. I thought at one point a Run-specific root module would be a good idea, but WrapperBase already works well. I'll go ahead with my modules and think about it some more. You could ask the powers-that-be (jason, hilmar, etc) what they think as well. 
chris From bosborne11 at verizon.net Mon Dec 11 17:24:54 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 11 Dec 2006 17:24:54 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: Message-ID: Amir, Google "intron phase", you will see a number of useful links. Brian O. On 12/11/06 11:20 AM, "Amir Karger" wrote: > I agree. By the way, I'd love a reference to a simple bio-explanation of > what's happening here. Google searches for "coding sequence phase" are > not all that relevant. From cjfields at uiuc.edu Mon Dec 11 22:20:06 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 21:20:06 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: On Dec 11, 2006, at 10:20 AM, Amir Karger wrote: >> I think the use of 'frame' here is meant relative to the DNA >> sequence (i.e. >> ORF searching, 6 frames) and the 'phase' is relative to the mRNA >> (i.e. >> translation, three frames). At least I think that's what is meant! > > I agree. By the way, I'd love a reference to a simple bio- > explanation of > what's happening here. Google searches for "coding sequence phase" are > not all that relevant. Ah, Brian found some links I see... >> It could be b/c the location coordinates delineate the exon >> coding boundary. >> It's conceivable the first exon in a sequence record is not >> the first exon >> of the mRNA (i.e. there may be one or more exons prior to or >> past the exon >> of interest that are in 'remote' sequence records). > > That's certainly not the case here, because the files have the entire > genomes in them. > >> Also, the ends of the lcoation may be uncertain ('fuzzy'): >> >> join(complement(1009..>1260),complement(AF081827.1:<1..177)) > > Also not the case here. These locations aren't listed as fuzzy. > > Any other thoughts? Which GFF files did you use? More specifically, which genes in which GFF file? I saw a reference to S. 
bayanus, but it's hard to work out what could be the problem unless we know a bit more.

>>> I guess the real question here, which Jason alludes to, is whether
>>> SeqFeature->spliced_seq ought to take into account the phase
>>> information of the first exon. Right now, it doesn't, so when you call
>>> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
>>> where you would want spliced_seq to include the first bp or
>>> two? Should there be an option to spliced_seq for whether you
>>> want to take phase information into account?
>>
>> You can already pass the frame or an offset to
>> PrimarySeqI::translate().
>> We could add a '-phase' argument for
>> convenience which accepts 0,1,2.
>
> But as Jason pointed out, you should find the problem earlier. What if I
> want to get the RNA sequence that will become the protein? Then having a
> phase arg to translate() doesn't help. Should there be a phase arg to
> spliced_seq?

You'll also note Jason mentioned there were possible errors in the gene prediction programs which produced the output.

spliced_seq() is supposed to return the DNA sequence of a split location by splicing together the sublocation sequences in their 'join' order. So, if the first exon was out of phase, once spliced they should all be out of phase to the same degree, assuming all exons are joined together correctly. Translating this using the phase should produce the correct amino acid sequence.

Note that Jason suggested passing the frame/phase of the first exon to translate(), not spliced_seq(). I also suggested translate().

> Which raises another bio question: at what point are the first 1 or 2 bp
> dropped when you have a phase of 1 or 2? Do they appear in the mRNA?
>
> -Amir Karger

Any sequence present in the sublocations (exons) would be in the spliced sequence. This would have to include those nucleotides in exons skipped b/c of the phase since they are part of the coding region.
chris From neetisomaiya at gmail.com Tue Dec 12 07:06:20 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:36:20 +0530 Subject: [Bioperl-l] need help in phredPhrap Message-ID: <764978cf0612120406m796b116dncd3a9e6c82ffe682@mail.gmail.com> Hi, I am running phredPhrap, which runs phred, phrap and polyphred. Please refer to the "Using a reference sequence" section of this link http://droog.mbt.washington.edu/poly_doc50.html#REFER. I am using the reference sequence as described in the link above. With this I am getting the SNP positions on the contig sequence as well as on the reference sequence. Does anyone know if there is some output file which can also give me the mapping between the contig sequence and the reference sequence? -- -Neeti Even my blood says, B positive From akarger at CGR.Harvard.edu Tue Dec 12 11:05:43 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 12 Dec 2006 11:05:43 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq Message-ID: (sorry if this thread is boring people) Chris Fields wrote: > > I agree. By the way, I'd love a reference to a simple bio- > > explanation of > > what's happening here. Google searches for "coding sequence > phase" are > > not all that relevant. > > Ah, Brian found some links I see... Thanks, Brian! Amazing how "coding sequence phase" finds nothing but "intron phase" finds a ton. This is why you need to actually learn biology, rather than Googling it. > Which GFF files did you use? More specifically, which genes > in which > GFF file? I saw a reference to S. bayanus, but it's hard to > work out > what could be the problem unless we know a bit more. http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus.20031001.AUGUSTUS.gff3.gz (Thanks for a Really Useful site, Jason!) c127 (for example) has two lines in that file: sbay_c127 AUGUSTUS mRNA 263 723 . + . ID=sbay_c127-g1.1 sbay_c127 AUGUSTUS CDS 263 723 .
+ 1 Parent=sbay_c127-g1.1 Now go to gbrowse page: http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/ Type "sbay_c127:250-300" in the search box. As you can see from the translation track, if you start at bp 263, you hit a stop codon after just a few aas. But if you use frame2/phase 1, you get no stop codons all the way to the end of the contig. > >> You can already pass the frame or an offset to > >> PrimarySeqI::translate(). > >> We could add a '-phase' argument for > >> convenience which accepts 0,1,2. > > > > What if I > > want to get the RNA sequence that will become the protein? then > > having a > > phase arg to translate() doesn't help. Should there be a > phase arg to > > spliced_seq? > > You'll also note Jason mentioned there were possible errors in the > gene prediction programs which produced the output That's certainly possible. No gene prediction program will be perfect. In this case, though, it's clear that it found a large region without stop codons in it, and correctly identified the place to start translating. I guess I'm just surprised that, if it found just one exon in a gene (in the whole contig) why it would say the exon starts at 263 with a phase 1, instead of just saying it starts at 264. > spliced_seq() is supposed to return the DNA sequence of a split > location by splicing together the sublocation sequences in their > 'join' order. So, if the first exon was out of phase, once spliced > they should all be out of phase to the same degree, assuming all > exons are joined together correctly. Translating this using the > phase should produce the correct amino acid sequence. > > Note that Jason suggested passing the frame/phase of the first exon > to translate(), not spliced_seq(). I also suggested translate(). You're right. This brings the number of translated polypeptide sequences that have lots of *s in them to 9 instead of 90. I guess I have two requests here. 
The first is, if a person wants to see exactly which bps are translated to aas -- a nucleotide sequence of exactly 3N bp starting (usually) with ATG -- then they might want an argument to spliced_seq that skips the first one or two bp when necessary. After all, they might want to study the DNA, not the peptides. The second request is for "intelligent objects". If my SeqFeatures know that they're in phase 1, then when I call spliced_seq I want the resulting objects to know that they're phase one, such that when I call translate, Bioperl automatically skips the first bp or two. Admittedly, there might be big ramifications to this. Both requests of course made in the knowledge that Bioperl is open source & developers have a lot to do with their time. -Amir Karger > > Which raises another bio question: at what point are the > first 1 or > > 2 bp > > dropped when you have a phase of 1 or 2? Do they appear in the mRNA? > > > > -Amir Karger > > Any sequence present in the sublocations (exons) would be in the > spliced sequence. This would have to include those nucleotides in > exons skipped b/c of the phase since they are part of the > coding region. > > chris > From neetisomaiya at gmail.com Tue Dec 12 07:14:10 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:44:10 +0530 Subject: [Bioperl-l] needle parser in bioperl? Message-ID: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Hi, Does anyone know of a bioperl parser for needle output? Basically, I want to know where the target sequence aligns on the template (i.e. the coordinate on the template where the target aligns). -- -Neeti Even my blood says, B positive From cjfields at uiuc.edu Tue Dec 12 11:57:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 10:57:27 -0600 Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > Hi, > > Does anyone know of a bioperl parser for needle output, basically I > won't > where the target sequence aligns on the template (i.e. coordinate > on the > template where the taget aligns). > > -- > -Neeti > Even my blood says, B positive I answered this a number of months back: http://tinyurl.com/yzlbx5 Basically, newer versions of EMBOSS have changed the output for the AlignIO::emboss parser (which parses needle). I don't believe the parser has been fixed to deal with that, but Jason has pointed out you can use MSF output when running needle, then parse using AlignIO with the format set to 'msf'. chris From bosborne11 at verizon.net Tue Dec 12 11:51:05 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 12 Dec 2006 11:51:05 -0500 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: Neeti, EMBOSS' needle and water produce alignments in what Bioperl calls 'emboss' format, so you can use AlignIO to get SimpleAlign objects. The best description of how to use SimpleAlign is the documentation in the module. Brian O. On 12/12/06 7:14 AM, "neeti somaiya" wrote: > Hi, > > Does anyone know of a bioperl parser for needle output, basically I won't > where the target sequence aligns on the template (i.e. coordinate on the > template where the taget aligns). From kaboroev at sfu.ca Tue Dec 12 12:14:39 2006 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Tue, 12 Dec 2006 09:14:39 -0800 Subject: [Bioperl-l] BLAST reports Message-ID: <457EE37F.4020000@sfu.ca> Hi everyone, I would like to manipulate my blast results with bioperl but would also like to have the html output of the blast. 
What would be the best way of going about this, as I don't see any write functions in any of the blast modules I have looked at. Would it be better to create my own html layout from the blast data then attempt to recover this from bioperl? keith p.s. - does anyone know what the most informative blast "alignment view" output is? xml i suppose? -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 From cjfields at uiuc.edu Tue Dec 12 13:45:05 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 12:45:05 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: On Dec 12, 2006, at 10:05 AM, Amir Karger wrote: ... > http://fungal.genome.duke.edu/annotations/sbay/gff/ > saccharomyces_bayanus > .20031001.AUGUSTUS.gff3.gz (Thanks for a Really Useful site, Jason!) > > c127 (for example) has two lines in that file: > sbay_c127 AUGUSTUS mRNA 263 723 . + > . ID=sbay_c127-g1.1 > sbay_c127 AUGUSTUS CDS 263 723 . + > 1 Parent=sbay_c127-g1.1 > > Now go to gbrowse page: > http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/ > Type "sbay_c127:250-300" in the search box. > > As you can see from the translation track, if you start at bp 263, you > hit a stop codon after just a few aas. But if you use frame2/phase 1, > you get no stop codons all the way to the end of the contig. Yes, but there are two things. First, there is no distinct start codon. 
Second, this is what the top NCBI BLASTX hit for that particular exon is: >gi|6323195|ref|NP_013267.1| Gene info Essential 100kDa subunit of the exocyst complex (Sec3p, Sec5p, Sec6p, Sec8p, Sec10p, Sec15p, Exo70p, and Exo84p), which has the essential function of mediating polarized targeting of secretory vesicles to active sites of exocytosis; Sec10p [Saccharomyces cerevisiae] gi|2498891|sp|Q06245|SEC10_YEAST Gene info Exocyst complex component SEC10 gi|1234854|gb|AAB67490.1| Gene info L9362.12 gene product gi|1781307|emb|CAA70041.1| Gene info 100 kD exocyst complex component [Saccharomyces cerevisiae] Length=871 Score = 285 bits (728), Expect = 7e-77 Identities = 141/152 (92%), Positives = 149/152 (98%), Gaps = 0/152 (0%) Frame = +2 Query 2 FNDFYSMGKSDIVEQLRLSKNWKFNLKSVILMKNLLILSSKLETNSIPKTINTKLIIEKY 181 +NDFYSMGKSDIVEQLRLSKNWK NLKSV LMKNLLILSSKLET+SIPKTINTKL +IEKY Sbjct 168 YNDFYSMGKSDIVEQLRLSKNWKLNLKSVKLMKNLLILSSKLETSSIPKTINTKLVIEKY 227 Query 182 SEMMENKLLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE 361 SEMMEN +LLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE Sbjct 228 SEMMENELLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE 287 Query 362 NEFENVFIKNVKFKERLVDFESHSVIVEASMQ 457 NEFENVFIKNVKFKE+L+DFE+HSVI+E SMQ Sbjct 288 NEFENVFIKNVKFKEQLIDFENHSVIIETSMQ 319 Note the query start is well into the predicted coding sequence. Both the lack of a start codon and the above BLASTX hit suggest this is not actually the first exon in the coding region. Therefore the sequence retrieved from spliced_seq() is only part of the full coding region (it seems to lack at least one 3' exon as well). >>>> You can already pass the frame or an offset to >>>> PrimarySeqI::translate(). >>>> We could add a '-phase' argument for >>>> convenience which accepts 0,1,2. >>> >>> What if I >>> want to get the RNA sequence that will become the protein? then >>> having a >>> phase arg to translate() doesn't help. Should there be a >> phase arg to >>> spliced_seq? 
>> >> You'll also note Jason mentioned there were possible errors in the >> gene prediction programs which produced the output > > That's certainly possible. No gene prediction program will be perfect. > In this case, though, it's clear that it found a large region without > stop codons in it, and correctly identified the place to start > translating. I guess I'm just surprised that, if it found just one > exon > in a gene (in the whole contig) why it would say the exon starts at > 263 > with a phase 1, instead of just saying it starts at 264. Maybe the gene prediction didn't find the first exon, or didn't tie the predicted exons together. Not unusual considering the number of predictions made. >> spliced_seq() is supposed to return the DNA sequence of a split >> location by splicing together the sublocation sequences in their >> 'join' order. So, if the first exon was out of phase, once spliced >> they should all be out of phase to the same degree, assuming all >> exons are joined together correctly. Translating this using the >> phase should produce the correct amino acid sequence. >> >> Note that Jason suggested passing the frame/phase of the first exon >> to translate(), not spliced_seq(). I also suggested translate(). > > You're right. This brings the number of translated polypeptide > sequences > that have lots of *s in them to 9 instead of 90. > > I guess I have two requests here. The first is, if a person wants > to see > exactly which bps are translated to aas -- a nucelotide sequece of > exactly 3N bp starting (usually) with ATG -- then they might want an > argument to spliced_seq that skips the first one or two bp when > necessary. After all, they might want to study the DNA, not the > peptides. > > The second request is for "intelligent objects". 
If my SeqFeatures > know > that they're in phase 1, then when I call spliced_seq I want the > resulting objects to know that they're phase one, such that when I > call > translate, Bioperl automatically skips the first bp or two. > Admittedly, > there might be big ramifications to this. > > Both requests of course made in the knowledge that Bioperl is open > source & developers have a lot to do with their time. > > -Amir Karger You may want to post these as enhancement requests to Bugzilla just so we can keep track. I think passing a phase parameter to spliced_seq() can be easily accomplished; it's just a matter of returning a subseq of the spliced sequence based on the phase if set. In fact, I am testing it out now. The second may be more problematic, since there may be a time when one would want those extra nucleotides, so I don't think we would want removal of said nucleotides to be the default behavior. Chris From dmessina at wustl.edu Tue Dec 12 13:44:29 2006 From: dmessina at wustl.edu (David Messina) Date: Tue, 12 Dec 2006 12:44:29 -0600 Subject: [Bioperl-l] BLAST reports In-Reply-To: <457EE37F.4020000@sfu.ca> References: <457EE37F.4020000@sfu.ca> Message-ID: <083B4D17-CC7A-406C-9037-4DA5DC31AA05@wustl.edu> Hi Keith, Take a look at: http://www.bioperl.org/wiki/HOWTO:SearchIO You can read in a whole bunch of different blast formats (see Table 1), and it is possible to write out in HTML. See: http://www.bioperl.org/wiki/HOWTO:SearchIO#Writing_and_formatting_output I'm not sure what you mean by the most informative blast output. If you mean which one gives the most information, I'm pretty sure the standard Blast report has everything. 
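The read-then-write-HTML route described above can be sketched roughly as follows. This is a sketch under stated assumptions rather than a tested recipe: the helper name blast_to_html and the file names are hypothetical, and it assumes the Bio::SearchIO::Writer::HTMLResultWriter module that the HOWTO describes is installed.

```perl
use strict;
use warnings;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

# Hypothetical helper: parse a plain-text BLAST report and write
# every result in it to an HTML file.
sub blast_to_html {
    my ( $report_file, $html_file ) = @_;

    my $in = Bio::SearchIO->new(
        -format => 'blast',
        -file   => $report_file,
    );
    my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new();
    my $out    = Bio::SearchIO->new(
        -writer => $writer,
        -file   => ">$html_file",
    );

    while ( my $result = $in->next_result ) {
        # $result is still an ordinary result object here, so hits and
        # HSPs can be inspected or filtered before writing.
        $out->write_result($result);
    }
    return;
}

# Run only when both file names are supplied on the command line.
blast_to_html(@ARGV) if @ARGV == 2;
```

Because the results pass through SearchIO objects, the same loop can be used to manipulate the data and to produce the HTML view.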
Dave From neetisomaiya at gmail.com Tue Dec 12 07:09:39 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:39:39 +0530 Subject: [Bioperl-l] problem in running needle Message-ID: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> I am trying to run needle for the attached two sequence files, on a linux machine. It says "Uncaught exception: Assertion failed, raised at ajmem.c :187". Can anyone tell me what this could be coz of? -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... Name: SEQ_1.REF Type: application/octet-stream Size: 44208 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: seq_of_contig11 Type: application/octet-stream Size: 44344 bytes Desc: not available URL: From cjfields at uiuc.edu Tue Dec 12 15:55:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 14:55:07 -0600 Subject: [Bioperl-l] problem in running needle In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> Message-ID: On Dec 12, 2006, at 6:09 AM, neeti somaiya wrote: > I am trying to run needle for the attached two sequence files, on a > linux > machine. It says "Uncaught exception: Assertion failed, raised at > ajmem.c > :187". > Can anyone tell me what this could be coz of? > > -- > -Neeti > Even my blood says, B positive > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l This would be an EMBOSS error, not a BioPerl error. Maybe the emboss list is the best place for this question? http://emboss.open-bio.org/mailman/listinfo/emboss Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Dec 12 16:30:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 15:30:30 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: <093AE0FF-3C88-4F97-B33F-836B295E3DE3@uiuc.edu> On Dec 12, 2006, at 10:05 AM, Amir Karger wrote: >> Note that Jason suggested passing the frame/phase of the first exon >> to translate(), not spliced_seq(). I also suggested translate(). > > You're right. This brings the number of translated polypeptide > sequences > that have lots of *s in them to 9 instead of 90. > > I guess I have two requests here. The first is, if a person wants > to see > exactly which bps are translated to aas -- a nucelotide sequece of > exactly 3N bp starting (usually) with ATG -- then they might want an > argument to spliced_seq that skips the first one or two bp when > necessary. After all, they might want to study the DNA, not the > peptides. > > The second request is for "intelligent objects". If my SeqFeatures > know > that they're in phase 1, then when I call spliced_seq I want the > resulting objects to know that they're phase one, such that when I > call > translate, Bioperl automatically skips the first bp or two. > Admittedly, > there might be big ramifications to this. > > Both requests of course made in the knowledge that Bioperl is open > source & developers have a lot to do with their time. > > -Amir Karger ... Amir, I committed some code to CVS where I added a -phase parameter option to SeqFeatureI::spliced_seq(). I also added some tests to SeqFeature.t. If you run the following after creating the SeqFeature object $sf (the seq object is $seq): $sf->attach_seq($seq); for my $phase (-1..3) { my $spliced = $sf->spliced_seq(-phase => $phase); print $spliced->seq,"\n"; print $spliced->translate->seq,"\n"; } You should get warnings for any other value than 0, 1, or 2. 
I'll also note that the sequence you are having trouble with (sbay_c127) is 712 bp, so it doesn't contain the complete coding region. I used it in the test case in SeqFeature.t. Chris From boris.steipe at utoronto.ca Tue Dec 12 16:26:14 2006 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Tue, 12 Dec 2006 16:26:14 -0500 Subject: [Bioperl-l] problem in running needle In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> Message-ID: Looks like a memory allocation problem. Your whole sequence is in one single line; throwing a few line breaks in there every 80th character or so will probably do the trick. HTH Boris On 12-Dec-06, at 7:09 AM, neeti somaiya wrote: > I am trying to run needle for the attached two sequence files, on a > linux > machine. It says "Uncaught exception: Assertion failed, raised at > ajmem.c > :187". > Can anyone tell me what this could be coz of? > > -- > -Neeti > Even my blood says, B positive From Derek.Fairley at bll.n-i.nhs.uk Wed Dec 13 05:00:16 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Wed, 13 Dec 2006 10:00:16 -0000 Subject: [Bioperl-l] BLAST reports In-Reply-To: <457EE37F.4020000@sfu.ca> Message-ID: Hi Keith, >I would like to manipulate my blast results with bioperl but would also >like to have the html output of the blast. What would be the best way >of going about this, as I don't see any write functions in any of the >blast modules I have looked at. Would it be better to create my own >html layout from the blast data then attempt to recover this from bioperl?
Take a look at some of the example scripts here: http://www.bioperl.org/wiki/Bioperl_scripts Depending on your Bioperl installation, you may already have these in your /scripts directory or similar. The /examples/searchio/htmlwriter.pl script may be a good starting point. >p.s. - does anyone know what the most informative blast "alignment view" >output is? xml i suppose? Assuming you want to get the HSPs, parsing blastxml reports seems to be the most reliable approach. Again, there's a useful script for this: take a look at /scripts/utilities/search2alnblocks.pls. Derek. -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Dec 13 13:02:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 13 Dec 2006 12:02:14 -0600 Subject: [Bioperl-l] Proposal for Meta data Message-ID: I am working on a few RNA-related things related to structure and have a few questions, specifically about Meta data. This is sort of a proposal, but I would like to get everybody's thoughts about this to gauge what everyone thinks. Jason, sorry to bug you but I thought it might be something that would be of use phylohackathon-wise. Heikki has several modules present which adds meta data to sequences (Bio::Seq::Meta). In this case, the meta data is stored as a string (Bio::Seq::Meta) or an array (Bio::Seq::Meta::Array). In both cases you can have multiple types of meta data for a sequence based on a particular tag. However, this also assumes that the meta data is somehow attached strictly to sequence data of some type. It also doesn't allow for having mixed meta data types for a single sequence, such as attaching array data and string data to the same sequence. 
Hence, I was thinking of having a simple, generic meta data type (Bio::Meta), one which could encompass simple strings (Bio::Meta::Simple), arrays (Bio::Meta::Array), or any other structured type of data. This could be used to annotate any PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, maybe in a collection (similar to AnnotationCollection). I thought something like this may be of general use for any PrimarySeq (quality, structure), alignments like NEXUS and Stockholm, SeqFeatures where structure could be stored (tRNA or riboswitches), etc. However, this also seems to fall into the category of sequence annotation. So, would it be better to have a set of Bio::Annotation classes used for this purpose? Flames and jibes welcome; I'm wearing my asbestos suit today.... chris From stewarta at nmrc.navy.mil Wed Dec 13 20:06:14 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Wed, 13 Dec 2006 20:06:14 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects Message-ID: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> I am trying to StandAloneBlast->blastall an array of Bio::Seq objects. The documentation claims that blastall can be passed a file name, a Bio::Seq object, or an array of Bio::Seq objects, while the usage suggests that a reference to an array of Bio::Seq objects is what must be passed to blastall. (from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Tools/Run/StandAloneBlast.html#POD5) Usage: $seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects $blast_report = $factory->blastall(\@seq_array); Should this be... $report = $factory->blastall(@seq_array); or $report = $factory->blastall(\@seq_array); ??? And if you are blastall'ing an array of Seq objects, then does blastall just return one big blast report or should I be expecting an array of blast reports?
I've tried $report = $factory->blastall(@seq_array); which seems to work ok, except that when I process the results, there are only results for the first Seq object in the array. -Andrew -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From arareko at campus.iztacala.unam.mx Wed Dec 13 20:37:27 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 13 Dec 2006 19:37:27 -0600 Subject: [Bioperl-l] BioPerl page in Wikipedia Message-ID: <4580AAD7.3000900@campus.iztacala.unam.mx> Folks, I've updated a little bit of the BioPerl page in the Wikipedia. I think it would be nice if we expand the article a little bit more since it's tagged as a "stub". Here's the link: http://en.wikipedia.org/wiki/BioPerl Cheers, Mauricio. -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From lubapardo at gmail.com Thu Dec 14 05:54:07 2006 From: lubapardo at gmail.com (Luba Pardo) Date: Thu, 14 Dec 2006 11:54:07 +0100 Subject: [Bioperl-l] (no subject) Message-ID: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com> Hello, I am new to bioperl and I have been trying to run the examples available in bptutorial.pl and other basic literature. I have installed the latest release of bioperl 1.5.2 in a usr/local/src directory. Any time I try to retrieve from the SwissProt and EMBL databases it gives me an error. With genbank it seems to be fine. I wonder if the installation was not successful, as I would expect that access to these databases was included in the modules of BioPerl Core.
In addition, I would like to ask whether, to run ClustalW within the setting of BioPerl, I need to download and install it in the same directory in which I have installed bioperl, or whether it is included in the module of Bio::Align. I am not sure whether this is the best place to ask these very basic questions. If not, could anyone please refer me to the proper e mail account? Thank you very much in advance. Luba Pardo MD, PhD From bix at sendu.me.uk Thu Dec 14 09:10:43 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 14 Dec 2006 09:10:43 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> Message-ID: <45815B63.1020003@sendu.me.uk> Andrew Stewart wrote: > I am trying to StandAloneBlast->blastall an array of Bio::Seq > objects. The documentation claims that blastall can be passed a file > name, You're referring to 'In addition, sequence input may be in the form of either a Bio::Seq object or or an array of Bio::Seq objects'? I agree it's not clear, but supplying a reference to an array is still supplying an array. Anyway, I'll clarify it. In any case, the usage for the method is what you should pay attention to: > Usage: > $seq_array_ref = \@seq_array; # where @seq_array is an array of > Bio::Seq objects > $blast_report = $factory->blastall(\@seq_array); > > Should this be... > $report = $factory->blastall(@seq_array); > or > $report = $factory->blastall(\@seq_array); > ??? It should be exactly what it says. A reference to the array. > And if you are blastall'ing an array of Seq objects, then does > blastall just return one big blast report or should I be expecting an > array of blast reports? Returns : Reference to a Blast object or BPlite object containing the blast report. That means, just one big object, not an array.
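As an illustration of the answer above (pass a reference, get back one report object), here is a rough sketch. Several things in it are assumptions, not statements from the thread: that the returned report can be iterated with next_result() in the style of Bio::SearchIO, that the factory accepts 'program' and 'database' as constructor keys in the installed version, and the helper name hits_per_query, which is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the single report object and count hits
# per query. Assumes $report provides next_result(), and that each
# result provides query_name() and num_hits().
sub hits_per_query {
    my ($report) = @_;
    my %hits;
    while ( my $result = $report->next_result ) {
        $hits{ $result->query_name } = $result->num_hits;
    }
    return \%hits;
}

# Run only when a BLAST database and a FASTA file are supplied.
if ( @ARGV == 2 ) {
    require Bio::SeqIO;
    require Bio::Tools::Run::StandAloneBlast;

    my ( $database, $fasta_file ) = @ARGV;

    my @seq_array;
    my $in = Bio::SeqIO->new( -file => $fasta_file, -format => 'fasta' );
    while ( my $seq = $in->next_seq ) {
        push @seq_array, $seq;
    }

    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'  => 'blastp',
        'database' => $database,
    );

    # Note the backslash: blastall() gets a reference to the array.
    my $report = $factory->blastall( \@seq_array );

    my $hits = hits_per_query($report);
    print "$_\t$hits->{$_}\n" for sort keys %{$hits};
}
```

If the report does behave this way, each query in the array shows up as its own result inside the one object, which would explain seeing only the first query when a flat list is passed instead of a reference.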
From bix at sendu.me.uk Thu Dec 14 09:42:18 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 14 Dec 2006 09:42:18 -0500 Subject: [Bioperl-l] (no subject) In-Reply-To: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com> References: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com> Message-ID: <458162CA.5030803@sendu.me.uk> Luba Pardo wrote: > Hello, I am new bioperl and I have been trying to run the examples > available in bptutorial.pl and other basic literature. I have > installed the latest release of bioperl 1.5.2 in a usr/local/src > directory. Any time I try to retrieve the SwissProt and EMBL > databases it gives me an error. What exactly are you trying? Paste some relevant code along with the exact error message you get when running that code. > I wonder if the installation was not successful, as I would expect > that these databases accesses were included in the modules of BioPerl > Core. They should work with just core installed. In addition, I would like to ask whether to run Clustaw within > the setting of BioPerl I need to download and install it in the same > directory in which I have installed bioperl, or is it included in the > module of Bio::Align. The ClustalW module is in the bioperl-run package, so install that in the same way you installed bioperl (core). The actual ClustalW program you need to download and install according to its own instructions. You let Bioperl know about where you installed ClustalW by eg. setting an environment variable. See http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Alignment/Clustalw.html#DESCRIPTION for details. > I am not sure whether this is the best place to ask these very basic > questions. If not, could anyone please refer me to the proper e mail > account? Its certainly the correct place, I hope we can resolve your problems. 
From neetisomaiya at gmail.com Thu Dec 14 03:02:37 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Thu, 14 Dec 2006 13:32:37 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com> How do I run needle specifying that I want the MSF format, on a linux box? The help doesn't show me any format option. Is there anything available to parse MSF format? Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment; how can I parse the result to get this? On 12/12/06, Chris Fields wrote: > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of a bioperl parser for needle output, basically I > > won't > > where the target sequence aligns on the template (i.e. coordinate > > on the > > template where the taget aligns). > > > > -- > > -Neeti > > Even my blood says, B positive > > I answered this a number of months back: > > http://tinyurl.com/yzlbx5 > > Basically, newer versions of EMBOSS have changed the output for the > AlignIO::emboss parser (which parses needle). I don't believe the > parser has been fixed to deal with that, but Jason has pointed out > you can use MSF output when running needle, then parse using AlignIO > with the format set to 'msf'. > > chris > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed...
Name: 3.out Type: application/octet-stream Size: 204960 bytes Desc: not available URL: From stewarta at nmrc.navy.mil Thu Dec 14 11:34:43 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 14 Dec 2006 11:34:43 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <45815B63.1020003@sendu.me.uk> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> Message-ID: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> Thanks for the reply, Sendu. So I've tried passing a reference to an array of Seq objects with the following code... push @blast_run, $factory->blastall(\@query); # where @query is an array of Bio::Seq objects (In case you're wondering, I'm pushing the report into an array of reports because I'm running several instances of blastall with different parameters each time.) ....and it throws me the following exception... ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/ common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E STACK: Error::throw STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/ perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/ Bio/Tools/Run/StandAloneBlast.pm:557 STACK: main::run_blastall ./new_blast_script.pl:215 STACK: ./new_blast_script.pl:115 ----------------------------------------------------------- And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns... 757 my $status = system($commandstring); 758 759 $self->throw("$executable call crashed: $? $commandstring \n") 760 unless ($status==0) ; So it looks like the system call isn't returning a happy $status. 
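One way to read the number in that message, as a sketch: the quoted throw() line interpolates $?, which after system() is a raw wait status rather than a plain exit code. Perl documents that the low 7 bits hold a signal number, bit 128 flags a core dump, and the exit value sits in the high byte. The helper name decode_status is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw wait status (the value of $? after
# system()) into a human-readable description.
sub decode_status {
    my ($status) = @_;

    return 'failed to execute' if $status == -1;

    my $signal = $status & 127;    # low 7 bits: terminating signal
    if ($signal) {
        my $core = ( $status & 128 ) ? ' (core dumped)' : '';
        return "died with signal $signal$core";
    }

    return 'exited with value ' . ( $status >> 8 );
}

print decode_status(11),  "\n";    # the value shown in the exception
print decode_status(256), "\n";    # an ordinary exit code of 1
```

Read this way, a status of 11 would mean the blastall process itself was killed by signal 11 rather than returning an error code, which points at the program or its input file rather than at the Perl wrapper.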
At this point I'm pretty much stuck, though. Blastall works just fine if I only send it a single Seq object. Looking at _setinput, it appears a reference to an array of Seq objects should end up creating a multi-fasta file. The only possibilities I can think of to explain this is... - The -i file isn't be created for some reason when an (ref to) array of Seqs is passed - There is something wrong with the -i file that is created and sent to blastall. - Something else is wrong with the $commandstring being sent to the system call. Does anyone see something here that I don't? Thanks, Andrew On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote: > Andrew Stewart wrote: >> I am trying to StandAloneBlast->blastall an array or Bio::Seq >> objects. The documentation claims that blastall can be passed a >> file name, > > You're referring to 'In addition, sequence input may be in the form > of either a Bio::Seq object or or an array of Bio::Seq objects'? I > agree its not clear, but supplying a reference to an array is still > supplying an array. Anyway, I'll clarify it. > > > In any case, the usage for the method is what you should pay > attention to: > >> Usage: >> $seq_array_ref = \@seq_array; # where @seq_array is an array of >> Bio::Seq objects >> $blast_report = $factory->blastall(\@seq_array); >> Should this be... >> $report = $factory->blastall(@seq_array); >> or >> $report = $factory->blastall(\@seq_array); >> ??? > > It should be exactly what it says. A reference to the array. > > >> And if you are blastall'ing an array of Seq objects, then does >> blastall just return one big blast report or should I be expecting >> an array of blast reports? > > Returns : Reference to a Blast object or BPlite object > containing the blast report. > > That means, just one big object, not an array. 
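
A side note on the "crashed: 11" in the exception above: the number printed there is Perl's raw `$?` wait status, not a plain exit code. The sketch below shows one way to split such a value into its parts. The helper name `decode_wait_status` is made up for illustration and is not part of BioPerl, and reading signal 11 as a segmentation fault assumes a typical Linux or macOS system.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): split a raw wait status,
# as left in $? after system(), into the parts described in perlvar.
# Assumes the command was actually started (a $? of -1 is not handled).
sub decode_wait_status {
    my ($status) = @_;
    return {
        exit_code   => $status >> 8,              # high byte: the child's exit value
        signal      => $status & 127,             # low 7 bits: signal that ended the child
        core_dumped => ($status & 128) ? 1 : 0,   # bit 7: whether a core file was written
    };
}

# 11 is the value reported in the exception above.
my $info = decode_wait_status(11);
printf "exit=%d signal=%d core=%d\n",
    $info->{exit_code}, $info->{signal}, $info->{core_dumped};
```

Read this way, a status of 11 means an exit code of 0 and termination by signal 11, which would point at blastall itself crashing rather than at it returning an error code.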
--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From cjfields at uiuc.edu Thu Dec 14 12:03:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 11:03:12 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>

On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote:

> So it looks like the system call isn't returning a happy $status. At
> this point I'm pretty much stuck, though. Blastall works just fine
> if I only send it a single Seq object. Looking at _setinput, it
> appears a reference to an array of Seq objects should end up creating
> a multi-fasta file. The only possibilities I can think of to explain
> this is...
>
> - The -i file isn't be created for some reason when an (ref to) array
> of Seqs is passed
> - There is something wrong with the -i file that is created and sent
> to blastall.
> - Something else is wrong with the $commandstring being sent to the
> system call.
>
> Does anyone see something here that I don't?

The error pops up when the executable returns a bad status, so maybe it's choking on too many input sequences (i.e. BioPerl is doing everything correctly, but you are attempting to BLAST too many sequences in one go). How many sequences are you attempting to use as input? What happens when you use fewer input sequences?

chris

From stewarta at nmrc.navy.mil Thu Dec 14 12:49:45 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 12:49:45 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
Message-ID: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>

> So can you look at the tempfile that is created and see if it is sane?
>
> Set -save_tempfiles => 1 whene you initialize the factory object or do
> $factory->save_tempfiles(1)
> before calling the blastall.
>
> -jason
>

Jason,
I was actually wondering how to do that. Thanks. Odd though, it still doesn't seem to be saving the tempfiles. Might not matter though, because...

> The error pops up when the executable returns a bad status, so
> maybe it's choking on too many input sequences (i.e. Bioperl is
> doing everything correctly, but you are attempting to BLAST too
> many sequences in one go). How many sequences are you attempting
> to use as input? What happens when you use fewer input sequences?
>
> chris
>

I was processing 738 sequences for input. I cut that down to 20 sequences and I'm getting some other exception thrown further downstream, so it appears you may be correct. You wouldn't happen to know the max number of sequences that blastall allows for input, would ya? ;) I suppose I'll have to break @query down into smaller doses or something.

Thanks,
Andrew

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From Derek.Fairley at bll.n-i.nhs.uk Thu Dec 14 12:58:10 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Thu, 14 Dec 2006 17:58:10 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
Message-ID:

Neeti,

From http://emboss.sourceforge.net/apps/cvs/needle.html:

"The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences.

The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs

The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score

See: http://emboss.sf.net/docs/themes/AlignFormats.html for further information on alignment formats."

Not sure based on this whether you can get a pairwise alignment in .msf format; can't think of a good reason why not. The BioPerl Bio::AlignIO module will allow you to parse alignments in .msf format.

HTH,
Derek.
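
To make the question of the alignment coordinate concrete, here is a minimal sketch of one way to find where the contig starts on the reference once the two gapped alignment rows are in hand. The helper name `first_aligned_ref_coord`, the file name `out.msf`, and the toy sequences are assumptions made for illustration; the commented lines showing how the rows might be read with Bio::AlignIO assume BioPerl is installed and that needle was run with -aformat msf.

```perl
use strict;
use warnings;

# Illustrative helper (hypothetical name, not a BioPerl method): given the
# gapped reference row and the gapped contig row of a pairwise alignment
# (assumed to be the same length), return the 1-based ungapped reference
# coordinate lying under the contig's first aligned residue.
sub first_aligned_ref_coord {
    my ($ref_row, $contig_row) = @_;
    my $gap     = qr/[-.~]/;    # gap characters assumed for MSF-style rows
    my $ref_pos = 0;
    for my $col (0 .. length($contig_row) - 1) {
        my $r = substr($ref_row,    $col, 1);
        my $c = substr($contig_row, $col, 1);
        $ref_pos++ unless $r =~ $gap;    # count reference residues seen so far
        return $ref_pos if $c !~ $gap && $r !~ $gap;
    }
    return;    # the contig never lines up with a reference residue
}

# Toy rows. Real rows might be read like this (an assumption, untested here):
#   use Bio::AlignIO;
#   my $aln  = Bio::AlignIO->new(-file => 'out.msf', -format => 'msf')->next_aln;
#   my @rows = map { $_->seq } $aln->each_seq;
my $ref    = 'ACGTACGTAC';
my $contig = '....ACGT..';
print first_aligned_ref_coord($ref, $contig), "\n";
```

Counting columns by hand like this avoids relying on the start and end values the parser assigns to each row, which for a global alignment written with end gaps may simply span the whole sequence.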
From cjfields at uiuc.edu Thu Dec 14 13:36:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 12:36:09 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
Message-ID: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>

On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:

>> So can you look at the tempfile that is created and see if it is
>> sane?
>>
>> Set -save_tempfiles => 1 whene you initialize the factory object
>> or do
>> $factory->save_tempfiles(1)
>> before calling the blastall.
>>
>> -jason
>>
>
> Jason,
> I was actually wondering how to do that. Thanks. Odd though, it
> still doesn't seem to be saving the tempfiles. Might not matter

That needs to be checked out. Can anyone verify that?

>> The error pops up when the executable returns a bad status, so
>> maybe it's choking on too many input sequences (i.e. Bioperl is
>> doing everything correctly, but you are attempting to BLAST too
>> many sequences in one go). How many sequences are you attempting
>> to use as input? What happens when you use fewer input sequences?
>>
>> chris
>>
>
> I was processing 738 sequences for input. I cut that down to 20
> sequences and I'm getting some other exception thrown further
> downstream, so it appears you may be correct. You don't happen to
> know what the max number of sequences that blastall allows for input,
> would ya? ;) I suppose I'll have to break @query down into smaller
> doses or something.
>
> Thanks,
> Andrew

It was a shot in the dark, really. The fact that the return status was bad could be due to a number of problems (permissions issues, bad data, etc). The fact that a single sequence worked indicated that permissions and output format likely weren't to blame. The only other thing left was a problem with blastall itself.

BTW, the blast docs do not indicate whether there is a maximum number of sequences. There may be a point where available memory becomes the limiting issue.

chris

From vaughn at cshl.edu Thu Dec 14 14:09:34 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Thu, 14 Dec 2006 14:09:34 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
Message-ID: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>

Dear all,

I'm trying to bring some of my code into compliance with BioPerl 1.5.2 and am running into some design decisions that I am unclear on. Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of the 'type' against SOFA? It seems to me that this should be optional behavior, as is the case with the Bio::FeatureIO family. I'd be happy to write the patch if there is any agreement with me on this case.

Thanks,
Matt

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
phone: (516) 367-8469

From jason at bioperl.org Thu Dec 14 11:59:20 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Dec 2006 11:59:20 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <640E2BB7-33F3-44C9-B903-9DDA54F02D12@bioperl.org>

So can you look at the tempfile that is created and see if it is sane?

Set -save_tempfiles => 1 when you initialize the factory object, or do
$factory->save_tempfiles(1)
before calling blastall.

-jason

From stewarta at nmrc.navy.mil Thu Dec 14 16:23:07 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 16:23:07 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
Message-ID:

> It was a shot in the dark, really. The fact that the return status
> was bad could be due to a number of problems (permissions issues,
> bad data, etc). The fact that a single sequence worked indicated
> that permissions and output format likely weren't to blame. The
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum
> number of sequences. There may be a point where available memory
> becomes the limiting issue.
>
> chris

Interesting. I ran the 738-sequence dataset through blastall manually and the report only returned 198 of the 738 expected results. Not only that, it seems to have just cut off right in the middle of the 198th result, and a segmentation fault was reported. I removed the 198th sequence, wondering if it might be some issue with the input, and the segmentation fault occurred again with the results ending on the 210th result.
I stuck the 198th sequence back in, but at the start of the file, and sure enough the segmentation fault occurred earlier. I think we can rule out the size of the input or number of sequences as the source of error here. I'm more inclined to think it has something to do with the blast databases being queried against.

I found an old discussion on a problem that sounds fairly similar to this one, for anyone interested.

http://bioinformatics.org/pipermail/bioclusters/2004-June/001742.html

I think I'll try to work around the problem for now.

andrew

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From lincoln.stein at gmail.com Thu Dec 14 15:24:56 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 15:24:56 -0500
Subject: [Bioperl-l] Bio::Graphics xyplot
In-Reply-To: <4578951B.5050206@sfu.ca>
References: <4578951B.5050206@sfu.ca>
Message-ID: <6dce9a0b0612141224r1ef7cce2s6e6123461c3827d8@mail.gmail.com>

Hi,

The way it works is that you create a single feature that spans the entire range of the xyplot. It contains subfeatures, each of which has a score. The graph points correspond to each of the subfeatures.

Lincoln

On 12/7/06, Keith Anthony Boroevich wrote:
>
> Hi everyone,
>
> I'm attempting to add an xyplot of the phred quality scores to an
> Bio::Graphics image, and cannot get it to work.
> I have the panel with a track for both the scale and the DNA displaying
> properly. When I attempt to add the xyplot i just get a garbled track
> of, what looks like, tiny xyplots for each datapoint. I have the cvs
> (updated today) of bioperl-live running. I think what I am missing is
> the creation of a "Sequence Feature Group" to hold the individual points
> of the plot. However, I cannot seem to find such an object. This is
> what I attempted:
>
> -------BEGIN---CODE-----------
> # start panel
> my $panel = Bio::Graphics::Panel->new(-length => $f_seqlen,
>                                       -width => $f_seqlen*10,
>                                       -pad_left => 10,
>                                       -pad_right => 10,
>                                       -grid => 1
>                                       );
> # add scale
> $panel->add_track(arrow =>
>                   Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
>                   -double => 1,
>                   -tick => 2,
>                   -fgcolor => 'black');
> # add DNA ($feature is of type Bio::SeqFeature::Annotated)
> $panel->add_track(dna => $feature);
> # get list of quality scores from database
> my ($pqs_value) = $dbh->selectrow_array($sql);
> my @pqs_value = split(/\s/,$pqs_value);
> # create track
> my $track = $panel->add_track(-glyph => 'xyplot',
>                               -graph_type => 'points',
>                               -point_symbol => 'point',
>                               -max_score => 100,
>                               -min_score => 0,
>                               -scale => 'none');
> # add "subfeatures" to
> for (my $i=0;$i<$f_seqlen;$i++) {
>
>
> $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
>
> }
> print $panel->png();
> $panel->finished;
> ------END---CODE----------
>
> I also attempted to create an array of the point features and passed
> that by reference to the panel "add_track" as it describes in the xyplot
> documentation, but that resulted in the exact same image.
>
> keith
>
> --
> ><)))?> -cGRASP- <
> Keith Anthony Boroevich
> Davidson Lab
> Dept of Molecular Biology
> Simon Fraser University
> Tel: 604-268-7276
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From bix at sendu.me.uk Thu Dec 14 17:15:07 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 17:15:07 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
In-Reply-To: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
Message-ID: <4581CCEB.20206@sendu.me.uk>

Matthew Vaughn wrote:
> Dear all,
>
> I'm trying to bring some of my code into compliance with the BioPerl
> 1.5.2 and am running into some design decisions that I am unclear on.
> Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of
> the 'type' against SOFA? It seems to me that this should be optional
> behavior as is the case with the Bio::FeatureIO family. I'd be happy to
> write the patch if there is any agreement with me on this case.

Lots of people seem to have worked on it over the years, but perhaps Scott Cain is the person to talk to?

revision 1.4
date: 2004/09/25 11:41:29; author: scain; state: Exp; lines: +1 -1
two things:
* adding SOFA as an available ontology to DocumentRegistry.pm
* modifying FeatureIO::gff to use SOFA to validate, and to parse Ontology_term

From lincoln.stein at gmail.com Thu Dec 14 16:56:41 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 16:56:41 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To:
References:
Message-ID: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>

Hi All,

I'm afraid that the xyplot glyph that is in the recent bioperl release has an error that causes the content to be printed to the right of the correct position. Unfortunately this wasn't caught before the release because the glyph was only tested on very large (whole genome) features.
You will need to do a CVS update to get a fixed version from bioperl-live. A future bugfix release of gbrowse will patch this glyph for you automatically.

Lincoln

On 12/12/06, Kara Dolinski wrote:
>
> Hi,
> I'm having a problem getting features and an xyplot properly aligned in
> Gbrowse. For example, see this page:
>
> http://tinyurl.com/ylbq3q
>
> The feature in the "CENPK SNPs" track should actually be around the peak
> of the graph in the "CENPK prediction signal" xyplot ie. the SNP feature
> is at position 79, and the xyplot axes and data should span from 61 - 95.
> However, as you can see, the data in the xyplot are oddly separated from
> the axes (which seem to be in the correct place), with the data shifted over
> to about position 120-155.
> This occurs elsewhere, not just at the ends of the chromosomes.
>
> When I zoom to ~80 bp, all is well, see:
>
> http://tinyurl.com/yzav8k
>
> The relevant snippets from the GFF and the config files are below.
>
> Thanks!
> Kara
>
> GFF:
>
> chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
>
> CONFIG:
>
>
> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
>
> [CENPK_all_scores_graph]
> feature = GRAPH_CENPK:SNPScanner
> glyph = xyplot
> graph_type = boxes
> fgcolor = purple
> bgcolor = purple
> height = 100
> min_score = 0
> max_score = 110
> label = 0
> key = CENPK prediction signal
> link =
> category = SNPs: signal graphs
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From dmessina at wustl.edu Thu Dec 14 20:45:24 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 19:45:24 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To:
References:
Message-ID: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>

Hey Chris,

My thoughts below.

> [Chris]
> This could be used to annotate any
> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
> maybe in a collection (similar to AnnotationCollection). I thought
> something like this may be of general use for any PrimarySeq
> (quality, structure), alignments like NEXUS and Stockholm,
> SeqFeatures where structure could be stored (tRNA or riboswitches),
> etc.
>
> However, this also seems to fall into the category of sequence
> annotation. So, would it be better to have a set of Bio::Annotation
> classes used for this purpose?

To me, all meta data is equal. That is, your classic Genbank feature annotation and a user's arbitrary meta-tag like "Bob thinks this is a kinase domain" aren't different in kind even if they are different in content.

As resequencing projects multiply, the ability to create arbitrary meta tags, attach them to different types of objects, and use those tags to link them together will become desirable, if not essential.
Keeping a common interface to all of these meta data types would be advantageous, plus new users won't have to determine whether they need to use Bio::Meta objects or Bio::Annotation objects. So I would argue for all of the meta data types to live "under one roof". Which roof isn't as important. Bio::Annotation, since it already exists for today's meta data, seems like a reasonable choice. (assuming Annotation objects are flexible enough to be extended as you propose) There, and no flames or jibes even. :) Dave From cjfields at uiuc.edu Thu Dec 14 21:21:10 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 20:21:10 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> Message-ID: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> On Dec 14, 2006, at 7:45 PM, David Messina wrote: > Hey Chris, > > My thoughts below. > >> [Chris] >> This could be used to annotate any >> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, >> maybe in a collection (similar to AnnotationCollection). I thought >> something like this may be of general use for any PrimarySeq >> (quality, structure), alignments like NEXUS and Stockholm, >> SeqFeatures where structure could be stored (tRNA or riboswitches), >> etc. >> >> However, this also seems to fall into the category of sequence >> annotation. So, would it be better to have a set of Bio::Annotation >> classes used for this purpose? > > > To me, all meta data is equal. That is, your classic Genbank feature > annotation and a user's arbitrary meta-tag like "Bob thinks this is a > kinase domain" aren't different in kind even if they are different in > content. > > As resequencing projects multiply, the ability to create arbitrary > meta tags, attach them to different types of objects, and use those > tags to link them together will become desirable, if not essential. 
> > Keeping a common interface to all of these meta data types would be > advantageous, plus new users won't have to determine whether they > need to use Bio::Meta objects or Bio::Annotation objects. > > So I would argue for all of the meta data types to live "under one > roof". Which roof isn't as important. Bio::Annotation, since it > already exists for today's meta data, seems like a reasonable choice. > (assuming Annotation objects are flexible enough to be extended as > you propose) > > There, and no flames or jibes even. :) I guess what I want to know is whether there should to be a distinction between 'normal' sequence annotation (comments, references, and so on) and annotation that could be best described as position-specific (like RNA or protein structural annotation). The current meta implementation is for sequence data only; I felt it would be nice to have a generic implementation that would be applicable to any object data. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dmessina at wustl.edu Thu Dec 14 21:46:27 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 20:46:27 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> Message-ID: <9C72012A-EFD7-42DD-93F8-578251CFDE01@wustl.edu> And it all seemed so clear to me when I wrote it. :) > whether there should to be a distinction I would argue no because it would contravene a s > a generic implementation that would be applicable to any object data. I wholeheartedly agree that this is the way to go. A generic implementation would allow arbitrary object data while maintaining a standard interface. 
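To make the "generic implementation with a standard interface" idea from this thread a little more concrete, here is a minimal, self-contained Perl sketch. It is not BioPerl code: the package Sketch::MetaCollection and its methods add_meta, get_meta and get_meta_at are hypothetical names invented purely for illustration, and nothing here reflects how Bio::Annotation or the existing meta classes are actually implemented. It only shows, under those assumptions, that whole-object tags and position-specific tags could share one interface, with an optional range being the only difference between them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch only: none of these package or method names exist in BioPerl.
package Sketch::MetaCollection;

sub new {
    my ($class) = @_;
    return bless { by_key => {} }, $class;
}

# Attach a value under a tag. An optional [start, end] range makes the
# entry position-specific; without it the entry applies to the whole object.
sub add_meta {
    my ($self, $key, $value, $range) = @_;
    push @{ $self->{by_key}{$key} }, { value => $value, range => $range };
    return $self;
}

# Return all values stored under a tag, whether or not they carry a range.
sub get_meta {
    my ($self, $key) = @_;
    return map { $_->{value} } @{ $self->{by_key}{$key} || [] };
}

# Return only the values under a tag whose range covers the given position.
sub get_meta_at {
    my ($self, $key, $pos) = @_;
    return map  { $_->{value} }
           grep { $_->{range} && $_->{range}[0] <= $pos && $pos <= $_->{range}[1] }
           @{ $self->{by_key}{$key} || [] };
}

package main;

my $meta = Sketch::MetaCollection->new;
$meta->add_meta(comment   => 'Bob thinks this is a kinase domain');  # whole-object tag
$meta->add_meta(structure => 'stem', [10, 25]);                      # position-specific tags
$meta->add_meta(structure => 'loop', [26, 30]);

my @comments = $meta->get_meta('comment');
my @at_12    = $meta->get_meta_at('structure', 12);

print "comments: @comments\n";
print "structure at 12: @at_12\n";
```

In this sketch the caller never has to choose between a "meta" class and an "annotation" class; whether an entry is position-specific is just a property of the entry. Whether that trade-off suits the real Bio::Annotation hierarchy is exactly the open question in the thread.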
From dmessina at wustl.edu Thu Dec 14 21:46:27 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 20:46:27 -0600 Subject: [Bioperl-l] Proposal for Meta data Message-ID: [oops, accidentally hit send midsentence] And it all seemed so clear to me when I wrote it. :) > whether there should to be a distinction I would argue no because it would contravene a standard interface. > a generic implementation that would be applicable to any object data. I wholeheartedly agree that this is the way to go. A generic implementation would allow arbitrary object data while maintaining a standard interface. Dave From neetisomaiya at gmail.com Fri Dec 15 00:21:42 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 15 Dec 2006 10:51:42 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com> Message-ID: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Hi, Thanks a lot for your response. I ran needle like this /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out It gave me the output in format msf. But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO. Please help. Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence. ~Neeti. On 12/14/06, Fairley, Derek wrote: > > Neeti, > > > > From http://emboss.sourceforge.net/apps/cvs/needle.html: > > > > "The results can be output in one of several styles by using the > command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of > the required format. Some of the alignment formats can cope with an > unlimited number of sequences, while others are only for pairs of sequences. 
> > > > > The available multiple alignment format names are: unknown, multiple, > simple, fasta, msf, trace, srs > > > > The available pairwise alignment format names are: pair, markx0, markx1, > markx2, markx3, markx10, srspair, score > > > > See: http://emboss.sf.net/docs/themes/AlignFormats.html for further > information on alignment formats." > > > > Not sure based on this whether you can get pairwise alignment in .msf > format; can't think of a good reason why not. The BioPerl Align::IO module > will allow you to parse alignments in .msf format. > > > > HTH, > > > > Derek. > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya > Sent: 14 December 2006 08:03 > To: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > > > How do I run needle specifying that I want the MSF format, on a linux box? > > The help doesnt show me any format option. Is there anything available to > > pasre MSF format? > > Please find an example alignment file attached. Here the seq_of_contig > > aligns with the reference sequence (i.e. SEQ_1.REF) starting at position > > (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the > > output alignment, how can I parse the result to get this? > > > > On 12/12/06, Chris Fields wrote: > > > > > > > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > > > > > Hi, > > > > > > > > Does anyone know of a bioperl parser for needle output, basically I > > > > won't > > > > where the target sequence aligns on the template (i.e. coordinate > > > > on the > > > > template where the taget aligns). > > > > > > > > -- > > > > -Neeti > > > > Even my blood says, B positive > > > > > > I answered this a number of months back: > > > > > > http://tinyurl.com/yzlbx5 > > > > > > Basically, newer versions of EMBOSS have changed the output for the > > > AlignIO::emboss parser (which parses needle). 
I don't believe the > > > parser has been fixed to deal with that, but Jason has pointed out > > > you can use MSF output when running needle, then parse using AlignIO > > > with the format set to 'msf'. > > > > > > chris > > > > > > > > > > > -- > > -Neeti > > Even my blood says, B positive > -- -Neeti Even my blood says, B positive
From Derek.Fairley at bll.n-i.nhs.uk Fri Dec 15 04:57:35 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Fri, 15 Dec 2006 09:57:35 -0000 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Message-ID: Neeti, In lieu of a response from a BioPerl guru... why not use Needle to generate your pairwise alignment in fasta format, rather than msf format? The sequence you want should correspond to a single HSP which you can get directly from the fasta alignment with Bio::SearchIO: http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use Bio::AlignIO at all. Derek.
From cain at cshl.edu Fri Dec 15 00:01:36 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 15 Dec 2006 00:01:36 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <4581CCEB.20206@sendu.me.uk> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> Message-ID: <1166158897.2569.335.camel@localhost.localdomain> As much as I would like to take credit for this :-) Allen Day wrote the original code, and then Chris Fields tried to fix it so that it actually worked :-) I think it would be a good idea to have a validate_terms option like Bio::FeatureIO::gff. Scott On Thu, 2006-12-14 at 17:15 -0500, Sendu Bala wrote: > Matthew Vaughn wrote: > > Dear all, > > > > I'm trying to bring some of my code into compliance with the BioPerl > > 1.5.2 and am running into some design decisions that I am unclear on. > > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of > > the 'type' against SOFA? It seems to me that this should be optional > > behavior as is the case with the Bio::FeatureIO family. I'd be happy to > > write the patch if there is any agreement with me on this case. > > Lots of people seem to have worked on it over the years, but perhaps Scott Cain is the person to talk to?
> > revision 1.4 > date: 2004/09/25 11:41:29; author: scain; state: Exp; lines: +1 -1 > two things: > * adding SOFA as an available ontology to DocumentRegistry.pm > * modifying FeatureIO::gff to use SOFA to validate, and to parse > Ontology_term > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From neetisomaiya at gmail.com Fri Dec 15 07:46:08 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 15 Dec 2006 18:16:08 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Message-ID: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> I ran needle like this /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out Please find the output attached. When I run the following :- use Bio::SearchIO; my $io = Bio::SearchIO->new(-file => "1.out", -format => "fasta" ); while ( my $result = $io->next_result() ) { while( my $hit = $result->next_hit) { print "yes\n"; } } It says :- -------------------- WARNING --------------------- MSG: unrecognized FASTA Family report file! --------------------------------------------------- What should I do? ~Neeti. On 12/15/06, Fairley, Derek wrote: > > Neeti, > > In lieu of a response from a BioPerl guru... why not use Needle to > generate your pairwise alignment in fasta format, rather than msf format? 
> The sequence you want should correspond to a single HSP which you can get > directly from the fasta alignment with Bio::SearchIO: > http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use > Bio::AlignIO at all. > > Derek. > > > -----Original Message----- > From: neeti somaiya [mailto:neetisomaiya at gmail.com] > Sent: 15 December 2006 05:22 > To: Fairley, Derek; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > Hi, > > Thanks a lot for your response. > I ran needle like this > /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out > It gave me the output in format msf. > But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I > get the alignment start and stop coordinates on the sequence. I mean > something like hsp->query->start which gives us the alignment start position > on query sequence in a blast output when using Bio::SearchIO. > Please help. > Like I explained with an example in my previous mail, I want the > coordinate where the alignment starts on the sequence. > > ~Neeti. > On 12/14/06, Fairley, Derek wrote: > Neeti, > > From http://emboss.sourceforge.net/apps/cvs/needle.html : > > "The results can be output in one of several styles by using the > command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of > the required format. Some of the alignment formats can cope with an > unlimited number of sequences, while others are only for pairs of sequences. > > The available multiple alignment format names are: unknown, multiple, > simple, fasta, msf, trace, srs > > The available pairwise alignment format names are: pair, markx0, markx1, > markx2, markx3, markx10, srspair, score > > See: http://emboss.sf.net/docs/themes/AlignFormats.html for further > information on alignment formats." > > Not sure based on this whether you can get pairwise alignment in .msf > format; can't think of a good reason why not. 
The BioPerl Align::IO module > will allow you to parse alignments in .msf format. > > HTH, > > Derek. > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya > Sent: 14 December 2006 08:03 > To: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > How do I run needle specifying that I want the MSF format, on a linux box? > The help doesnt show me any format option. Is there anything available to > pasre MSF format? > Please find an example alignment file attached. Here the seq_of_contig > aligns with the reference sequence (i.e. SEQ_1.REF) starting at position > (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the > output alignment, how can I parse the result to get this? > > On 12/12/06, Chris Fields wrote: > > > > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > > > Hi, > > > > > > Does anyone know of a bioperl parser for needle output, basically I > > > won't > > > where the target sequence aligns on the template (i.e. coordinate > > > on the > > > template where the taget aligns). > > > > > > -- > > > -Neeti > > > Even my blood says, B positive > > > > I answered this a number of months back: > > > > http://tinyurl.com/yzlbx5 > > > > Basically, newer versions of EMBOSS have changed the output for the > > AlignIO::emboss parser (which parses needle). I don't believe the > > parser has been fixed to deal with that, but Jason has pointed out > > you can use MSF output when running needle, then parse using AlignIO > > with the format set to 'msf'. > > > > chris > > > > > > -- > -Neeti > Even my blood says, B positive > > > > -- > -Neeti > Even my blood says, B positive > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1.out Type: application/octet-stream Size: 90277 bytes Desc: not available URL: From jason at bioperl.org Fri Dec 15 09:28:13 2006 From: jason at bioperl.org (Jason Stajich) Date: Fri, 15 Dec 2006 09:28:13 -0500 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> Message-ID: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: > > On Dec 14, 2006, at 7:45 PM, David Messina wrote: > >> Hey Chris, >> >> My thoughts below. >> >>> [Chris] >>> This could be used to annotate any >>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, >>> maybe in a collection (similar to AnnotationCollection). I thought >>> something like this may be of general use for any PrimarySeq >>> (quality, structure), alignments like NEXUS and Stockholm, >>> SeqFeatures where structure could be stored (tRNA or riboswitches), >>> etc. >>> >>> However, this also seems to fall into the category of sequence >>> annotation. So, would it be better to have a set of Bio::Annotation >>> classes used for this purpose? >> >> >> To me, all meta data is equal. That is, your classic Genbank feature >> annotation and a user's arbitrary meta-tag like "Bob thinks this is a >> kinase domain" aren't different in kind even if they are different in >> content. >> >> As resequencing projects multiply, the ability to create arbitrary >> meta tags, attach them to different types of objects, and use those >> tags to link them together will become desirable, if not essential. >> >> Keeping a common interface to all of these meta data types would be >> advantageous, plus new users won't have to determine whether they >> need to use Bio::Meta objects or Bio::Annotation objects. >> >> So I would argue for all of the meta data types to live "under one >> roof". Which roof isn't as important. 
Bio::Annotation, since it >> already exists for today's meta data, seems like a reasonable choice. >> (assuming Annotation objects are flexible enough to be extended as >> you propose) >> >> There, and no flames or jibes even. :) > > I guess what I want to know is whether there should to be a > distinction between 'normal' sequence annotation (comments, > references, and so on) and annotation that could be best described as > position-specific (like RNA or protein structural annotation). The > current meta implementation is for sequence data only; I felt it > would be nice to have a generic implementation that would be > applicable to any object data. My stream-of-consciousness for right now: I was thinking Bio::Annotation is where this should go - that system doesn't have anything about it that makes it explicitly sequence related. What we're trying to hammer out here on the Alignment side - which fits with your RNA example - is to have features, basically SeqFeatures, associated with alignments so that columns can be annotated to cover things like character sets and partitions for phylogenetic analyses. As for data which annotates non-contiguous things like RNA stems, we may have to be more creative about that or model it with a split location. So currently we've added code so that an Alignment is-a Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this end, with the goal of being able to capture more of the data that can be represented in a NEXUS file. It feels more like a hack than an elegant Meta-data solution, but I am not totally sure what data you are thinking about at this point; perhaps I need to spend more time thinking about it. Or are you worried that the semantic mapping of the data into features or annotations is confusing users? From jason at bioperl.org Fri Dec 15 09:48:32 2006 From: jason at bioperl.org (Jason Stajich) Date: Fri, 15 Dec 2006 09:48:32 -0500 Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> Message-ID: <42CB9018-72CD-433E-A42F-152D63D2F584@bioperl.org> I get the impression you are trying to use the wrong tool for the job. Can you explain a little more generally what you want to do? Semantically, FASTA in Bio::SearchIO is quite different from FASTA in Bio::AlignIO. We explain this on the wiki; please have a look at the FASTA page. Do not use Bio::SearchIO to parse multi-fasta alignment output; Bio::SearchIO is for pairwise alignment reports. Use Bio::AlignIO for multi-fasta format or for msf - you just provide a different value for '-format'. But none of that is going to help you get the start/end of your alignment, because that is not part of the output format. Do the experiment of looking at the file and figuring out which fields you actually want out of it; if they don't exist, then either you have a format that won't work for your question, or you will have to calculate the additional information yourself. If you are trying to align transcripts to a genome, please consider tools that are built for it (and referenced on the wiki, like Sim4, est2genome, exonerate, BLAT). -jason On Dec 15, 2006, at 7:46 AM, neeti somaiya wrote: > I ran needle like this > > /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out > > Please find the output attached. > > When I run the following :- > > use Bio::SearchIO; > > my $io = Bio::SearchIO->new(-file => "1.out", > -format => "fasta" ); > > while ( my $result = $io->next_result() ) > { > while( my $hit = $result->next_hit) > { > > print "yes\n"; > } > } > > > It says :- > > -------------------- WARNING --------------------- > MSG: unrecognized FASTA Family report file! > --------------------------------------------------- > > What should I do? > > ~Neeti.
> > On 12/15/06, Fairley, Derek wrote: >> >> Neeti, >> >> In lieu of a response from a BioPerl guru... why not use Needle to >> generate your pairwise alignment in fasta format, rather than msf >> format? >> The sequence you want should correspond to a single HSP which you >> can get >> directly from the fasta alignment with Bio::SearchIO: >> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need >> to use >> Bio::AlignIO at all. >> >> Derek. >> >> >> -----Original Message----- >> From: neeti somaiya [mailto:neetisomaiya at gmail.com] >> Sent: 15 December 2006 05:22 >> To: Fairley, Derek; bioperl-l >> Subject: Re: [Bioperl-l] needle parser in bioperl? >> >> Hi, >> >> Thanks a lot for your response. >> I ran needle like this >> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out >> It gave me the output in format msf. >> But now my problem is, if I use Bio::AlignIO module of Bioperl, >> how can I >> get the alignment start and stop coordinates on the sequence. I mean >> something like hsp->query->start which gives us the alignment >> start position >> on query sequence in a blast output when using Bio::SearchIO. >> Please help. >> Like I explained with an example in my previous mail, I want the >> coordinate where the alignment starts on the sequence. >> >> ~Neeti. >> On 12/14/06, Fairley, Derek wrote: >> Neeti, >> >> From http://emboss.sourceforge.net/apps/cvs/needle.html : >> >> "The results can be output in one of several styles by using the >> command-line qualifier -aformat xxx, where 'xxx' is replaced by >> the name of >> the required format. Some of the alignment formats can cope with an >> unlimited number of sequences, while others are only for pairs of >> sequences. 
>> >> The available multiple alignment format names are: unknown, multiple, >> simple, fasta, msf, trace, srs >> >> The available pairwise alignment format names are: pair, markx0, >> markx1, >> markx2, markx3, markx10, srspair, score >> >> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further >> information on alignment formats." >> >> Not sure based on this whether you can get pairwise alignment in .msf >> format; can't think of a good reason why not. The BioPerl >> Align::IO module >> will allow you to parse alignments in .msf format. >> >> HTH, >> >> Derek. >> >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto: >> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya >> Sent: 14 December 2006 08:03 >> To: Chris Fields; bioperl-l >> Subject: Re: [Bioperl-l] needle parser in bioperl? >> >> How do I run needle specifying that I want the MSF format, on a >> linux box? >> The help doesnt show me any format option. Is there anything >> available to >> pasre MSF format? >> Please find an example alignment file attached. Here the >> seq_of_contig >> aligns with the reference sequence (i.e. SEQ_1.REF) starting at >> position >> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate >> from the >> output alignment, how can I parse the result to get this? >> >> On 12/12/06, Chris Fields wrote: >> > >> > >> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: >> > >> > > Hi, >> > > >> > > Does anyone know of a bioperl parser for needle output, >> basically I >> > > won't >> > > where the target sequence aligns on the template (i.e. coordinate >> > > on the >> > > template where the taget aligns). >> > > >> > > -- >> > > -Neeti >> > > Even my blood says, B positive >> > >> > I answered this a number of months back: >> > >> > http://tinyurl.com/yzlbx5 >> > >> > Basically, newer versions of EMBOSS have changed the output for the >> > AlignIO::emboss parser (which parses needle). 
I don't believe the
>> > parser has been fixed to deal with that, but Jason has pointed out
>> > you can use MSF output when running needle, then parse using
>> AlignIO
>> > with the format set to 'msf'.
>> >
>> > chris
>> >
>>
>>
>>
>> --
>> -Neeti
>> Even my blood says, B positive
>>
>>
>>
>> --
>> -Neeti
>> Even my blood says, B positive
>>
>
>
>
> --
> -Neeti
> Even my blood says, B positive
> <1.out>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From lubapardo at gmail.com Fri Dec 15 11:39:11 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 15 Dec 2006 17:39:11 +0100
Subject: [Bioperl-l] NO BLAST
Message-ID: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>

Hello,

I am having trouble using the module Bio::Tools::Run::StandAloneBlast.

I got the following error message: cannot find path to blastall.

The code I used is (modified from HOWTObeginners):

#! /local/bin/perl -w
#use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Tools::Run::StandAloneBlast;

my $db_object = Bio::DB::GenBank-> new;
#my $seq_ob = $db_object->get_Seq_by_id('NM_004043');
#$seq= Bio::SeqIO->new(-file => "> out.fasta", -format => 'fasta');
#$seq ->write_seq($seq_ob);
#print $seq;

@params = (program =>'blastn', database =>'db.fa');
$blast_obj =Bio::Tools::Run::StandAloneBlast->new(@params);
$seq_obj = Bio::Seq->new(-id =>"testquery", -seq =>"TTTAAATATATTTTGAAGTATAGATTATATGTT");
$report_obj = $blast_obj->blastall($seq_obj);
$result_obj =$report_obj->next_result;
print $result_obj->num_hits;

Whether I create a sequence de novo or retrieve one from the internet, I get the same message.

From cjfields at uiuc.edu Fri Dec 15 12:23:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:23:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID:

On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:

>
> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>
>>
>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>
>>> Hey Chris,
>>>
>>> My thoughts below.
>>>
>>>> [Chris]
>>>> This could be used to annotate any
>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-
>>>> you,
>>>> maybe in a collection (similar to AnnotationCollection). I thought
>>>> something like this may be of general use for any PrimarySeq
>>>> (quality, structure), alignments like NEXUS and Stockholm,
>>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>>> etc.
>>>>
>>>> However, this also seems to fall into the category of sequence
>>>> annotation. So, would it be better to have a set of
>>>> Bio::Annotation
>>>> classes used for this purpose?
>>> >>> >>> To me, all meta data is equal. That is, your classic Genbank feature >>> annotation and a user's arbitrary meta-tag like "Bob thinks this >>> is a >>> kinase domain" aren't different in kind even if they are >>> different in >>> content. >>> >>> As resequencing projects multiply, the ability to create arbitrary >>> meta tags, attach them to different types of objects, and use those >>> tags to link them together will become desirable, if not essential. >>> >>> Keeping a common interface to all of these meta data types would be >>> advantageous, plus new users won't have to determine whether they >>> need to use Bio::Meta objects or Bio::Annotation objects. >>> >>> So I would argue for all of the meta data types to live "under one >>> roof". Which roof isn't as important. Bio::Annotation, since it >>> already exists for today's meta data, seems like a reasonable >>> choice. >>> (assuming Annotation objects are flexible enough to be extended as >>> you propose) >>> >>> There, and no flames or jibes even. :) >> >> I guess what I want to know is whether there should to be a >> distinction between 'normal' sequence annotation (comments, >> references, and so on) and annotation that could be best described as >> position-specific (like RNA or protein structural annotation). The >> current meta implementation is for sequence data only; I felt it >> would be nice to have a generic implementation that would be >> applicable to any object data. > > my stream-of-consciousness for right now: > > I was thinking Bio::Annotation is where this should go - that > system doesn't have anything about it that makes it explicitly > sequence related. What we're trying to hammer out here on the > Alignment side - which fits with your RNA example - is have > features, basically SeqFeatures - associated with alignments so > columns can be annotated to cover things like character sets and > partitions for phylogenetic analyses. 
As for data which annotates
> non-contiguous things like RNAstems we may have to be more
> creative about that or model it with a splitLocation.
>
> So currently we've added code so that an Alignment is-a
> Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
> end, with the goal of being able to capture more of the data that
> can be represented in a NEXUS file.
>
> It feels more like a hack than an elegant Meta-data solution, but I
> am totally sure whether the data you are thinking about doing at
> this point, perhaps I need to spend more time thinking about it.
> Or are you worried about the idea of whether the semantic mapping
> of the data into features or annotations is confusing users?

Sorry in advance for the longish response here...

My original thought was to have a generic abstract class capable of positionally describing data in any other class, similar to Heikki's Bio::Seq::MetaI but not constrained to sequence data only. Implementing classes would be capable of having different data structures based on their use (simple string, array, AoA, AoH, AoO). One MetaCollection class would contain them all in a tag-like system, so you could have mixed data types describe the same object. The latter Collection class is so similar to AnnotationCollection that I agree Bio::Annotation would be the best place for this.

The way I reconfigured Stockholm alignment parsing/writing is to use Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is capable of holding a sequence and several meta strings, stored as tags or 'names'. However, there is no Meta object for alignments (for RNA/protein structure consensus and other Rfam/Pfam markup); I hacked around this by using a Bio::Seq::Meta w/o a seq, but I would rather have a generic Meta object independent of the sequence cruft.

So for this partial Pfam alignment,

Q92SV1_RHIME/122-299           LAMALNLARGI...VDADVDF..REG
#=GR Q92SV1_RHIME/122-299 pAS  .........................
Q883D2_PSESM/110-290           LGLMLGLRRRL...FDGNGAV..KRS
Q8ZXP5_PYRAE/91-262            LALLLAPYKRI...IQYGEKM..KRG
#=GR Q8ZXP5_PYRAE/91-262 SS    HHHHHHHHTTH...HHHHHHX..HTT
#=GR Q8ZXP5_PYRAE/91-262 SA    00000000000...120030X..474
#=GC SS_cons                   HHHHHHHHTTH...HHHHHHH..HTT
#=GC SA_cons                   03002200312...1312414..676
#=GC seq_cons                  luhhLuhsRpl...hthppth..+pG
//

'#=GC' lines would be in generic meta string objects in the alignment, while '#=GR' tags would be in similar meta objects in the relevant sequences. As long as both aren't AnnotatableI this isn't an issue. Similarly, NEXUS files which contained any position-based values could hold a meta string/array object in a similar tag.

The basic scheme is:

                    |--String
                    |
Annotation::Meta----|--Array
                    |
                    |--HorriblyComplexDataStruct

Then I started thinking about where this could be applied, and whether a true Meta object needs to be constrained only to describing position-based data. This somewhat relates to this bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1825

which seems to need a simple but unconstrained hash-of-arrays-based meta object. Then my head appropriately exploded...

Hope everything is going well at the hackathon! Looks like some interesting stuff coming out of it.

chris

From cjfields at uiuc.edu Fri Dec 15 12:49:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:49:45 -0600
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
In-Reply-To: <1166158897.2569.335.camel@localhost.localdomain>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain>
Message-ID: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>

On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:

> As much as I would like to take credit for this :-) Allen Day
> wrote the
> original code, and then Chris Fields tried to fix it so that it
> actually
> worked :-) I think it would be a good idea to have a validate_terms
> option like Bio::FeatureIO::gff.
> > Scott I did ?!? I committed a bug fix a while back: Revision 1.34 / (view) - annotate - [select for diffs] , Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields Branch: MAIN CVS Tags: branch-experimental Branch point for: branch-1-5-2 Changes since 1.33: +155 -33 lines Diff to previous 1.33 Bug 2026; Robert's enhancements To tell the truth I don't know if this is where the mandatory checks were added in; I'm not too familiar with SeqFeature::Annotation yet. I agree with Scott (and Matthew) that SOFA checks should be optional. Matthew, can you write up a patch and maybe some tests? chris From stewarta at nmrc.navy.mil Thu Dec 14 18:30:11 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 14 Dec 2006 18:30:11 -0500 Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown Message-ID: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil> I'm getting the following exception... ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM STACK: Error::throw STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/ SearchIO/blast.pm:1172 STACK: main::process_reports ./new_blast_script.pl:254 STACK: ./new_blast_script.pl:132 ----------------------------------------------------------- next_result is a pretty dense chunk of code to decipher. I was wondering if anyone more familiar with that code might know what the "no data for midline $_" exception is referring to? 
For context: 1161 if( /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+ (\-?\d+)/ ) { 1162 my ($full,$type,$start,$str,$end) = ($1, $2,$3,$4,$5); 1163 if( $str eq '-' ) { 1164 $i = 3 if $type eq 'Sbjct'; 1165 } else { 1166 $data{$type} = $str; 1167 } 1168 $len = length($full); 1169 $self->{"\_$type"}->{'begin'} = $start unless $self->{"_$type"}->{'begin'}; 1170 $self->{"\_$type"}->{'end'} = $end; 1171 } else { 1172 $self->throw("no data for midline $_") 1173 unless (defined $_ && defined $len); 1174 $data{'Mid'} = substr($_,$len); 1175 } -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From jason at bioperl.org Fri Dec 15 13:56:13 2006 From: jason at bioperl.org (Jason Stajich) Date: Fri, 15 Dec 2006 13:56:13 -0500 Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown In-Reply-To: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil> References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil> Message-ID: It means it is expecting alignment block of data and there is none (or there is none in the context it is expecting it) - so something is wrong with the report as it gets tripped up. I'm not sure reading the code is going to help you - what someone will have to do is figure out what is different about this report than reports that do work for the parser. You'll do better if you just provide an example report that is failing as a bug report. Providing the version of BLAST you are using and version of bioperl will help. I seem to remember NCBI changing the BLAST text format so that will break the parser if it is a significant change. As has been mentioned in the past, this playing cat and mouse with format changes means things will periodically break. If you need rock- solid always going to work, I guess the XML is better route to go. 
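
For anyone trying to picture the failure, the branch quoted from blast.pm above can be exercised outside the parser. The following is only a minimal sketch, not the actual BioPerl code: classify_line() and its return strings are hypothetical names made up for illustration, and the sample lines are invented. It shows that any line which is not a Query/Sbjct line is treated as the midline and sliced at the width of the last Query/Sbjct prefix, so the exception fires when such a line arrives before any prefix width has been recorded. Note that the reported message carries the offending line itself ("Posted date: ..."), which looks like a database statistics line rather than alignment text.

```perl
use strict;
use warnings;

# Simplified stand-in for the quoted branch of next_result.
# classify_line() is a hypothetical helper, not part of BioPerl.
sub classify_line {
    my ($line, $len_ref) = @_;
    if ( $line =~ /^((Query|Sbjct):?\s+(-?\d+)\s*)(\S+)\s+(-?\d+)/ ) {
        my ($full, $type, $start, $str, $end) = ($1, $2, $3, $4, $5);
        $$len_ref = length $full;    # width of the "Query: 1   " prefix
        return "$type $start-$end";
    }
    # Anything else is assumed to be the midline; without a recorded
    # prefix width there is nothing to slice, hence the exception.
    die "no data for midline $line\n" unless defined $$len_ref;
    return 'Mid ' . substr($line, $$len_ref);
}

# A well-formed alignment block: Query line, midline, Sbjct line.
my $len;
print classify_line('Query: 1   MKTLL 5',   \$len), "\n";
print classify_line('           MK+LL',     \$len), "\n";
print classify_line('Sbjct: 7   MKALL 11',  \$len), "\n";

# A stray non-alignment line seen before any Query/Sbjct line.
my $fresh_len;
my $ok = eval { classify_line('  Posted date:  Dec 14, 2006  2:52 PM', \$fresh_len); 1 };
print "died: $@" unless $ok;
```

If this reading is right, the useful thing to attach to a bug report is the part of the BLAST output surrounding the "Posted date" line, since that is where the parser believed it was still inside an alignment.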
-jason On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote: > I'm getting the following exception... > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM > STACK: Error::throw > STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 > STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/ > SearchIO/blast.pm:1172 > STACK: main::process_reports ./new_blast_script.pl:254 > STACK: ./new_blast_script.pl:132 > ----------------------------------------------------------- > > > next_result is a pretty dense chunk of code to decipher. I was > wondering if anyone more familiar with that code might know what the > "no data for midline $_" exception is referring to? > > > -- > Andrew Stewart > Research Assistant, Genomics Team > Navy Medical Research Center (NMRC) > Biological Defense Research Directorate (BDRD) > BDRD Annex > 12300 Washington Avenue, 2nd Floor > Rockville, MD 20852 > > email: stewarta at nmrc.navy.mil > phone: 301-231-6700 Ext 270 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Dec 15 14:21:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 15 Dec 2006 13:21:32 -0600 Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown In-Reply-To: References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil> Message-ID: <6A0D17FA-CB98-4937-998E-11B87FB9CBBD@uiuc.edu> On Dec 15, 2006, at 12:56 PM, Jason Stajich wrote: > It means it is expecting alignment block of data and there is none > (or there is none in the context it is expecting it) - so something > is wrong with the report as it gets tripped up. > > I'm not sure reading the code is going to help you - what someone > will have to do is figure out what is different about this report > than reports that do work for the parser. 
> You'll do better if you just provide an example report that is
> failing as a bug report.
>
> Providing the version of BLAST you are using and version of bioperl
> will help. I seem to remember NCBI changing the BLAST text format so
> that will break the parser if it is a significant change.
>
> As has been mentioned in the past, this playing cat and mouse with
> format changes means things will periodically break. If you need rock-
> solid always going to work, I guess the XML is better route to go.
>
> -jason

I agree that XML is the only reliable way to go, though I have been reading on the BioPython group about some issues with newer (2.2.13 or greater) BLAST XML output for reports with multiple BLAST queries. I don't know whether this affects BioPerl.

As for the 'midline' error, there was a similar error a while back (fixed for the 1.5.2 release) that had to do with extra lines in the alignment section in some BLAST reports. Unless we have a demo BLAST report and sample code we can't do much about it (we need to reproduce the error in order to fix it), so the best thing to do is file a bug report.

chris

> On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote:
>
>> I'm getting the following exception...
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:
>> 328
>> STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/
>> SearchIO/blast.pm:1172
>> STACK: main::process_reports ./new_blast_script.pl:254
>> STACK: ./new_blast_script.pl:132
>> -----------------------------------------------------------
>>
>>
>> next_result is a pretty dense chunk of code to decipher. I was
>> wondering if anyone more familiar with that code might know what the
>> "no data for midline $_" exception is referring to?
>> >> >> -- >> Andrew Stewart >> Research Assistant, Genomics Team >> Navy Medical Research Center (NMRC) >> Biological Defense Research Directorate (BDRD) >> BDRD Annex >> 12300 Washington Avenue, 2nd Floor >> Rockville, MD 20852 >> >> email: stewarta at nmrc.navy.mil >> phone: 301-231-6700 Ext 270 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From vaughn at cshl.edu Fri Dec 15 13:05:47 2006 From: vaughn at cshl.edu (Matthew Vaughn) Date: Fri, 15 Dec 2006 13:05:47 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain> <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> Message-ID: Yes, I will. I am working on it today. It's a little more complicated to fix this than I expected because SeqFeature::Annotation->type() returns a Bio::AnnotationI rather than a simple scalar like it used to. On 12/15/06, Chris Fields wrote: > On Dec 14, 2006, at 11:01 PM, Scott Cain wrote: > > > As much as I would like to take credit for this :-) Allen Day > > wrote the > > original code, and then Chris Fields tried to fix it so that it > > actually > > worked :-) I think it would be a good idea to have a validate_terms > > option like Bio::FeatureIO::gff. > > > > Scott > > I did ?!? 
I committed a bug fix a while back: > > Revision 1.34 / (view) - annotate - [select for diffs] , > Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields > Branch: MAIN > CVS Tags: branch-experimental > Branch point for: branch-1-5-2 > Changes since 1.33: +155 -33 lines > Diff to previous 1.33 > > Bug 2026; Robert's enhancements > > To tell the truth I don't know if this is where the mandatory checks > were added in; I'm not too familiar with SeqFeature::Annotation yet. > > I agree with Scott (and Matthew) that SOFA checks should be > optional. Matthew, can you write up a patch and maybe some tests? > > chris > > > > From valiente at lsi.upc.edu Fri Dec 15 19:45:27 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Sat, 16 Dec 2006 01:45:27 +0100 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4577EFD3.7090904@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> Message-ID: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> > I don't think that can be true. Your error message contains 'Must > supply > a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). > > If you uninstall the fink installation and install 1.5.2 using cpan > (with root privileges by going sudo cpan) that should at least get > rid of the error messages... > > >> The tree is not correct (I've parsed it from R to have a double >> check) but don't know yet what the problem is with it. > > ... But if the tree is wrong anyway... Let me know what you find out. I've uninstalled the fink installation and used the cvs instead, and the error message is gone. However, on a larger set of 190 species, which are all present in the NCBI taxonomy, the resulting tree has only 178 taxa. 
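
One way to narrow down a 190-versus-178 discrepancy like this is to list exactly which requested names never show up as leaves, and which were requested more than once. The sketch below is plain Perl and makes no claim about how taxonomy2tree works internally: compare_names() is a hypothetical helper, the species lists are toy data, and it assumes the leaf labels have already been pulled out of the tree as simple strings (for a BioPerl tree object that would presumably mean collecting the id of each node returned by get_leaf_nodes).

```perl
use strict;
use warnings;

# Hypothetical helper: compare the requested species names with the
# leaf labels of the resulting tree.
sub compare_names {
    my ($requested, $leaves) = @_;

    # Count requests; remember a name the second time it is seen.
    my (%seen, @duplicates);
    for my $name (@$requested) {
        push @duplicates, $name if $seen{$name}++ == 1;
    }

    # Requested names that are absent from the set of leaf labels.
    my %is_leaf = map { $_ => 1 } @$leaves;
    my @missing = grep { !$is_leaf{$_} } sort keys %seen;

    return { missing => \@missing, duplicates => \@duplicates };
}

# Toy data standing in for the real species list and tree leaves.
my @requested = ('Homo sapiens', 'Mus musculus', 'Escherichia coli', 'Mus musculus');
my @leaves    = ('Homo sapiens', 'Mus musculus');

my $report = compare_names(\@requested, \@leaves);
print "missing from tree: @{ $report->{missing} }\n";
print "requested twice:   @{ $report->{duplicates} }\n";
```

Knowing which 12 names drop out should make it easier to see whether they share something, for example a common lineage.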
I suspect, something must be wrong with the merge_lineage method in the major rewrite of the taxonomy2tree script. Can someone please check this? I'm attaching the 190 species call to the script. Thanks, Gabriel -------------- next part -------------- A non-text attachment was scrubbed... Name: fetch-bork.sh Type: application/octet-stream Size: 7378 bytes Desc: not available URL: From lincoln.stein at gmail.com Fri Dec 15 11:02:27 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Fri, 15 Dec 2006 11:02:27 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> Message-ID: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> This is very embarassing for me, particularly since I spent a lot of time validating that Bio::Graphics was working properly before the 1.5.2 release went out. How long before there is a 1.5.3 release? How about a 1.5.2.1release? Lincoln On 12/14/06, Lincoln Stein wrote: > > Hi All, > > I'm afraid that the xyplot glyph that is in the recent bioperl release has > an error that causes the content to be printed to the right of the correct > position. Unfortunately this wasn't caught before the release because the > glyph was only tested on very large (whole genome) features. > > You will need to do a CVS update to get a fixed version from bioperl-live. > A future bugfix release of gbrowse will patch this glyph for you > automatically. > > Lincoln > > On 12/12/06, Kara Dolinski wrote: > > > > Hi, > > I'm having a problem getting features and an xyplot properly aligned in > > Gbrowse. For example, see this page: > > > > http://tinyurl.com/ylbq3q > > > > The feature in the "CENPK SNPs" track should actually be around the peak > > of the graph in the "CENPK prediction signal" xyplot ie. 
the SNP > > feature is at position 79, and the xyplot axes and data should span from > > 61 - 95. However, as you can see, the data in the xyplot are oddly > > separated from the axes (which seem to be in the correct place), with the > > data shifted over to about position 120-155. > > This occurs elsewhere, not just at the ends of the chromosomes. > > > > When I zoom to ~80 bp, all is well, see: > > > > http://tinyurl.com/yzav8k > > > > The relevant snippets from the GFF and the config files are below. > > > > Thanks! > > Kara > > > > GFF: > > > > chrI SNPScanner > > CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > > is 41.9883 > > chrI SNPScanner > > CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > > is 41.9883 > > chrI SNPScanner > > CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score > > is 2.24506 > > chrI SNPScanner > > CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score > > is 3.26837 > > chrI SNPScanner > > CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score > > is 1.39938 > > chrI SNPScanner > > CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score > > is 1.4039 > > chrI SNPScanner > > CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score > > is 9.16134 > > chrI SNPScanner > > CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score > > is 10.1413 > > chrI SNPScanner > > CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score > > is 12.9256 > > chrI SNPScanner > > CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score > > is 13.195 > > chrI SNPScanner > > CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score > > is 22.7127 > > chrI SNPScanner > > CENPK_SCORE 70 70 23.8289 . . 
ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score > > is 23.8289 > > chrI SNPScanner > > CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score > > is 21.9123 > > chrI SNPScanner > > CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score > > is 28.3344 > > chrI SNPScanner > > CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score > > is 35.0436 > > chrI SNPScanner > > CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score > > is 37.361 > > chrI SNPScanner > > CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score > > is 39.5408 > > chrI SNPScanner > > CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score > > is 28.2008 > > chrI SNPScanner > > CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score > > is 32.6254 > > chrI SNPScanner > > CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score > > is 36.0832 > > chrI SNPScanner > > CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score > > is 41.9883 > > chrI SNPScanner > > CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score > > is 32.1205 > > chrI SNPScanner > > CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score > > is 41.3048 > > chrI SNPScanner > > CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score > > is 30.7975 > > chrI SNPScanner > > CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score > > is 29.4282 > > chrI SNPScanner > > CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score > > is 35.3586 > > chrI SNPScanner > > CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score > > is 34.1426 > > chrI SNPScanner > > CENPK_SCORE 86 86 30.2966 . . 
ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score > > is 30.2966 > > chrI SNPScanner > > CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score > > is 17.8402 > > chrI SNPScanner > > CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score > > is 15.2637 > > chrI SNPScanner > > CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score > > is 12.657 > > chrI SNPScanner > > CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score > > is 10.2033 > > chrI SNPScanner > > CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score > > is 9.40143 > > chrI SNPScanner > > CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score > > is 6.56273 > > chrI SNPScanner > > CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score > > is 3.66211 > > chrI SNPScanner > > CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score > > is 0.394194 > > > > CONFIG: > > > > > > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH} > > > > [CENPK_all_scores_graph] > > feature = GRAPH_CENPK:SNPScanner > > glyph = xyplot > > graph_type = boxes > > fgcolor = purple > > bgcolor = purple > > height = 100 > > min_score = 0 > > max_score = 110 > > label = 0 > > key = CENPK prediction signal > > link = > > category = SNPs: signal graphs > > > > > > > > ------------------------------------------------------------------------- > > Take Surveys. Earn Cash. 
Influence the Future of IT > > Join SourceForge.net's Techsay panel and you'll get the chance to share > > your > > opinions on IT & business topics through brief surveys - and earn cash > > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > > > > > > _______________________________________________ > > Gmod-gbrowse mailing list > > Gmod-gbrowse at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > > > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Sat Dec 16 01:10:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 00:10:07 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> Message-ID: <70A5E333-8CF5-49D3-84AC-7A6A02791B5C@uiuc.edu> We could feasibly have regular point releases of the 1.5 dev. series for bug fixes; I guess it just depends on how often these should come out and what critical tests must pass for a release to go forward. Sendu's already done a ton of work towards getting BioPerl switched over to Module::Build and Test::More, and fixing bugs. As Hilmar has pointed out in the past, this is a developer's series, so not every test needs to pass before a release goes out. When would you like this to go out? 

chris

On Dec 15, 2006, at 10:02 AM, Lincoln Stein wrote:

> This is very embarrassing for me, particularly since I spent a lot
> of time validating that Bio::Graphics was working properly before
> the 1.5.2 release went out. How long before there is a 1.5.3
> release? How about a 1.5.2.1 release?
>
> Lincoln
>
> On 12/14/06, Lincoln Stein wrote:
>>
>> Hi All,
>>
>> I'm afraid that the xyplot glyph that is in the recent bioperl
>> release has an error that causes the content to be printed to the
>> right of the correct position. Unfortunately this wasn't caught
>> before the release because the glyph was only tested on very large
>> (whole genome) features.
>>
>> You will need to do a CVS update to get a fixed version from
>> bioperl-live. A future bugfix release of gbrowse will patch this
>> glyph for you automatically.
>>
>> Lincoln
>>
>> On 12/12/06, Kara Dolinski wrote:
>>> [snip]
>
> --
> Lincoln D.
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Dec 16 01:28:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 00:28:47 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: >> I don't think that can be true. Your error message contains 'Must >> supply >> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >> >> If you uninstall the fink installation and install 1.5.2 using >> cpan (with root privileges by going sudo cpan) that should at >> least get rid of the error messages... >> >> >>> The tree is not correct (I've parsed it from R to have a double >>> check) but don't know yet what the problem is with it. >> >> ... But if the tree is wrong anyway... Let me know what you find out. > > I've uninstalled the fink installation and used the cvs instead, > and the error message is gone. However, on a larger set of 190 > species, which are all present in the NCBI taxonomy, the resulting > tree has only 178 taxa. 
I suspect, something must be wrong with the
> merge_lineage method in the major rewrite of the taxonomy2tree
> script. Can someone please check this? I'm attaching the 190
> species call to the script. Thanks,
>
> Gabriel

I can confirm that. It is definitely dropping them in merge_lineage();
if you add a call to get_leaf_nodes to check how many are present
after each merge_lineage() call, you can see it dropping nodes along
the trace.

In taxonomy2tree.pl:

    my $ct = 0;    # species processed so far (initialised to avoid a printf warning)
    my ($treect, $mergect) = (0, 0);
    for my $name (@species) {
        my $ncbi_id = $db->get_taxonid($name);
        if ($ncbi_id) {
            #print "Species: $name\n\tTaxID: $ncbi_id\n";
            #$ids{$ncbi_id}++;
            my $node = $db->get_taxon(-taxonid => $ncbi_id);
            if ($tree) {
                $tree->merge_lineage($node);
            }
            else {
                $tree = Bio::Tree::Tree->new(-node => $node);
            }
            # report the number of leaf nodes after each merge
            printf("%-3d: Nodes: %-4d\n", $ct, scalar($tree->get_leaf_nodes));
        }
        else {
            warn "no NCBI Taxonomy node for species ", $name, "\n";
        }
        $ct++;
    }

chris

From bix at sendu.me.uk Sat Dec 16 09:37:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:37:49 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <458404BD.8030908@sendu.me.uk>

Lincoln Stein wrote:
> This is very embarassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1release?

I'm happy to try a point release for critical bug fixes. Why don't you
commit the necessary fixes to branch-1-5-2 and let me know when you're
happy, and I'll do 1.5.2.1.

Cheers,
Sendu.
From bix at sendu.me.uk Sat Dec 16 09:47:57 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 16 Dec 2006 14:47:57 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: <4584071D.3070005@sendu.me.uk> Gabriel Valiente wrote: >> I don't think that can be true. Your error message contains 'Must supply >> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >> >> If you uninstall the fink installation and install 1.5.2 using cpan >> (with root privileges by going sudo cpan) that should at least get rid >> of the error messages... >> >> >>> The tree is not correct (I've parsed it from R to have a double >>> check) but don't know yet what the problem is with it. >> >> ... But if the tree is wrong anyway... Let me know what you find out. > > I've uninstalled the fink installation and used the cvs instead, and the > error message is gone. However, on a larger set of 190 species, which > are all present in the NCBI taxonomy, the resulting tree has only 178 > taxa. I suspect, something must be wrong with the merge_lineage method > in the major rewrite of the taxonomy2tree script. Can someone please > check this? I'm attaching the 190 species call to the script. Thanks, Ok, I'll look into it. You're also welcome to see if you can take your own code from your original taxonomy2tree script and see if you can merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with your algorithms to get it working correctly. Indeed, does your original version of the script work on this data set? Cheers, Sendu. 
From cjfields at uiuc.edu Sat Dec 16 10:18:50 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 09:18:50 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4584071D.3070005@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4584071D.3070005@sendu.me.uk> Message-ID: <6AE33842-B2E7-4E9B-B80D-68A058045818@uiuc.edu> On Dec 16, 2006, at 8:47 AM, Sendu Bala wrote: > Gabriel Valiente wrote: >>> I don't think that can be true. Your error message contains 'Must >>> supply >>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >>> >>> If you uninstall the fink installation and install 1.5.2 using cpan >>> (with root privileges by going sudo cpan) that should at least >>> get rid >>> of the error messages... >>> >>> >>>> The tree is not correct (I've parsed it from R to have a double >>>> check) but don't know yet what the problem is with it. >>> >>> ... But if the tree is wrong anyway... Let me know what you find >>> out. >> >> I've uninstalled the fink installation and used the cvs instead, >> and the >> error message is gone. However, on a larger set of 190 species, which >> are all present in the NCBI taxonomy, the resulting tree has only 178 >> taxa. I suspect, something must be wrong with the merge_lineage >> method >> in the major rewrite of the taxonomy2tree script. Can someone please >> check this? I'm attaching the 190 species call to the script. Thanks, > > Ok, I'll look into it. You're also welcome to see if you can take your > own code from your original taxonomy2tree script and see if you can > merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with > your algorithms to get it working correctly. 
Indeed, does your > original > version of the script work on this data set? > > > Cheers, > Sendu. Sendu, Don't know if it helps, but when I tried Gabriel's shell script last night I ran a modification of taxonomy2tree to see what would pop up. Everything is fine up to about 100 iterations, then merge_lineage () starts dropping leaf nodes. chris From bix at sendu.me.uk Sat Dec 16 10:33:35 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 16 Dec 2006 15:33:35 +0000 Subject: [Bioperl-l] NO BLAST In-Reply-To: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com> References: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com> Message-ID: <458411CF.8000707@sendu.me.uk> Luba Pardo wrote: > *Hello,* > *I am having trouble to use the module Bio::Tools::Run::StandAloneBlast;* > ** > *I got the following error message: cannot find path to blastall.* > *The code I used is (modified from HOWTObeginners): Bioperl doesn't know where you installed blast. If you've actually installed it, you can set the environment variable BLASTDIR to point to the directory that contains the blastall executable. From cain.cshl at gmail.com Fri Dec 15 13:09:48 2006 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 15 Dec 2006 13:09:48 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain> <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> Message-ID: <1166206188.2569.380.camel@localhost.localdomain> On Fri, 2006-12-15 at 11:49 -0600, Chris Fields wrote: > > To tell the truth I don't know if this is where the mandatory checks > were added in; I'm not too familiar with SeqFeature::Annotation yet. > > I agree with Scott (and Matthew) that SOFA checks should be > optional. Matthew, can you write up a patch and maybe some tests? 
> > chris > That's not where they were added in, it just that they hadn't been fully implemented before then, so they didn't work (which probably meant they weren't mandatory, though I don't remember (it could be that it just croaked)). Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From hlapp at gmx.net Sun Dec 17 01:02:04 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 17 Dec 2006 01:02:04 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <458404BD.8030908@sendu.me.uk> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> Message-ID: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote: > Lincoln Stein wrote: >> This is very embarassing for me, particularly since I spent a lot >> of time >> validating that Bio::Graphics was working properly before the >> 1.5.2 release >> went out. How long before there is a 1.5.3 release? How about a >> 1.5.2.1release? > > I'm happy to try a point release for critical bug fixes. Why don't you > commit the necessary fixes to branch-1-5-2 and let me know when you're > happy, and I'll do 1.5.2.1. Feel free to do that, but why not make a 1.5.3 off the main trunk? 1.5.2.1 may be adding more to the version confusion (developer/stable/ point-release/etc) than it is worth, and there is no shame in releasing new developer versions every few weeks. My $0.02 ... -hilmar > > Cheers, > Sendu. 
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From fgarret at ub.edu Mon Dec 18 07:07:02 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 13:07:02 +0100
Subject: [Bioperl-l] codeml
Message-ID: <45868466.508@ub.edu>

Hi all,

I've been using bioperl's PAML module (specifically the codeml part) but
with just one tree.

Since the program accepts several trees as input (and runs the analysis
for each tree, outputting the difference in likelihoods for each one), I
was wondering if there's some way to do it through bioperl?

thanks in adv,
FG

From heikki at sanbi.ac.za Mon Dec 18 08:51:50 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 18 Dec 2006 15:51:50 +0200
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To:
References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID: <200612181551.51277.heikki@sanbi.ac.za>

Reading the discussion, I think it is time to draw some guidelines.

1. Base the Meta implementation on real use cases.

MSA is a good example.

2. Allow generalisations

If you can see another implementation of the same idea that can be
merged with the first, do it, but do not hurt yourself if you cannot.

The most difficult question is how to separate case-specific attributes,
which are best implemented by subclassing with additional methods, from
truly widely variable meta data, which is best done as a parallel-track
class holding the meta information.

The problem I see with undefined, totally open meta annotation is that
if you can put anything in there, it is also totally confusing to a
user. If you can put anything in, how do you know what to get out, and
how do you know that it is there?

That leads to the third guideline:

3.
Use separate meta classes only when there are several different ways of encoding data that is present in large numbers *and* when you are expecting to be assessing the data computationally rather than just checking if an attribute is there. -Heikki On Friday 15 December 2006 19:23, Chris Fields wrote: > On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote: > > On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: > >> On Dec 14, 2006, at 7:45 PM, David Messina wrote: > >>> Hey Chris, > >>> > >>> My thoughts below. > >>> > >>>> [Chris] > >>>> This could be used to annotate any > >>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have- > >>>> you, > >>>> maybe in a collection (similar to AnnotationCollection). I thought > >>>> something like this may be of general use for any PrimarySeq > >>>> (quality, structure), alignments like NEXUS and Stockholm, > >>>> SeqFeatures where structure could be stored (tRNA or riboswitches), > >>>> etc. > >>>> > >>>> However, this also seems to fall into the category of sequence > >>>> annotation. So, would it be better to have a set of > >>>> Bio::Annotation > >>>> classes used for this purpose? > >>> > >>> To me, all meta data is equal. That is, your classic Genbank feature > >>> annotation and a user's arbitrary meta-tag like "Bob thinks this > >>> is a > >>> kinase domain" aren't different in kind even if they are > >>> different in > >>> content. > >>> > >>> As resequencing projects multiply, the ability to create arbitrary > >>> meta tags, attach them to different types of objects, and use those > >>> tags to link them together will become desirable, if not essential. > >>> > >>> Keeping a common interface to all of these meta data types would be > >>> advantageous, plus new users won't have to determine whether they > >>> need to use Bio::Meta objects or Bio::Annotation objects. > >>> > >>> So I would argue for all of the meta data types to live "under one > >>> roof". Which roof isn't as important. 
Bio::Annotation, since it > >>> already exists for today's meta data, seems like a reasonable > >>> choice. > >>> (assuming Annotation objects are flexible enough to be extended as > >>> you propose) > >>> > >>> There, and no flames or jibes even. :) > >> > >> I guess what I want to know is whether there should to be a > >> distinction between 'normal' sequence annotation (comments, > >> references, and so on) and annotation that could be best described as > >> position-specific (like RNA or protein structural annotation). The > >> current meta implementation is for sequence data only; I felt it > >> would be nice to have a generic implementation that would be > >> applicable to any object data. > > > > my stream-of-consciousness for right now: > > > > I was thinking Bio::Annotation is where this should go - that > > system doesn't have anything about it that makes it explicitly > > sequence related. What we're trying to hammer out here on the > > Alignment side - which fits with your RNA example - is have > > features, basically SeqFeatures - associated with alignments so > > columns can be annotated to cover things like character sets and > > partitions for phylogenetic analyses. As for data which annotates > > non-contiguous things like RNAstems we may have to be more > > creative about that or model it with a splitLocation. > > > > So currently we've added code so that an Alignment is-a > > Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this > > end, with the goal of being able to capture more of the data that > > can be represented in a NEXUS file. > > > > It feels more like a hack than an elegant Meta-data solution, but I > > am totally sure whether the data you are thinking about doing at > > this point, perhaps I need to spend more time thinking about it. > > Or are you worried about the idea of whether the semantic mapping > > of the data into features or annotations is confusing users? > > Sorry in advance for the longish response here... 
> > My original thought was to have a generic abstract class capable of > positionally describing data in any another class, similar to > Heikki's Bio::Seq::MetaI but not constrained to sequence data only. > Implementing classes would be capable of having different data > structures based on their use (simple string, array, AoA, AoH, AoO). > One MetaCollection class to contain them all in a tag-like system, so > you could have mixed data types describe the same object. The latter > Collection class is so similar to AnnotationCollection that I agree > Bio::Annotation would be the best place for this. > > The way I reconfigured Stockholm alignment parsing/writing is to use > Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is > capable of holding a sequence and several meta strings, stored as > tags or 'names'. However, there is no Meta object for alignments > (for RNA/protein structure consensus and other Rfam/Pfam markup); I > hacked around this by using a Bio::Seq::Meta w/o a seq, but I would > rather have a generic Meta object independent of the sequence cruft. > > So for this partial Pfam alignment, > > Q92SV1_RHIME/122-299 LAMALNLARGI...VDADVDF..REG > #=GR Q92SV1_RHIME/122-299 pAS ......................... > Q883D2_PSESM/110-290 LGLMLGLRRRL...FDGNGAV..KRS > Q8ZXP5_PYRAE/91-262 LALLLAPYKRI...IQYGEKM..KRG > #=GR Q8ZXP5_PYRAE/91-262 SS HHHHHHHHTTH...HHHHHHX..HTT > #=GR Q8ZXP5_PYRAE/91-262 SA 00000000000...120030X..474 > #=GC SS_cons HHHHHHHHTTH...HHHHHHH..HTT > #=GC SA_cons 03002200312...1312414..676 > #=GC seq_cons luhhLuhsRpl...hthppth..+pG > // > > '#=GC' lines would be in generic meta string objects in the > alignment, while '#=GR' tags would be in similar meta objects in the > relevant sequences. As long as both aren't AnnotatableI this isn't > an issue. > > Similarly, NEXUS files which contained any position-based values > could hold a meta string/array object in a similar tag. 
> > The basic scheme is: > |--String > > Annotation::Meta----|--Array > > |--HorriblyComplexDataStruct > > Then I started thinking about where this could be applied, and > whether a true Meta object needs to be constrained only to describing > position-based data. This somewhat relates to this bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=1825 > > which seems to need a simple but unconstrained hash-of-arrays-based > meta object. > > Then my head appropriately exploded... > > Hope everything is going well at the hackathon! Looks like some > interesting stuff coming out of it. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From fgarret at ub.edu Mon Dec 18 11:18:31 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 17:18:31 +0100 Subject: [Bioperl-l] PAML files Message-ID: <4586BF57.4090002@ub.edu> Hi all, does anyone knows how to get the name of the .ctl file created by the PAML module? Inside the tmp directory there are 2 files with random names (tree and ctl). Why do they have random names?? Wouldn't it be easier to assign them a fixed name?? For instance "codeml.ctl" and "tree.nwk"?? thanks in adv, FG From bix at sendu.me.uk Mon Dec 18 11:15:21 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 16:15:21 +0000 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? 
In-Reply-To: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> Message-ID: <4586BE99.7020308@sendu.me.uk> Hilmar Lapp wrote: > > On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote: > >> Lincoln Stein wrote: >>> This is very embarassing for me, particularly since I spent a lot >>> of time validating that Bio::Graphics was working properly before >>> the 1.5.2 release went out. How long before there is a 1.5.3 >>> release? How about a 1.5.2.1release? >> >> I'm happy to try a point release for critical bug fixes. Why don't >> you commit the necessary fixes to branch-1-5-2 and let me know when >> you're happy, and I'll do 1.5.2.1. > > Feel free to do that, but why not make a 1.5.3 off the main trunk? > 1.5.2.1 may be adding more to the version confusion > (developer/stable/point-release/etc) than it is worth, My feeling is that 1.5.3 should be reserved for some significant changes and new features, and not just a few bug fixes. I'd say this causes less confusion amongst users - they can associate '1.5.2' with a certain API and feature-set, and the specific name of the file they download and install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't matter at all to them. I also won't have to make some major announcement about it; it will simply be the most recent developer version of bioperl available so new users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing 1.5.2 users will only feel compelled to get it if they suffer from the bugs fixed. > and there is no shame in releasing new developer versions every few > weeks. I think doing frequent releases are inadvisable; such a quick release won't have had much testing so we shouldn't encourage people to install it: encouragement is implicit when a major new version comes out like 1.5.3. 
People who want to live on the edge can and should be using a CVS checkout. From bix at sendu.me.uk Mon Dec 18 14:15:16 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 19:15:16 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: <4586E8C4.6030306@sendu.me.uk> Chris Fields wrote: > On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: > >> However, on a larger set of 190 species, which are all present in >> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect, >> something must be wrong with the merge_lineage method in the major >> rewrite of the taxonomy2tree script. Can someone please check this? >> I'm attaching the 190 species call to the script. Thanks, >> >> Gabriel > > I can confirm that. It is definitely dropping them in merge_lineage > (); if you add a call to get_leaf_nodes to check how many are > present after each merge_lineage() call, you can see it dropping > nodes along the trace. I confirm the 'dropped' nodes, but also claim that this is no bug. For example, the first 'drop' happens for the 101st species which is 'Leptospira interrogans serovar Copenhageni'. This is a variation (descendant) of species 24: 'Leptospira interrogans'. So when the variation is added it becomes a leaf and 'Leptospira interrogans' is no longer a leaf, so the overall number of leaves does not increase. The next drop is for species 103 'Prochlorococcus marinus subsp. pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'. Same deal. I didn't check any others, but suspect the same issue arises in all cases. 
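[Editorial illustration] The leaf-count behaviour described above can be reproduced with a minimal sketch in plain Perl. It does not use BioPerl or the real merge_lineage(); the add_lineage() and leaf_count() helpers and the shortened lineages are hypothetical and exist only to show why adding a descendant of an existing leaf leaves the number of leaves unchanged.

```perl
use strict;
use warnings;

# Toy stand-in for a taxonomy tree: each node name maps to a hash of its children.
my %children;

# Add a root-to-tip lineage (list of names), creating nodes as needed.
sub add_lineage {
    my @lineage = @_;
    for my $i (0 .. $#lineage) {
        $children{ $lineage[$i] } ||= {};
        $children{ $lineage[ $i - 1 ] }{ $lineage[$i] } = 1 if $i > 0;
    }
    return;
}

# A leaf is a node that has no children.
sub leaf_count {
    return scalar grep { !keys %{ $children{$_} } } keys %children;
}

add_lineage('root', 'Leptospira',      'Leptospira interrogans');
add_lineage('root', 'Prochlorococcus', 'Prochlorococcus marinus');
printf "after 2 species: %d leaves\n", leaf_count();

# The serovar hangs below a node that used to be a leaf, so that node
# becomes internal and the total number of leaves stays the same.
add_lineage('root', 'Leptospira', 'Leptospira interrogans',
            'Leptospira interrogans serovar Copenhageni');
printf "after 3 species: %d leaves\n", leaf_count();
```

Counting all nodes that were requested by name, rather than only the leaves, would be one way to check that no requested taxon is actually missing from the tree.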
Gabriel, please confirm this isn't a bug, or suggest how you propose to see your taxa when they are not all leaves of the tree. PS. I changed the merge_lineage() algorithm to be 18x faster (from the absurd 3mins for making the 190 species tree to a more reasonable 10s), without changing the tree produced. From fgarret at ub.edu Mon Dec 18 15:01:38 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 21:01:38 +0100 Subject: [Bioperl-l] PAML files In-Reply-To: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> References: <4586BF57.4090002@ub.edu> <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> Message-ID: <4586F3A2.4010607@ub.edu> Hi Jason, This question is related with the one I made previously today. I need to run codeml with 3 tree topologies. I looked on codeml module but it only accepts one tree as input so I thought of using the codeml module to prepare all the files and then I would just have to run the codeml with the new tree file in batch. But for that I need to know which one is the ctl file. any idea? FG Jason Stajich wrote: > They are temporary names so they are deliberately random and there is no > intention of you needing them after a run since it to be cleaned up on > the fly. We use an internal method for generating tempfiles that takes > care of cleanup afterwards. I suppose since we do all the work within a > temp directory that is cleaned up, one could have a fixed name for the > tree, alignment, and ctl files but honestly we never expect people to be > reading these filenames as they are intended to be transient. > > What problem are you having that you need the filename? > > -jason > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > >> Hi all, >> >> does anyone knows how to get the name of the .ctl file created by the >> PAML module? Inside the tmp directory there are 2 files with random >> names (tree and ctl). Why do they have random names?? Wouldn't it be >> easier to assign them a fixed name?? 
For instance "codeml.ctl" and >> "tree.nwk"?? >> >> thanks in adv, >> FG >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > From fgarret at ub.edu Mon Dec 18 15:07:46 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 21:07:46 +0100 Subject: [Bioperl-l] codeml In-Reply-To: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> References: <45868466.508@ub.edu> <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> Message-ID: <4586F512.1030209@ub.edu> Right now it's impossible for me to write it. By February or March I should have more time but I'll let you know. FG Jason Stajich wrote: > This is shortcoming in the Run::Phylo::PAML::Codeml implementation - I > guess we'll need to allow the -tree option to accept and arrayref of trees. > Are you willing to try write this patch? It should be added as a > bug/feature request to bugzilla so it can be corrected in short order. > > -jason > On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote: > >> Hi all, >> >> I've been using bioperl's PAML module (specifically the codeml part) but >> with just one tree. >> >> Since the program accepts several trees as input (and runs the analysis >> for each tree outputting the difference in likelihoods for each one) I >> was wondering if there's some way to do it through bioperl? 
>> >> thanks in adv, >> FG >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > > From cjfields at uiuc.edu Mon Dec 18 15:55:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 14:55:55 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4586E8C4.6030306@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> Message-ID: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: >> >>> However, on a larger set of 190 species, which are all present in >>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect, >>> something must be wrong with the merge_lineage method in the major >>> rewrite of the taxonomy2tree script. Can someone please check this? >>> I'm attaching the 190 species call to the script. Thanks, >>> >>> Gabriel >> >> I can confirm that. It is definitely dropping them in merge_lineage >> (); if you add a call to get_leaf_nodes to check how many are >> present after each merge_lineage() call, you can see it dropping >> nodes along the trace. > > I confirm the 'dropped' nodes, but also claim that this is no bug. > > For example, the first 'drop' happens for the 101st species which is > 'Leptospira interrogans serovar Copenhageni'. This is a variation > (descendant) of species 24: 'Leptospira interrogans'. 
So when the > variation is added it becomes a leaf and 'Leptospira interrogans' > is no > longer a leaf, so the overall number of leaves does not increase. > > The next drop is for species 103 'Prochlorococcus marinus subsp. > pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'. > Same deal. I didn't check any others, but suspect the same issue > arises > in all cases. Makes sense now. I personally would consider this a bug since the results are unexpected (so the docs need to be modified in order to clarify). Some say tomato... I suppose this is one of the issues one might run into when using NCBI taxonomy to build trees. > Gabriel, please confirm this isn't a bug, or suggest how you > propose to > see your taxa when they are not all leaves of the tree. Having the nodes appear internally seems semantically correct to me. Is there any other way? > PS. I changed the merge_lineage() algorithm to be 18x faster (from the > absurd 3mins for making the 190 species tree to a more reasonable > 10s), > without changing the tree produced. Definitely an improvement! chris From jason at bioperl.org Mon Dec 18 14:33:32 2006 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Dec 2006 14:33:32 -0500 Subject: [Bioperl-l] PAML files In-Reply-To: <4586BF57.4090002@ub.edu> References: <4586BF57.4090002@ub.edu> Message-ID: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> They are temporary names so they are deliberately random and there is no intention of you needing them after a run since it to be cleaned up on the fly. We use an internal method for generating tempfiles that takes care of cleanup afterwards. I suppose since we do all the work within a temp directory that is cleaned up, one could have a fixed name for the tree, alignment, and ctl files but honestly we never expect people to be reading these filenames as they are intended to be transient. What problem are you having that you need the filename? 
-jason On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > Hi all, > > does anyone knows how to get the name of the .ctl file created by the > PAML module? Inside the tmp directory there are 2 files with random > names (tree and ctl). Why do they have random names?? Wouldn't it be > easier to assign them a fixed name?? For instance "codeml.ctl" and > "tree.nwk"?? > > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjm at fruitfly.org Mon Dec 18 16:50:00 2006 From: cjm at fruitfly.org (Chris Mungall) Date: Mon, 18 Dec 2006 13:50:00 -0800 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> Message-ID: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> I agree with everything Heikki is saying, I just wanted to highlight one paragraph: > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? One solution is to give your annotation/metadata-model formal computational semantics and use ontologies to give additional semantics to your metadata tags. This provides both user information in the form of documentation, and a means of specifying to the computer exactly what should be done with the tags. This is probably overkill for bioperl; but if the use cases being proposed do lean in the direction of a new metadata system that is not necessarily backwards compatible with the existing one, then I'd recommend checking out what's already out there before re-inventing the wheel. Perl RDF libraries are getting a little better. 
If anyone is interested in pursuing this sort of thing (probably on a branch), let me know On Dec 18, 2006, at 5:51 AM, Heikki Lehvaslaiho wrote: > > Reading the discussion, I think it is time to draw some guidelines. > > 1. Base the Meta implementation to a real use cases. > > MSA is a good example. > > 2. Allow generalisations > > If you can see an other implementation of the same idea that can > be merged > with the first do it but do not hurt yourself if you can not. > > > The most difficult question is how to separate case-specific > attributes that > are best implemented by subclassing with additional methods from > truly widely > variable meta data that is best done as a parallel track meta > information > holding class. > > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? > > That leads to the the third guideline: > > 3. Use separate meta classes only when there are several different > ways of > encoding data that is present in large numbers *and* when you are > expecting > to be assessing the data computationally rather than just checking > if an > attribute is there. > > > -Heikki > > > > On Friday 15 December 2006 19:23, Chris Fields wrote: >> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote: >>> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: >>>> On Dec 14, 2006, at 7:45 PM, David Messina wrote: >>>>> Hey Chris, >>>>> >>>>> My thoughts below. >>>>> >>>>>> [Chris] >>>>>> This could be used to annotate any >>>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have- >>>>>> you, >>>>>> maybe in a collection (similar to AnnotationCollection). 
I >>>>>> thought >>>>>> something like this may be of general use for any PrimarySeq >>>>>> (quality, structure), alignments like NEXUS and Stockholm, >>>>>> SeqFeatures where structure could be stored (tRNA or >>>>>> riboswitches), >>>>>> etc. >>>>>> >>>>>> However, this also seems to fall into the category of sequence >>>>>> annotation. So, would it be better to have a set of >>>>>> Bio::Annotation >>>>>> classes used for this purpose? >>>>> >>>>> To me, all meta data is equal. That is, your classic Genbank >>>>> feature >>>>> annotation and a user's arbitrary meta-tag like "Bob thinks this >>>>> is a >>>>> kinase domain" aren't different in kind even if they are >>>>> different in >>>>> content. >>>>> >>>>> As resequencing projects multiply, the ability to create arbitrary >>>>> meta tags, attach them to different types of objects, and use >>>>> those >>>>> tags to link them together will become desirable, if not >>>>> essential. >>>>> >>>>> Keeping a common interface to all of these meta data types >>>>> would be >>>>> advantageous, plus new users won't have to determine whether they >>>>> need to use Bio::Meta objects or Bio::Annotation objects. >>>>> >>>>> So I would argue for all of the meta data types to live "under one >>>>> roof". Which roof isn't as important. Bio::Annotation, since it >>>>> already exists for today's meta data, seems like a reasonable >>>>> choice. >>>>> (assuming Annotation objects are flexible enough to be extended as >>>>> you propose) >>>>> >>>>> There, and no flames or jibes even. :) >>>> >>>> I guess what I want to know is whether there should to be a >>>> distinction between 'normal' sequence annotation (comments, >>>> references, and so on) and annotation that could be best >>>> described as >>>> position-specific (like RNA or protein structural annotation). 
The >>>> current meta implementation is for sequence data only; I felt it >>>> would be nice to have a generic implementation that would be >>>> applicable to any object data. >>> >>> my stream-of-consciousness for right now: >>> >>> I was thinking Bio::Annotation is where this should go - that >>> system doesn't have anything about it that makes it explicitly >>> sequence related. What we're trying to hammer out here on the >>> Alignment side - which fits with your RNA example - is have >>> features, basically SeqFeatures - associated with alignments so >>> columns can be annotated to cover things like character sets and >>> partitions for phylogenetic analyses. As for data which annotates >>> non-contiguous things like RNAstems we may have to be more >>> creative about that or model it with a splitLocation. >>> >>> So currently we've added code so that an Alignment is-a >>> Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this >>> end, with the goal of being able to capture more of the data that >>> can be represented in a NEXUS file. >>> >>> It feels more like a hack than an elegant Meta-data solution, but I >>> am totally sure whether the data you are thinking about doing at >>> this point, perhaps I need to spend more time thinking about it. >>> Or are you worried about the idea of whether the semantic mapping >>> of the data into features or annotations is confusing users? >> >> Sorry in advance for the longish response here... >> >> My original thought was to have a generic abstract class capable of >> positionally describing data in any another class, similar to >> Heikki's Bio::Seq::MetaI but not constrained to sequence data only. >> Implementing classes would be capable of having different data >> structures based on their use (simple string, array, AoA, AoH, AoO). >> One MetaCollection class to contain them all in a tag-like system, so >> you could have mixed data types describe the same object. 
The latter >> Collection class is so similar to AnnotationCollection that I agree >> Bio::Annotation would be the best place for this. >> >> The way I reconfigured Stockholm alignment parsing/writing is to use >> Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is >> capable of holding a sequence and several meta strings, stored as >> tags or 'names'. However, there is no Meta object for alignments >> (for RNA/protein structure consensus and other Rfam/Pfam markup); I >> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would >> rather have a generic Meta object independent of the sequence cruft. >> >> So for this partial Pfam alignment, >> >> Q92SV1_RHIME/122-299 LAMALNLARGI...VDADVDF..REG >> #=GR Q92SV1_RHIME/122-299 pAS ......................... >> Q883D2_PSESM/110-290 LGLMLGLRRRL...FDGNGAV..KRS >> Q8ZXP5_PYRAE/91-262 LALLLAPYKRI...IQYGEKM..KRG >> #=GR Q8ZXP5_PYRAE/91-262 SS HHHHHHHHTTH...HHHHHHX..HTT >> #=GR Q8ZXP5_PYRAE/91-262 SA 00000000000...120030X..474 >> #=GC SS_cons HHHHHHHHTTH...HHHHHHH..HTT >> #=GC SA_cons 03002200312...1312414..676 >> #=GC seq_cons luhhLuhsRpl...hthppth..+pG >> // >> >> '#=GC' lines would be in generic meta string objects in the >> alignment, while '#=GR' tags would be in similar meta objects in the >> relevant sequences. As long as both aren't AnnotatableI this isn't >> an issue. >> >> Similarly, NEXUS files which contained any position-based values >> could hold a meta string/array object in a similar tag. >> >> The basic scheme is: >> |--String >> >> Annotation::Meta----|--Array >> >> |--HorriblyComplexDataStruct >> >> Then I started thinking about where this could be applied, and >> whether a true Meta object needs to be constrained only to describing >> position-based data. This somewhat relates to this bug: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=1825 >> >> which seems to need a simple but unconstrained hash-of-arrays-based >> meta object. >> >> Then my head appropriately exploded... 
>> >> Hope everything is going well at the hackathon! Looks like some >> interesting stuff coming out of it. >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Dec 18 14:35:50 2006 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Dec 2006 14:35:50 -0500 Subject: [Bioperl-l] codeml In-Reply-To: <45868466.508@ub.edu> References: <45868466.508@ub.edu> Message-ID: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> This is shortcoming in the Run::Phylo::PAML::Codeml implementation - I guess we'll need to allow the -tree option to accept and arrayref of trees. Are you willing to try write this patch? It should be added as a bug/ feature request to bugzilla so it can be corrected in short order. -jason On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote: > Hi all, > > I've been using bioperl's PAML module (specifically the codeml > part) but > with just one tree. > > Since the program accepts several trees as input (and runs the > analysis > for each tree outputting the difference in likelihoods for each one) I > was wondering if there's some way to do it through bioperl? 
> > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From gowthaman.ramasamy at sbri.org Mon Dec 18 17:19:09 2006 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Mon, 18 Dec 2006 14:19:09 -0800 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence Message-ID: Hi List, Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) Many thanks in advance, gowtham From cjfields at uiuc.edu Mon Dec 18 17:33:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 16:33:34 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> Message-ID: On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote: > > Reading the discussion, I think it is time to draw some guidelines. > > 1. Base the Meta implementation to a real use cases. > > MSA is a good example. AlignIO::stockholm is where I'll initially test it out. > 2. Allow generalisations > > If you can see an other implementation of the same idea that can > be merged > with the first do it but do not hurt yourself if you can not. I agree. > The most difficult question is how to separate case-specific > attributes that > are best implemented by subclassing with additional methods from > truly widely > variable meta data that is best done as a parallel track meta > information > holding class. 
I would probably start with a general Bio::Annotation::MetaI abstract class, which supplements AnnotationI with general meta-specific methods (meta, meta_text, named_meta, etc)? Implement this in whatever way one wanted (RNA structure as strings, quality data as arrays, etc) under the constraints of the interface description. Multiple meta objects, potentially of mixed data types, could be added in an AnnotationCollection along with other Bio::Annotation data, or stored in a nested meta-specific AnnotationCollection object (I favor the former as it's simpler). So you could have an alignment, sequence, seqfeature (anything that is AnnotatableI) with a regular AnnotationCollection also containing possibly multiple meta objects, each meta object also containing possibly more than one set of meta data. The key issue I have is whether or not to constrain these to describing positional data, similar to Bio::Seq::Meta, by ensuring that the data is_flush(), etc. My current inclination is 'no', and to have a separate abstract class which describes these methods, implementing those separately. > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? > > That leads to the the third guideline: > > 3. Use separate meta classes only when there are several different > ways of > encoding data that is present in large numbers *and* when you are > expecting > to be assessing the data computationally rather than just checking > if an > attribute is there. > > > -Heikki The initial use case for this would be simple data strings for alignment data. I already have a partial implementation in place for stockholm using Bio::Seq::Meta (which led me to this proposal!). 
I like Chris M.'s idea of ensuring that meta implementations use some sort of formalized ontology, but I'll probably start out very simple and work up from there. chris From cjfields at uiuc.edu Mon Dec 18 17:38:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 16:38:14 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <4586BE99.7020308@sendu.me.uk> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> <4586BE99.7020308@sendu.me.uk> Message-ID: <6AD475AE-7F5E-4612-BC24-73B65AA47F30@uiuc.edu> On Dec 18, 2006, at 10:15 AM, Sendu Bala wrote: > Hilmar Lapp wrote: >> >> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote: >> >>> Lincoln Stein wrote: >>>> This is very embarassing for me, particularly since I spent a lot >>>> of time validating that Bio::Graphics was working properly before >>>> the 1.5.2 release went out. How long before there is a 1.5.3 >>>> release? How about a 1.5.2.1release? >>> >>> I'm happy to try a point release for critical bug fixes. Why don't >>> you commit the necessary fixes to branch-1-5-2 and let me know when >>> you're happy, and I'll do 1.5.2.1. >> >> Feel free to do that, but why not make a 1.5.3 off the main trunk? >> 1.5.2.1 may be adding more to the version confusion >> (developer/stable/point-release/etc) than it is worth, > > My feeling is that 1.5.3 should be reserved for some significant > changes > and new features, and not just a few bug fixes. I'd say this causes > less > confusion amongst users - they can associate '1.5.2' with a certain > API > and feature-set, and the specific name of the file they download and > install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't > matter at all to them. 
> > I also won't have to make some major announcement about it; it will > simply be the most recent developer version of bioperl available so > new > users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing > 1.5.2 users will only feel compelled to get it if they suffer from the > bugs fixed. > > >> and there is no shame in releasing new developer versions every few >> weeks. > > I think doing frequent releases are inadvisable; such a quick release > won't have had much testing so we shouldn't encourage people to > install > it: encouragement is implicit when a major new version comes out like > 1.5.3. People who want to live on the edge can and should be using a > CVS checkout. I thought that 1.5.2 was considered a point release for the 1.5 dev series, for bug fixes along with the potential for added/experimental features. Similarly, 1.6.x releases would be point releases for bug fixes only with all tests passing (no added features since it is a stable release series). I guess one could reason that 1.5.x releases have both bug fixes and new features, while 1.5.x.y releases are simply bug fixes for the 1.5.x branch (no new features). We probably should add something to the FAQ and maybe make a few changes to the 1.5.2 wiki page. I think having a 1.5.2.1 release is feasible as a quick one-off to get Lincoln's fixes in, since you would make them off the 1.5.2 branch anyway (so I guess it could be considered a bug release from that branch). It's probably not something we should make a habit of, but then again I'm not the Pumpkin! 
chris From bix at sendu.me.uk Mon Dec 18 17:50:11 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 22:50:11 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> Message-ID: <45871B23.8070103@sendu.me.uk> Chris Fields wrote: > > On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: > >> For example, the first 'drop' happens for the 101st species which is >> 'Leptospira interrogans serovar Copenhageni'. This is a variation >> (descendant) of species 24: 'Leptospira interrogans'. So when the >> variation is added it becomes a leaf and 'Leptospira interrogans' is no >> longer a leaf, so the overall number of leaves does not increase. > > Makes sense now. I personally would consider this a bug since the > results are unexpected (so the docs need to be modified in order to > clarify). Some say tomato... > > I suppose this is one of the issues one might run into when using NCBI > taxonomy to build trees. No, the tree produced is perfectly fine. The taxonomy2tree.pl script deliberately then does: # simple paths are contracted by removing degree one nodes $tree->contract_linear_paths; Because that is what Gabriel's script originally did. >> Gabriel, please confirm this isn't a bug, or suggest how you propose to >> see your taxa when they are not all leaves of the tree. > > Having the nodes appear internally seems semantically correct to me. Is > there any other way? I suppose if we want to see all the input species output again we have to make contract_linear_paths() aware of nodes we want to keep, even when they are degree one nodes. 
Gabriel, is that what you want to see? From cjfields at uiuc.edu Mon Dec 18 18:14:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 17:14:23 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <45871B23.8070103@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> Message-ID: On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: >>> For example, the first 'drop' happens for the 101st species which is >>> 'Leptospira interrogans serovar Copenhageni'. This is a variation >>> (descendant) of species 24: 'Leptospira interrogans'. So when the >>> variation is added it becomes a leaf and 'Leptospira interrogans' >>> is no >>> longer a leaf, so the overall number of leaves does not increase. >> >> Makes sense now. I personally would consider this a bug since the >> results are unexpected (so the docs need to be modified in order >> to clarify). Some say tomato... >> I suppose this is one of the issues one might run into when using >> NCBI taxonomy to build trees. > > No, the tree produced is perfectly fine. The taxonomy2tree.pl > script deliberately then does: > > # simple paths are contracted by removing degree one nodes > $tree->contract_linear_paths; > > Because that is what Gabriel's script originally did. I think you misunderstood me. The tree is fine; the data used to make the tree (NCBI taxonomy) is the issue. 
One of the clear caveats that NCBI attaches to their taxonomy data is that it should not be the 'primary source for taxonomic or phylogenetic information': http://tinyurl.com/y3k624 I think it works as a good guide as long as one takes the above into consideration. That and the fact that not all taxids attached to sequence data will represent leaf nodes. chris From cjfields at uiuc.edu Mon Dec 18 18:15:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 17:15:56 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> Message-ID: <16D6DB51-C2CB-4E89-A597-4672FAA6681B@uiuc.edu> On Dec 18, 2006, at 3:50 PM, Chris Mungall wrote: > > I agree with everything Heikki is saying, I just wanted to highlight > one paragraph: > >> The problem I see with undefined, totally open meta annotation, is >> that if you >> can put anything in there, it is also totally confusing to a user. >> If you can >> put anything in, how do you know what to get get out and know that >> it is >> there? > > One solution is to give your annotation/metadata-model formal > computational semantics and use ontologies to give additional > semantics to your metadata tags. This provides both user information > in the form of documentation, and a means of specifying to the > computer exactly what should be done with the tags. > > This is probably overkill for bioperl; but if the use cases being > proposed do lean in the direction of a new metadata system that is > not necessarily backwards compatible with the existing one, then I'd > recommend checking out what's already out there before re-inventing > the wheel. Perl RDF libraries are getting a little better. > > If anyone is interested in pursuing this sort of thing (probably on a > branch), let me know ...
I like the idea of using ontologies (although that's one of my many weak points!). I'll likely start off with simple examples using meta data initially, then progress from there. It is a developer series, after all! Thanks everybody! I think I have an idea on how to at least get started. chris From bix at sendu.me.uk Mon Dec 18 18:27:15 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 23:27:15 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> Message-ID: <458723D3.4010908@sendu.me.uk> Chris Fields wrote: > > On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: >>>> For example, the first 'drop' happens for the 101st species which is >>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation >>>> (descendant) of species 24: 'Leptospira interrogans'. So when the >>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no >>>> longer a leaf, so the overall number of leaves does not increase. >>> >>> Makes sense now. I personally would consider this a bug since the >>> results are unexpected (so the docs need to be modified in order to >>> clarify). Some say tomato... >>> I suppose this is one of the issues one might run into when using >>> NCBI taxonomy to build trees. >> >> No, the tree produced is perfectly fine. The taxonomy2tree.pl script >> deliberately then does: >> >> # simple paths are contracted by removing degree one nodes >> $tree->contract_linear_paths; >> >> Because that is what Gabriel's script originally did.
> > I think you misunderstood me. The tree is fine; the data used to make > the tree (NCBI taxonomy) is the issue. In what way is it the issue? The database is also fine as far as I can see, in so far as it is not causing any problems in this instance. Gabriel asked for a tree featuring a species and its subspecies. The NCBI taxonomy database provided Bioperl the correct data to build such a tree. Then Gabriel asked to remove the degree one nodes of his tree. His problem was that doing that happened to (correctly) remove the species node. If he wants to see both his species and his subspecies he must either not remove degree one nodes, or alter the method of doing so to keep desired nodes. There is no possible way for NCBI to improve matters here. From bix at sendu.me.uk Mon Dec 18 18:45:59 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 23:45:59 +0000 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <45872837.6050403@sendu.me.uk> Gowthaman Ramasamy wrote: > Hi List, Is there any module in bioperl which can find out the primer > binding sites in a genomic sequence. I am interested in finding > locations with few mismatches along the primer...not just the exact > match (which is a very trivial task) There's no module dedicated to that task, but Bioperl may help you to answer the question. Probably the easiest/reliable/clear thing to do is to do a Blast with appropriate settings for short sequence with few mismatches. You can write a script to only consider hits for your forward primer that are a 'primable' distance from a hit to your reverse primer (and check their orientations are correct as well). Or use some e-pcr tool. 
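For readers who want to try the "few mismatches" search directly before reaching for BLAST or an e-PCR tool, the following is a minimal sketch in plain Perl. It is not a BioPerl module and none of its names come from the thread: revcomp, count_mismatches and find_primer_sites are hypothetical helpers invented for illustration. It assumes unambiguous A/C/G/T sequence, counts substitutions only (no gaps), and slides a window across the whole sequence, so it is only sensible for short targets; for whole genomes the BLAST or e-PCR route suggested above is likely the better choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): reverse-complement a DNA string.
sub revcomp {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Hypothetical helper: count mismatching positions between two strings of
# equal length, giving up as soon as the count exceeds $limit.
sub count_mismatches {
    my ($x, $y, $limit) = @_;
    my $count = 0;
    for my $i (0 .. length($x) - 1) {
        $count++ if substr($x, $i, 1) ne substr($y, $i, 1);
        last if $count > $limit;
    }
    return $count;
}

# Hypothetical helper: report every window of the target that matches the
# primer (strand 1) or its reverse complement (strand -1) with at most
# $max_mismatch substitutions. Coordinates are 1-based and inclusive.
sub find_primer_sites {
    my ($primer, $genome, $max_mismatch) = @_;
    $primer = uc $primer;
    $genome = uc $genome;
    my $len   = length $primer;
    my %probe = (1 => $primer, -1 => revcomp($primer));
    my @sites;
    for my $offset (0 .. length($genome) - $len) {
        my $window = substr($genome, $offset, $len);
        for my $strand (1, -1) {
            my $mm = count_mismatches($probe{$strand}, $window, $max_mismatch);
            next if $mm > $max_mismatch;
            push @sites, {
                start      => $offset + 1,
                end        => $offset + $len,
                strand     => $strand,
                mismatches => $mm,
            };
        }
    }
    return @sites;
}

# Toy data: an exact forward site at 5-14 and a reverse-strand site with
# one substitution at 19-28.
my $primer = 'GATTACAGGC';
my $genome = 'TTTTGATTACAGGCAAAAGCCTCTAATCTTTT';

for my $site (find_primer_sites($primer, $genome, 1)) {
    printf "%d-%d strand %+d mismatches %d\n",
        @{$site}{qw(start end strand mismatches)};
}
```

On the toy data this prints "5-14 strand +1 mismatches 0" followed by "19-28 strand -1 mismatches 1". Pairing forward and reverse hits that lie a primable distance apart, as suggested above, would be a further filtering step over the returned list.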
From Kevin.M.Brown at asu.edu Mon Dec 18 18:52:20 2006 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 18 Dec 2006 16:52:20 -0700 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence Message-ID: <1A4207F8295607498283FE9E93B775B40270F3BB@EX02.asurite.ad.asu.edu> A function I use to find the first landing site for a primer. Should be modifiable to handle multiple occurrences:

=head1 C

Match searches for a near alignment between two strings and returns the
position at which the two strings align. Match is based on 80% conformation

match($this, $in_that)

=cut

sub match {
    my ($primer, $gene) = @_;
    my $start   = 0;
    my $pattern = "";

    # Grow the pattern one primer base at a time.
    for (my $i = 0 ; $i < length($primer) ; $i++) {
        $pattern .= substr($primer, $i, 1);
        pos($gene) = 0;
        if ($gene =~ m/$pattern/gi) {
            $start = pos($gene) - length($pattern) + 1;
        } else {
            # The new base broke the match: replace it with a wildcard.
            $start = 0;
            chop($pattern);
            $pattern .= '.';
        }
    }

    # If the last base became a wildcard, match once more to set $start.
    if ($pattern =~ /\.$/) {
        if ($gene =~ m/$pattern/gi) {
            $start = pos($gene) - length($pattern) + 1;
        }
    }

    # Accept only if more than 80% of the primer bases matched literally.
    $pattern =~ s/\.//g;
    if ((length($pattern) / length($primer)) > .8) {
        #print $start . "\n";
        return $start;
    }
    return 0;
}

> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Monday, December 18, 2006 4:46 PM > To: Gowthaman Ramasamy > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] module to find out primer binding > sites in a genome sequence > > Gowthaman Ramasamy wrote: > > Hi List, Is there any module in bioperl which can find out > the primer > > binding sites in a genomic sequence. I am interested in finding > > locations with few mismatches along the primer...not just the exact > > match (which is a very trivial task) > > There's no module dedicated to that task, but Bioperl may help you to > answer the question. > > Probably the easiest/reliable/clear thing to do is to do a Blast with > appropriate settings for short sequence with few mismatches.
You can > write a script to only consider hits for your forward primer > that are a > 'primable' distance from a hit to your reverse primer (and check their > orientations are correct as well). > > Or use some e-pcr tool. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From torsten.seemann at infotech.monash.edu.au Mon Dec 18 18:52:58 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 19 Dec 2006 10:52:58 +1100 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <458729DA.9030909@infotech.monash.edu.au> Gowthaman Ramasamy wrote: > Hi List, > Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. > I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) This FAQ question may help: http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F This software may help: http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From sdavis2 at mail.nih.gov Mon Dec 18 21:16:19 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 18 Dec 2006 21:16:19 -0500 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <45874B73.7010600@mail.nih.gov> Gowthaman Ramasamy wrote: > Hi List, > Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. 
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) > See here: http://genome.ucsc.edu/cgi-bin/hgPcr?command=start It is designed for exactly this task, is very fast, is available as an executable or web-based (though watch the usage requirements), and the output can be parsed rather easily. Sean From cjfields at uiuc.edu Mon Dec 18 21:30:04 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 20:30:04 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <458723D3.4010908@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> Message-ID: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> >> I think you misunderstood me. The tree is fine; the data used to >> make >> the tree (NCBI taxonomy) is the issue. > > In what way is it the issue? The database is also fine as far as I can > see, in so far as it is not causing any problems in this instance. I should maybe have clarified a bit more: what I said has nothing to do with the structure of the database itself. I was just pointing out that NCBI Taxonomy isn't the best source of data for building a phylogenetic tree, something NCBI also points out (the link in my last post). Not a big deal, really. > Gabriel asked for a tree featuring a species and its subspecies. The > NCBI taxonomy database provided Bioperl the correct data to build > such a > tree. Then Gabriel asked to remove the degree one nodes of his > tree. His > problem was that doing that happened to (correctly) remove the species > node. 
If he wants to see both his species and his subspecies he must > either not remove degree one nodes, or alter the method of doing so to > keep desired nodes. There is no possible way for NCBI to improve > matters > here. Actually, there isn't any way they could w/o digging through the literature in many cases. The problem is incomplete taxonomic information for nodes derived from older sequence data, where a genus and species was designated but nothing else (strain, etc) is known. Again, I merely was pointing out what I had mentioned above. I wasn't criticizing you, Gabriel, or the methodology here. Honest! chris From avilella at gmail.com Mon Dec 18 16:43:27 2006 From: avilella at gmail.com (Albert Vilella) Date: Mon, 18 Dec 2006 21:43:27 +0000 Subject: [Bioperl-l] PAML files In-Reply-To: <4586F3A2.4010607@ub.edu> References: <4586BF57.4090002@ub.edu> <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> <4586F3A2.4010607@ub.edu> Message-ID: <358f4d650612181343o5bd51169w7b46cceb34a5c92b@mail.gmail.com> Filipe, if you need to create the ctl file but not run the job, you can use the "prepare" method in Codeml run. Also, there are tmpdir and save_tempfiles methods that will keep the files where you want. About the naming, you can add a ".tree" and ".aln" extension to the tempnames if you want, by altering the $temptreefile and $tempseqfile variables in bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm (cvs head version). If you want, you can also add a couple of getters/setters there:

sub alnfilename{
    my $self = shift;
    return $self->{'alnfilename'} = shift if @_;
    return $self->{'alnfilename'};
}

and substitute those $tempseqfile io calls for your $self->{'alnfilename'} io calls.

$codeml->alnfilename("/path/name");
$codeml->prepare;
...
$codeml->run;

What I usually do is to have the aln and tree files in a different place. Codeml will create the tmp files for running somewhere, and then delete all the stuff when done. Cheers, Albert.
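The combined getter/setter idiom suggested in this message can be tried in isolation before patching Codeml.pm. The snippet below is only a sketch: My::FakeCodeml is a hypothetical stand-in class invented for illustration and is not the real Bio::Tools::Run::Phylo::PAML::Codeml wrapper, and the path is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT the real Codeml wrapper; it only
# demonstrates the combined getter/setter idiom from the message above.
package My::FakeCodeml;

sub new { return bless {}, shift }

sub alnfilename {
    my $self = shift;
    # Called with an argument: store it and return the new value.
    return $self->{'alnfilename'} = shift if @_;
    # Called without arguments: return the stored value (undef if unset).
    return $self->{'alnfilename'};
}

package main;

my $codeml = My::FakeCodeml->new;
$codeml->alnfilename("/path/name.aln");    # acts as a setter
print $codeml->alnfilename, "\n";          # acts as a getter
```

Because the value lives in the object hash, each wrapper instance keeps its own file name, which is what would let a script point three separate runs at three different tree or alignment files.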
On 12/18/06, Filipe Garrett wrote: > > Hi Jason, > > This question is related with the one I made previously today. > I need to run codeml with 3 tree topologies. I looked on codeml module > but it only accepts one tree as input so I thought of using the codeml > module to prepare all the files and then I would just have to run the > codeml with the new tree file in batch. But for that I need to know > which one is the ctl file. > > any idea? > FG > > Jason Stajich wrote: > > They are temporary names so they are deliberately random and there is no > > intention of you needing them after a run since it to be cleaned up on > > the fly. We use an internal method for generating tempfiles that takes > > care of cleanup afterwards. I suppose since we do all the work within a > > temp directory that is cleaned up, one could have a fixed name for the > > tree, alignment, and ctl files but honestly we never expect people to be > > reading these filenames as they are intended to be transient. > > > > What problem are you having that you need the filename? > > > > -jason > > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > > > >> Hi all, > >> > >> does anyone knows how to get the name of the .ctl file created by the > >> PAML module? Inside the tmp directory there are 2 files with random > >> names (tree and ctl). Why do they have random names?? Wouldn't it be > >> easier to assign them a fixed name?? For instance "codeml.ctl" and > >> "tree.nwk"?? 
> >> > >> thanks in adv, > >> FG > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > jason at bioperl.org > > http://jason.open-bio.org/ > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From valiente at lsi.upc.edu Mon Dec 18 23:18:20 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Tue, 19 Dec 2006 13:18:20 +0900 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> Message-ID: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> Thanks a lot for the prompt answer and follow-up discussion. I think this turned out not to be a bug in the merge_lineage() code but a direct consequence of building a phylogenetic tree instead of a taxonomic tree, aka with internal node labels. In order to reconstruct the NCBI taxonomy for the set of species present in a given phylogenetic tree, the only reasonable work-around seems to be a first step of merging lineages and contracting linear paths with the current implementation, followed by a second step of restricting the given phylogenetic tree to the set of species present in the obtained NCBI taxonomy. But this does not affect the code for merge_lineage(). Gabriel >>> I think you misunderstood me. 
The tree is fine; the data used to >>> make >>> the tree (NCBI taxonomy) is the issue. >> >> In what way is it the issue? The database is also fine as far as I >> can >> see, in so far as it is not causing any problems in this instance. > > I should maybe have clarified a bit more: what I said has nothing > to do with the structure of the database itself. I was just > pointing out that NCBI Taxonomy isn't the best source of data for > building a phylogenetic tree, something NCBI also points out (the > link in my last post). Not a big deal, really. > >> Gabriel asked for a tree featuring a species and its subspecies. The >> NCBI taxonomy database provided Bioperl the correct data to build >> such a >> tree. Then Gabriel asked to remove the degree one nodes of his >> tree. His >> problem was that doing that happened to (correctly) remove the >> species >> node. If he wants to see both his species and his subspecies he must >> either not remove degree one nodes, or alter the method of doing >> so to >> keep desired nodes. There is no possible way for NCBI to improve >> matters >> here. > > Actually, there isn't any way they could w/o digging through the > literature in many cases. The problem is incomplete taxonomic > information for nodes derived from older sequence data, where a > genus and species was designated but nothing else (strain, etc) is > known. > > Again, I merely was pointing out what I had mentioned above. I > wasn't criticizing you, Gabriel, or the methodology here. Honest! 
> > chris From cjfields at uiuc.edu Mon Dec 18 23:41:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 22:41:16 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> Message-ID: On Dec 18, 2006, at 10:18 PM, Gabriel Valiente wrote: > Thanks a lot for the prompt answer and follow-up discussion. I > think this turned out not to be a bug in the merge_lineage() code > but a direct consequence of building a phylogenetic tree instead of > a taxonomic tree, aka with internal node labels. > > In order to reconstruct the NCBI taxonomy for the set of species > present in a given phylogenetic tree, the only reasonable work- > around seems to be a first step of merging lineages and contracting > linear paths with the current implementation, followed by a second > step of restricting the given phylogenetic tree to the set of > species present in the obtained NCBI taxonomy. But this does not > affect the code for merge_lineage(). > > Gabriel I did notice one thing, though it's minor: if you use the option to retrieve the data from Entrez, a few species aren't found (even though they show up in a local taxonomy search). I think both were E. coli strains. chris From michael.watson at bbsrc.ac.uk Tue Dec 19 07:20:56 2006 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 19 Dec 2006 12:20:56 -0000 Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs? Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk> Hi I'm using bioperl-1.4. I did do a google search fro this but couldn't find anything. If this is fixed in 1.5.2 then forgive me. I'm getting a warning: MSG: No whitespace allowed in FASTA ID [unknown id] When trying to convert from EMBL format to fasta. The offending sequence is CK234114: ID CK234114; SV 1; linear; mRNA; EST; VRT; 244 BP. XX AC CK234114; XX DT 03-MAR-2004 (Rel. 79, Created) DT 03-MAR-2004 (Rel. 79, Last updated, Version 1) XX DE SB010002000A01 JUWNL1 Normalized Zebra Finch Juvenile Telencephalon cDNA DE Library SB01 Taeniopygia guttata cDNA clone SB010002000A01 5', mRNA DE sequence. Etc Any advice? Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful.
The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From michael.watson at bbsrc.ac.uk Tue Dec 19 07:27:59 2006 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 19 Dec 2006 12:27:59 -0000 Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs? In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E682@iahce2ksrv1.iah.bbsrc.ac.uk> Sorry, problem solved. Mick From roest216 at student.otago.ac.nz Tue Dec 19 04:15:55 2006 From: roest216 at student.otago.ac.nz (Stephan Roessner) Date: Tue, 19 Dec 2006 22:15:55 +1300 Subject: [Bioperl-l] problems installing bioperl Message-ID: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> Dear support team, I installed bioperl 1.5.2_100 on a ferdora machine to be able to use gbrowse. The installation seems to work (except of the test failures) but the gbrowse installation tells me that BIO::pERL 1.0050021 is installed, but of course it requires 1.52. Is there a chance to find out what went wrong? thanks a lot, Stephan From bix at sendu.me.uk Tue Dec 19 10:12:39 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 15:12:39 +0000 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> Message-ID: <45880167.9010605@sendu.me.uk> Stephan Roessner wrote: > Dear support team, > > I installed bioperl 1.5.2_100 on a ferdora machine to be able to use > gbrowse. > The installation seems to work (except of the test failures) but the > gbrowse installation tells me that BIO::pERL 1.0050021 is installed, but > of course it requires 1.52. > > Is there a chance to find out what went wrong? Nothing went wrong with the Bioperl installation (well, expect there shouldn't have been any test failures - can you post those please?). gbrowse simply defined its Bioperl requirement incorrectly.
If you tell me exactly where you downloaded gbrowse from and how you went about installing it, and provide the exact, complete error message you got from it, I might be able to help the authors fix the problem. Or I'm pretty sure they can figure it out for themselves :) From cjfields at uiuc.edu Tue Dec 19 11:05:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Dec 2006 10:05:01 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] problems installing bioperl In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> Message-ID: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> On Dec 19, 2006, at 9:31 AM, Scott Cain wrote: > I really don't think the BioPerl version detection is wrong. I > actually > don't check Bio::Root::Version::VERSION in Makefile.PL, I check > Bio::Graphics::Panel->api_version. When it doesn't find the correct > api_version, it gives a warning the BioPerl 1.5.2 is not installed. I > have seen this happen when more than one BioPerl instance is installed > and `perl Makefile.PL` finds the wrong one first. My suggestion is to > try reinstalling BioPerl and providing the --uninst 1 argument to > remove > older versions of BioPerl: > > sudo ./Build install --uninst 1 > > Scott Could having two Bioperl instances explain the test failures? I'm not sure (maybe Sendu can answer this), but I would assume Module::Build uses the current working directory for test runs.
chris From bix at sendu.me.uk Tue Dec 19 12:02:34 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 17:02:34 +0000 Subject: [Bioperl-l] [Gmod-gbrowse] problems installing bioperl In-Reply-To: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> Message-ID: <45881B2A.8060907@sendu.me.uk> Chris Fields wrote: > > On Dec 19, 2006, at 9:31 AM, Scott Cain wrote: > >> I really don't think the BioPerl version detection is wrong. I actually >> don't check Bio::Root::Version::VERSION in Makefile.PL, I check >> Bio::Graphics::Panel->api_version. When it doesn't find the correct >> api_version, it gives a warning the BioPerl 1.5.2 is not installed. I >> have seen this happen when more than one BioPerl instance is installed >> and `perl Makefile.PL` finds the wrong one first. My suggestion is to >> try reinstalling BioPerl and providing the --uninst 1 argument to remove >> older versions of BioPerl: >> >> sudo ./Build install --uninst 1 >> >> Scott > > Could having two Bioperl instances explain the test failures? I'm not > sure (maybe Sendu can answer this), but I would assume Module::Build > uses the current working directory for test runs. It does, so that shouldn't be an issue for the test failures. From ferraria at gmail.com Tue Dec 19 11:40:05 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Tue, 19 Dec 2006 17:40:05 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy Message-ID: Hi all, I've just installed BioPerl 1.5.2 (devel) on a linux mandrake machine with the cpan shell. I want to use the Bio::DB::EUtilities to retrieve data (id's) from NCBI 'gene' database (first step of my pipeline). But the installation of this package doesn't seem to be correct : The simple example given on the documentation doesn't work. 
(this one : http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) Here is the error message I got : "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." In the UserAgent package, line 779 is in the private "_need_proxy" subroutine and corresponds to the code : ...if (@{ $self->{'no_proxy'} }) ... If I comment this line in the UserAgent package and the corresponding "}", the example works. But obviously, I prefer to solve the problem in a regular way :) Indeed, my computer accesses the internet via a http proxy and I didn't tell this to BioPerl at any moment. As I read on the BioPerl Wiki site, I tried to configure an $http_proxy environment variable but it still doesn't work. One last maybe important information is that I saw during the installation that the tests 't/EUtilities' were skipped because of an unrevealed reason. So finally I got two questions : 1. Is there somebody who can figure out what is my problem ? 2. At the moment, is the Bio::DB::EUtilities package really efficient or using directly the NCBI eutilities with the LWP::Simple package could be an good alternative ? Many thanks in advance, Best Regards, Anthony Ferrari From bix at sendu.me.uk Tue Dec 19 12:06:03 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 17:06:03 +0000 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> Message-ID: <45881BFB.7020008@sendu.me.uk> Scott Cain wrote: > I really don't think the BioPerl version detection is wrong. I actually > don't check Bio::Root::Version::VERSION in Makefile.PL, I check > Bio::Graphics::Panel->api_version. When it doesn't find the correct > api_version, it gives a warning the BioPerl 1.5.2 is not installed. 
I > have seen this happen when more than one BioPerl instance is installed > and `perl Makefile.PL` finds the wrong one first. Yes, I saw that, which is why I thought I must be looking at something different to what the OP had installed. > My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove > older versions of BioPerl: > > sudo ./Build install --uninst 1 My confusion is that he has definitely installed 1.5.2 and this version is being polled for its version number (by something!) and returning the correct '1.0050021', whilst the something expects '1.52'. Anyway, this can only be resolved if Stephan provides the real error message and its context. From cjfields at uiuc.edu Tue Dec 19 12:27:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Dec 2006 11:27:24 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: Message-ID: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote: > Hi all, > > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake > machine with > the cpan shell. > I want to use the Bio::DB::EUtilities to retrieve data (id's) from > NCBI > 'gene' database (first step of my pipeline). > > But the installation of this package doesn't seem to be correct : > The simple example given on the documentation doesn't work. (this > one : > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) > > Here is the error message I got : > "Can't use an undefined value as an ARRAY reference at > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > In the UserAgent package, line 779 is in the private "_need_proxy" > subroutine and corresponds to the code : ...if (@{ $self-> > {'no_proxy'} }) > ... > > If I comment this line in the UserAgent package and the > corresponding "}", > the example works. 
But obviously, I prefer to solve the problem in > a regular > way :) > > Indeed, my computer accesses the internet via a http proxy and I > didn't tell > this to BioPerl at any moment. > As I read on the BioPerl Wiki site, I tried to configure an > $http_proxy > environment variable but it still doesn't work. > > One last maybe important information is that I saw during the > installation > that the tests 't/EUtilities' were skipped because of an unrevealed > reason. > > > So finally I got two questions : > 1. Is there somebody who can figure out what is my problem ? > 2. At the moment, is the Bio::DB::EUtilities package really > efficient or > using directly the NCBI eutilities with the LWP::Simple package > could be an > good alternative ? > > Many thanks in advance, > Best Regards, > Anthony Ferrari First things first: at the moment the BioPerl EUtilities interface is very experimental (as specifically outlined in the POD), so I can't really recommend it for production use until the API is cleaned up. However, I do appreciate any feedback or comments re:EUtilities (the reason it's out there in the 1.5.2 release). You might check out this bug report, which relates directly to your issue: http://bugzilla.open-bio.org/show_bug.cgi?id=2109 After I worked out the proxy issue Torsten got it working. Let me know if this doesn't help or fix the problem. chris From cain at cshl.edu Tue Dec 19 10:31:50 2006 From: cain at cshl.edu (Scott Cain) Date: Tue, 19 Dec 2006 10:31:50 -0500 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <45880167.9010605@sendu.me.uk> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> Message-ID: <1166542310.6981.119.camel@localhost.localdomain> I really don't think the BioPerl version detection is wrong. I actually don't check Bio::Root::Version::VERSION in Makefile.PL, I check Bio::Graphics::Panel->api_version. 
When it doesn't find the correct api_version, it gives a warning the BioPerl 1.5.2 is not installed. I have seen this happen when more than one BioPerl instance is installed and `perl Makefile.PL` finds the wrong one first. My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove older versions of BioPerl: sudo ./Build install --uninst 1 Scott On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote: > Stephan Roessner wrote: > > Dear support team, > > > > I installed bioperl 1.5.2_100 on a ferdora machine to be able to use > > gbrowse. > > The installation seems to work (except of the test failures) but the > > gbrowse installation tells me that BIO::pERL 1.0050021 is installed, but > > of course it requires 1.52. > > > > Is there a chance to find out what went wrong? > > Nothing went wrong with the Bioperl installation (well, expect there > shouldn't have been any test failures - can you post those please?). > gbrowse simply defined its Bioperl requirement incorrectly. If you tell > me exactly where you downloaded gbrowse from and how you went about > installing it, and provide the exact, complete error message you got > from it, I might be able help the authors fix the problem. > > Or I'm pretty sure they can figure it our for themselves :) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From stewarta at nmrc.navy.mil Tue Dec 19 13:49:57 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Tue, 19 Dec 2006 13:49:57 -0500 Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3 Message-ID: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil> I see that Bio::Tools::Glimmer documentation clearly states that this module is intended for use with GlimmerM (eukaryotic version) only. I am wondering if anyone can recall any talk about adopting Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)? I've searched the list history with little luck other than someone else asking a similar question. If not, does anyone have any thoughts on how difficult it might be to implement support for glimmer2/3 result parsing? Perhaps just a matter of editing the _parse_predictions method? -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From rvosa at sfu.ca Tue Dec 19 13:53:47 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 19 Dec 2006 10:53:47 -0800 Subject: [Bioperl-l] problems installing bioperl Message-ID: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca> An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From cjfields at uiuc.edu Tue Dec 19 14:31:17 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Dec 2006 13:31:17 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3 In-Reply-To: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil> References: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil> Message-ID: <71E04575-DFD2-4F5A-B268-493D3246CBFA@uiuc.edu> On Dec 19, 2006, at 12:49 PM, Andrew Stewart wrote: > I see that Bio::Tools::Glimmer documentation clearly states that this > module is intended for use with GlimmerM (eukaryotic version) only. > I am wondering if anyone can recall any talk about adopting > Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)? > I've searched the list history with little luck other than someone > else asking a similar question. There is a thread here: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12546/ focus=12546 > If not, does anyone have any thoughts on how difficult it might be to > implement support for glimmer2/3 result parsing? Perhaps just a > matter of editing the _parse_predictions method? It depends on how different the various Glimmer formats are; I'll have to look at the ones Torsten added in CVS. You could always try modifying Bio::Tools::Glimmer to parse Glimmer2/3 and GlimmerM reports, but based on the mail list thread above it may not be so straightforward. 
chris From MEC at stowers-institute.org Tue Dec 19 14:57:48 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Tue, 19 Dec 2006 13:57:48 -0600 Subject: [Bioperl-l] bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation Message-ID: Lincoln and fellow Bio::DB::SeqFeature travelers, I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with ------------- EXCEPTION ------------- MSG: FBgn0017545 doesn't have a primary id STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo ad.PLS:76 Where FBgn0017545 is the ID of a gene previously loaded. I am unsure how to remedy my situation and welcome any advise on correct or improved approach to my problem. Here's some detail if it helps. I am developing a pipeline to design a microarray probes capable of distinguishing among splice variants in drosophila (using latest Flybase dmel_r5.1 annotation). So I 1) load a filtered selection of Flybase annotation using bp_seqfeature_load. (for testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached). 
This is done as follows: > bp_seqfeature_load.PLS --create FBgn0017545.gff 2) analyze all the genes in the database, and create GFF3 output each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets. Output of this analysis, FBgn0017545_matd.gff, is also attached) 3) load these analysis results into the same database, as follows: > bp_seqfeature_load.PLS FBgn0017545_matd.gff It is at this point that I get the above error. However, I don't get any error and the data loads fine if I load the two files together, as follows: > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff) So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS or else this use case has not yet arisen and must be provided for somehow. I am running against bioperl-live Thanks for your thoughts and assistance, Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri From Kevin.M.Brown at asu.edu Tue Dec 19 16:46:19 2006 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 19 Dec 2006 14:46:19 -0700 Subject: [Bioperl-l] Bio::SimpleAlign Message-ID: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu> I'm working on a script that plays around with alignments of sequences and one of the things I noticed is that the code for the match method does not seem to actually use the start/end information when creating the match between objects in the alignment. Maybe I'm misunderstanding what the alignment is supposed to hold in terms of sequence. The alignment objects I build up are based on the sequence of a gene and the sequences of the primers that amplify that gene. 
$alignments{$gene->id()}->add_seq(
    new Bio::LocatableSeq(
        -seq    => $seq[0]->seq(),
        -id     => $seq[0]->id(),
        -start  => $start,
        -end    => $start + $seq[0]->length() - 1,
        -strand => 1
    )
);
$alignments{$gene->id()}->add_seq(
    new Bio::LocatableSeq(
        -seq    => $seq[1]->seq(),
        -id     => $seq[1]->id(),
        -start  => $stop,
        -end    => $stop + $seq[1]->length() - 1,
        -strand => -1
    )
);

So, you can see I input a start and stop point for the primer, but when
I use the match function all it does is match the first character of the
gene sequence to the first char of the primer sequences, then the second
gene char to the second in each primer, etc... This doesn't seem to fit
with the documentation and seems odd that there would be holders for the
start/stop points and not use them when doing things like matching of
sequences in an alignment.

From bix at sendu.me.uk  Tue Dec 19 17:01:22 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 22:01:22 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
References: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
Message-ID: <45886132.7050505@sendu.me.uk>

Rutger Vos wrote:
> Aren't 1.5.2_100 and 1.0050021 supposed to be equivalent in this weird
> version-string-translation way that makes 5.5 and 5.005 equivalent also?

Yes, 1.5.2_100 and 1.0050021 are equivalent. The equivalent of 5.5 is
5.500 however.

From lstein at cshl.edu  Tue Dec 19 16:58:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 19 Dec 2006 16:58:24 -0500
Subject: [Bioperl-l] bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation
In-Reply-To: 
References: 
Message-ID: <6dce9a0b0612191358t4764bfe0g601cd22d09132e55@mail.gmail.com>

Hi Malcolm,

Your second guess was right. The use case of augmenting an existing gene
with additional splice forms isn't provided for.
You can get the functionality by making direct calls to
Bio::DB::SeqFeature::Store methods:

  my @genes = $db->get_features_by_name('FBgn0017545');
  @genes == 1 or die "Didn't get exactly one gene";

  my $parent = $genes[0];
  my $chr    = $parent->seq_id;
  my $start  = $parent->start;
  my $end    = $parent->end;
  my $strand = $parent->strand;

  # new splice form on the same reference sequence as the parent gene
  my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                         -source      => 'added',
                                         -seq_id      => $chr,
                                         -strand      => $strand,
                                         -start       => $start+10,
                                         -end         => $end,
                                        );
  $parent->add_SeqFeature($new_splice_form);

  # attach two exons to the new splice form
  for my $pos ([$start+10,$start+100],[$start+200,$end]) {
      my ($e_start,$e_end) = @$pos;
      my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                          -store       => $db,
                                          -seq_id      => $chr,
                                          -strand      => $strand,
                                          -start       => $e_start,
                                          -end         => $e_end);
      $new_splice_form->add_SeqFeature($exon);
  }

I found a bug in updating the seqfeature database when I wrote this
script, so you'll have to get the latest bioperl-live. I think you can
use this to write a splice form updating script.

In order to support the idea of adding new splice forms to an existing
gene using the GFF3 format, I will have to either modify the loader, or
write a separate script (probably better to do the latter). It shouldn't
be hard if you'd like to give it a try.
Lincoln On 12/19/06, Cook, Malcolm wrote: > > Lincoln and fellow Bio::DB::SeqFeature travelers, > > I find that using bp_seqfeature_load.PLS to load subfeatures of genes > already loaded using bp_seqfeature_load.PLS fails with > > ------------- EXCEPTION ------------- > MSG: FBgn0017545 doesn't have a primary id > STACK > Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 > STACK toplevel > /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo > ad.PLS:76 > > Where FBgn0017545 is the ID of a gene previously loaded. > > I am unsure how to remedy my situation and welcome any advise on correct > or improved approach to my problem. > > Here's some detail if it helps. I am developing a pipeline to design a > microarray probes capable of distinguishing among splice variants in > drosophila (using latest Flybase dmel_r5.1 annotation). So I > > 1) load a filtered selection of Flybase annotation using > bp_seqfeature_load. (for testing purposes, I am using a single gene's > worth of annotation, FBgn0017545.gff, attached). This is done as > follows: > > > bp_seqfeature_load.PLS --create FBgn0017545.gff > > 2) analyze all the genes in the database, and create GFF3 output each > feature of which has a 'Parent' that is a previously loaded gene (i.e. > FBgn0017545). (These features represent the unique introns, splice > sites, and exonic design targets. 
Output of this analysis, > FBgn0017545_matd.gff, is also attached) > > 3) load these analysis results into the same database, as follows: > > > bp_seqfeature_load.PLS FBgn0017545_matd.gff > > It is at this point that I get the above error. > > However, I don't get any error and the data loads fine if I load the two > files together, as follows: > > > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff > FBgn0017545_matd.gff) > > So, I suspect that either I am misunderstanding when/how to use > bp_seqfeature_load.PLS or else this use case has not yet arisen and must > be provided for somehow. > > I am running against bioperl-live > > Thanks for your thoughts and assistance, > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From rvosa at sfu.ca Tue Dec 19 23:23:20 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 19 Dec 2006 20:23:20 -0800 Subject: [Bioperl-l] suggestions for suitable 'taxon' object Message-ID: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From cjfields at uiuc.edu Wed Dec 20 01:16:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 00:16:47 -0600 Subject: [Bioperl-l] suggestions for suitable 'taxon' object In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> Message-ID: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote: > Hi all, > > I am looking for a bioperl object that can be abused to function as a > suitable 'taxon' object, where I mean 'taxon' as understood by the > NEXUS > file format (i.e. 
not strictly an entity from a taxonomy, but more > loosely > an OTU). > > The object would primarily function as a way to relate nodes in > trees to > sequences in an alignment (a foreign key that both nodes and > sequences refer > to), and secondarily as the keeper of the canonical name of the > OTU, such > that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node > named 'Homo > sapiens (constrained monophyly)' can still be understood to refer > to the > same thing - the OTU 'Homo sapiens sapiens' (for example). Alignment (SimpleAlign) objects contain Bio::LocatableSeq sequence objects; at the moment LocatableSeqs don't store their own annotation but they could easily be made or subclassed to be AnnotatableI (i.e. they can store annotation collections). I recently made SimpleAlign Annotatable; Jason has also made SimpleAlign implement FeatureHolderI, so alignments can store SeqFeatures as well; he may have his own designs here. There may be a wide variety of ways to go about this. I would probably do the following (bear in mind I'm a microbiologist, not a computer scientist). If one could add trees as annotation to the alignment (i.e. if trees could be Annotation objects and kept in the SimpleAlign's annotation collection), and each sequence in the alignment contained reference to a node object of the tree (i.e. if Bio::Taxon/Bio::Species objects could also be Annotation objects, but kept in a LocatableSeq annotation collection), both could refer to the same node object. This may not be exactly what you want, but maybe it's close? SimpleAlign->AnnoColln->Tree->OTU(Nodes) \----->LocSeqs-->AnnoColln-----/ I suppose this could also be done with Seqfeatures... > I was thinking that a (possibly expanded) Bio::Species might work > if there > was some sensible way of appending references to node and sequence > objects > to it (or otherwise associate them with each other), but I am > writing *to > solicit any and all suggestions*. 
> I am looking for something similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

Sendu would be the best one to speak about Bio::Taxon and Bio::Species
and may have some ideas on the above. The current plan was to deprecate
Bio::Species, but who knows?

chris

From heikki at sanbi.ac.za  Wed Dec 20 05:25:08 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 20 Dec 2006 12:25:08 +0200
Subject: [Bioperl-l] Bio::SimpleAlign
In-Reply-To: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
Message-ID: <200612201225.08862.heikki@sanbi.ac.za>

Kevin,

Sequences that are added to the alignment are supposed to be *aligned*.
SimpleAlign does not do it for you.

It seems to me that you are adding sequences like this:

nnnnnnnnnnnnnnnnnnnn        1 - 20,  "a short gene"
nnnnnn                      21 - 26  "a short primer after the gene"

when you should be doing this

nnnnnnnnnnnnnnnnnnnn        1 - 20,  "a short gene"
--------------------nnnnnn  21 - 26  "a short primer after the gene"

Note that the default way of displaying names in SimpleAlign is
"name/start-end". The name is usually expected to refer to the sequence
from which the subsequence is derived. The displayname does not change
if you add gaps.

Yours,

   -Heikki

On Tuesday 19 December 2006 23:46, Kevin Brown wrote:
> I'm working on a script that plays around with alignments of sequences
> and one of the things I noticed is that the code for the match method
> does not seem to actually use the start/end information when creating
> the match between objects in the alignment. Maybe I'm misunderstanding
> what the alignment is supposed to hold in terms of sequence. The
> alignment objects I build up are based on the sequence of a gene and the
> sequences of the primers that amplify that gene.
>
> $alignments{$gene->id()}->add_seq(
>     new Bio::LocatableSeq(
>         -seq => $seq[0]->seq(),
>         -id => $seq[0]->id(),
>         -start => $start,
>         -end => $start + $seq[0]->length() - 1,
>         -strand => 1
>     )
> );

If your sequence does not contain gaps and the numbering starts from
one, you can let the object handle start/stop:

  my $seq = new Bio::LocatableSeq(
      -seq    => 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
      -id     => 'A00001',
      -strand => 1
  );

> $alignments{$gene->id()}->add_seq(
>     new Bio::LocatableSeq(
>         -seq => $seq[1]->seq(),
>         -id => $seq[1]->id(),
>         -start => $stop,
>         -end => $stop + $seq[1]->length() - 1,
>         -strand => -1
>     )
> );
>
> So, you can see I input a start and stop point for the primer, but when
> I use the match function all it does is match the first character of the
> gene sequence to the first char of the primer sequences, then the second
> gene char to the second in each primer, etc... This doesn't seem to fit
> with the documentation and seems odd that there would be holders for the
> start/stop points and not use them when doing things like matching of
> sequences in an alignment.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From ferraria at gmail.com Wed Dec 20 06:04:16 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 12:04:16 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> Message-ID: On 19/12/06, Chris Fields wrote: > > > On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote: > > > Hi all, > > > > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake > > machine with > > the cpan shell. > > I want to use the Bio::DB::EUtilities to retrieve data (id's) from > > NCBI > > 'gene' database (first step of my pipeline). > > > > But the installation of this package doesn't seem to be correct : > > The simple example given on the documentation doesn't work. (this > > one : > > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) > > > > Here is the error message I got : > > "Can't use an undefined value as an ARRAY reference at > > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > In the UserAgent package, line 779 is in the private "_need_proxy" > > subroutine and corresponds to the code : ...if (@{ $self-> > > {'no_proxy'} }) > > ... > > > > If I comment this line in the UserAgent package and the > > corresponding "}", > > the example works. 
But obviously, I prefer to solve the problem in > > a regular > > way :) > > > > Indeed, my computer accesses the internet via a http proxy and I > > didn't tell > > this to BioPerl at any moment. > > As I read on the BioPerl Wiki site, I tried to configure an > > $http_proxy > > environment variable but it still doesn't work. > > > > One last maybe important information is that I saw during the > > installation > > that the tests 't/EUtilities' were skipped because of an unrevealed > > reason. > > > > > > So finally I got two questions : > > 1. Is there somebody who can figure out what is my problem ? > > 2. At the moment, is the Bio::DB::EUtilities package really > > efficient or > > using directly the NCBI eutilities with the LWP::Simple package > > could be an > > good alternative ? > > > > Many thanks in advance, > > Best Regards, > > Anthony Ferrari > > First things first: at the moment the BioPerl EUtilities interface is > very experimental (as specifically outlined in the POD), so I can't > really recommend it for production use until the API is cleaned up. > However, I do appreciate any feedback or comments re:EUtilities (the > reason it's out there in the 1.5.2 release). > > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. > > chris > I carefully read this bug but that doesn't help because this has already been modified in the now given GenericWebDBI.pm So my problem does not come from a deep recursion loop. As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/EUtilities.t " to see what's really happening. And actually, all tests are skipped because of the same message error -> "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." 
*** I tried the same command with the modified LWP::UserAgent package (which means I comment the line 779 and the corresponding '}') and all 453 tests passed. But not always. I made the tests several times and it often failed. And always on a test called "eXXX->cookie->cookie() query key" (ending with query key). In those cases, I got back a html message indicating that the error was thrown by the internal sever of NCBI. So I guess that sometimes it is just NCBI server fault (internal problem), and BioPerl is not implied.. But once more, I comment a line from a basic package so it is a bit hazardous. *** tony - a little bit lost. From smane at vbi.vt.edu Tue Dec 19 14:46:56 2006 From: smane at vbi.vt.edu (Shrinivasrao P. Mane) Date: Tue, 19 Dec 2006 14:46:56 -0500 Subject: [Bioperl-l] Using Muscle parameter within bioperl Message-ID: Hi, I need to run muscle using bioperl. This is how I do it in command line. muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet I used the following in perl script my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format => 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log'); The program runs and produces the result file but it doesn't create a log file nor does it stop sending output to STDOUT (-quiet). Could anybody help me with this? Thanks Mane From cjfields at uiuc.edu Wed Dec 20 09:09:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 08:09:56 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> Message-ID: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. 
> > chris > > > I carefully read this bug but that doesn't help because this has > already been modified in the now given GenericWebDBI.pm > So my problem does not come from a deep recursion loop. > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > EUtilities.t " to see what's really happening. > And actually, all tests are skipped because of the same message error > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > *** > I tried the same command with the modified LWP::UserAgent package > (which means I comment the line 779 and the corresponding '}') and > all 453 tests passed. > But not always. I made the tests several times and it often > failed. And always on a test called "eXXX->cookie->cookie() query > key" (ending with query key). In those cases, I got back a html > message indicating that the error was thrown by the internal sever > of NCBI. So I guess that sometimes it is just NCBI server fault > (internal problem), and BioPerl is not implied.. > But once more, I comment a line from a basic package so it is a bit > hazardous. > *** > > tony - a little bit lost. I'm cc'ing Torsten as he has a bit more experience with proxies. EUtilities is set up to check for an env. proxy and also take a set proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy to say this was a bug in LWP, but I think the problem is that something is undefined (i.e. an env. variable), or username/password. From the bug report, Torsten set his proxy variables using the following: -------------------------------------- "Note: I am behind an _authenticating_ proxy. My $http_proxy and $HTTP_PROXY are both set to http://USER:PASS at proxy.monash.edu.au:80/" -------------------------------------- Note the lowercase for $http_proxy, which can make a difference. After the recursion fix, I'm assuming he made no changes to the env. 
settings, and according to the bug everything was fine (is that correct,
Torsten?).

Also LWP::UserAgent has this:

--------------------------------------
"Load proxy settings from *_proxy environment variables. You might
specify proxies like this (sh-syntax):

  gopher_proxy=http://proxy.my.place/
  wais_proxy=http://proxy.my.place/
  no_proxy="localhost,my.domain"
  export gopher_proxy wais_proxy no_proxy

csh or tcsh users should use the setenv command to define these
environment variables.

On systems with case insensitive environment variables there exists a
name clash between the CGI environment variables and the HTTP_PROXY
environment variable normally picked up by env_proxy(). Because of
this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY
environment variable can be used instead."
--------------------------------------

chris

From bix at sendu.me.uk  Wed Dec 20 09:08:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:08:16 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: 
References: 
Message-ID: <458943D0.10400@sendu.me.uk>

Shrinivasrao P. Mane wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
>
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
>
> I used the following in perl script
>
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>
> 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log');
>
> The program runs and produces the result file but it doesn't create a
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?

Muscle's own arguments are passed without a leading dash, and to make a
switch active you need to set it to some true value. So:

  (-verbose => 1, quiet => 1, log => 'inv.log')

-verbose may not do what you want, since verbose is both a Bioperl option
and a Muscle option; if you want the latter, try using verbose => 1.
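
[Editorial sketch, not part of the original message.] The advice in the
reply above can be sketched roughly as follows. This is only a minimal
illustration under stated assumptions: bioperl-run and the muscle
executable are installed, inv.fasta is the input file from the question,
and muscle_params is a hypothetical helper introduced here purely to keep
the option list in one place. It is not claimed to be the exact call the
list members settled on.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns Muscle's own options,
# named without a leading dash. The 'quiet' switch gets a true value and
# 'log' takes the log file name, as suggested in the reply above.
sub muscle_params {
    my ($log_file) = @_;
    return ( quiet => 1, log => $log_file );
}

my @params = muscle_params('inv.log');

# Assumption: the bioperl-run wrapper and the muscle program are available.
# If either is missing, the eval fails quietly and $aln stays undefined.
my $aln = eval {
    require Bio::Tools::Run::Alignment::Muscle;
    my $factory = Bio::Tools::Run::Alignment::Muscle->new(@params);
    $factory->align('inv.fasta');    # input file name taken from the question
};

if ($aln) {
    printf "Alignment length: %d columns\n", $aln->length;
}
else {
    print "Muscle wrapper not run; options would have been: @params\n";
}
```

Keeping the dashed BioPerl options (such as -verbose) apart from the
undashed Muscle options in this way makes it easier to see which layer
each setting is meant for.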
From bix at sendu.me.uk Wed Dec 20 09:51:33 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 14:51:33 +0000 Subject: [Bioperl-l] suggestions for suitable 'taxon' object In-Reply-To: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> Message-ID: <45894DF5.1060503@sendu.me.uk> Chris Fields wrote: > On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote: > >> Hi all, >> >> I am looking for a bioperl object that can be abused to function as >> a suitable 'taxon' object, where I mean 'taxon' as understood by >> the NEXUS file format (i.e. not strictly an entity from a taxonomy, >> but more loosely an OTU). >> >> The object would primarily function as a way to relate nodes in >> trees to sequences in an alignment (a foreign key that both nodes >> and sequences refer to), and secondarily as the keeper of the >> canonical name of the OTU, such that a sequence named >> 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo sapiens >> (constrained monophyly)' can still be understood to refer to the >> same thing - the OTU 'Homo sapiens sapiens' (for example). I haven't had time to give your suggestions consideration, but I can say that I'm having to do the same thing for a bioperl-run module and my work-around is simply to set a custom name on my Bio::Taxon objects. To explain, I have the benefit that my tree is made up of Bio::Taxon objects, so I call $taxon->name('seq_id', $seq->id). Then when I want to know which of my sequences corresponds to a particular taxon, I work out which of them has the id given by shift @{$taxon->name('seq_id')}. Hardly ideal, but it works for now. >> I was thinking that a (possibly expanded) Bio::Species might work >> if there was some sensible way of appending references to node and >> sequence objects to it (or otherwise associate them with each >> other), but I am writing *to solicit any and all suggestions*. 
I am >> looking for something similar to Bio::Phylo::Taxa::Taxon. > > Sendu would be the best one to speak about Bio::Taxon and > Bio::Species and may have some ideas on the above. The current plan > was to deprecate Bio::Species, but who knows? Given that we do plan to deprecate Bio::Species, I'd resist the temptation to expand on it. Use Bio::Taxon as a base if it has stuff you need, or base straight from Bio::Tree::Node if not. From ferraria at gmail.com Wed Dec 20 10:40:34 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 16:40:34 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> Message-ID: Defining a "no_proxy" environment variable in my '.bashrc' file solved my problem. I set it to "localhost". It indeed corresponds to the line... [ ...if (@{ $self->{'no_proxy'} }) ... ] (I guess!) I really don't know why we are compelled to do this, but let's say that's the way it is. It works now ! Thanks a lot. Tony On 20/12/06, Chris Fields wrote: > > > On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > > > You might check out this bug report, which relates directly to your > > issue: > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > > > After I worked out the proxy issue Torsten got it working. Let me > > know if this doesn't help or fix the problem. > > > > chris > > > > > > I carefully read this bug but that doesn't help because this has > > already been modified in the now given GenericWebDBI.pm > > So my problem does not come from a deep recursion loop. > > > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > > EUtilities.t " to see what's really happening. 
> > And actually, all tests are skipped because of the same message error > > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > *** > > I tried the same command with the modified LWP::UserAgent package > > (which means I comment the line 779 and the corresponding '}') and > > all 453 tests passed. > > But not always. I made the tests several times and it often > > failed. And always on a test called "eXXX->cookie->cookie() query > > key" (ending with query key). In those cases, I got back a html > > message indicating that the error was thrown by the internal sever > > of NCBI. So I guess that sometimes it is just NCBI server fault > > (internal problem), and BioPerl is not implied.. > > But once more, I comment a line from a basic package so it is a bit > > hazardous. > > *** > > > > tony - a little bit lost. > > I'm cc'ing Torsten as he has a bit more experience with proxies. > > EUtilities is set up to check for an env. proxy and also take a set > proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy > to say this was a bug in LWP, but I think the problem is that > something is undefined (i.e. an env. variable), or username/password. > > From the bug report, Torsten set his proxy variables using the > following: > > -------------------------------------- > "Note: I am behind an _authenticating_ proxy. > My $http_proxy and $HTTP_PROXY are both set to > http://USER:PASS at proxy.monash.edu.au:80/" > -------------------------------------- > > Note the lowercase for $http_proxy, which can make a difference. > After the recursion fix, I'm assuming he made no changes to the env. > settings, and according to the bug everything was fine (is that > correct Tortsen?). > > Also LWP::UserAgent has this: > > -------------------------------------- > "Load proxy settings from *_proxy environment variables. 
You might > specify proxies like this (sh-syntax): > > gopher_proxy=http://proxy.my.place/ > wais_proxy=http://proxy.my.place/ > no_proxy="localhost,my.domain" > export gopher_proxy wais_proxy no_proxy > > csh or tcsh users should use the setenv command to define these > environment variables. > > On systems with case insensitive environment variables there exists a > name clash between the CGI environment variables and the HTTP_PROXY > environment variable normally picked up by env_proxy(). Because of > this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY > environment variable can be used instead." > -------------------------------------- > > chris > From cjfields at uiuc.edu Wed Dec 20 11:10:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 10:10:48 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: Message-ID: <007901c72451$6ad540a0$15327e82@pyrimidine> Just to clarify: does it work it you don't have any proxy env. settings? chris _____ From: Anthony Ferrari [mailto:ferraria at gmail.com] Sent: Wednesday, December 20, 2006 9:41 AM To: Chris Fields Cc: bioperl-l List; Torsten Seemann Subject: Re: [Bioperl-l] Problem with : EUtilities - Proxy Defining a "no_proxy" environment variable in my '.bashrc' file solved my problem. I set it to "localhost". It indeed corresponds to the line... [ ...if (@{ $self->{'no_proxy'} }) ... ] (I guess!) I really don't know why we are compelled to do this, but let's say that's the way it is. It works now ! Thanks a lot. Tony On 20/12/06, Chris Fields wrote: On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. 
> > chris > > > I carefully read this bug but that doesn't help because this has > already been modified in the now given GenericWebDBI.pm > So my problem does not come from a deep recursion loop. > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > EUtilities.t " to see what's really happening. > And actually, all tests are skipped because of the same message error > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > *** > I tried the same command with the modified LWP::UserAgent package > (which means I comment the line 779 and the corresponding '}') and > all 453 tests passed. > But not always. I made the tests several times and it often > failed. And always on a test called "eXXX->cookie->cookie() query > key" (ending with query key). In those cases, I got back a html > message indicating that the error was thrown by the internal sever > of NCBI. So I guess that sometimes it is just NCBI server fault > (internal problem), and BioPerl is not implied.. > But once more, I comment a line from a basic package so it is a bit > hazardous. > *** > > tony - a little bit lost. I'm cc'ing Torsten as he has a bit more experience with proxies. EUtilities is set up to check for an env. proxy and also take a set proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy to say this was a bug in LWP, but I think the problem is that something is undefined ( i.e. an env. variable), or username/password. >From the bug report, Torsten set his proxy variables using the following: -------------------------------------- "Note: I am behind an _authenticating_ proxy. My $http_proxy and $HTTP_PROXY are both set to http://USER:PASS at proxy.monash.edu.au:80/" -------------------------------------- Note the lowercase for $http_proxy, which can make a difference. After the recursion fix, I'm assuming he made no changes to the env. 
settings, and according to the bug everything was fine (is that correct Tortsen?). Also LWP::UserAgent has this: -------------------------------------- "Load proxy settings from *_proxy environment variables. You might specify proxies like this (sh-syntax): gopher_proxy=http://proxy.my.place/ wais_proxy= http://proxy.my.place/ no_proxy="localhost,my.domain" export gopher_proxy wais_proxy no_proxy csh or tcsh users should use the setenv command to define these environment variables. On systems with case insensitive environment variables there exists a name clash between the CGI environment variables and the HTTP_PROXY environment variable normally picked up by env_proxy(). Because of this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY environment variable can be used instead." -------------------------------------- chris From ferraria at gmail.com Wed Dec 20 11:59:49 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 17:59:49 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <007901c72451$6ad540a0$15327e82@pyrimidine> References: <007901c72451$6ad540a0$15327e82@pyrimidine> Message-ID: First, I got a $http_proxy env. variable automatically defined by the BioPerl installation (I don't define and export it in my .bash_profile). So when I'm logging in, $http_proxy=http://ip_adress:port/ Next step : two solutions : 1) defining an $no_proxy env.variable in my .bash_profile. It can be set to 'whatever'. 2) If I do not define '$no_proxy'; to make it work, I must call the no_proxy() method on each Bio::DB::EUtilities object I create before I can call the get_response() method on it. (The bug is in the 'get_response' call) And finally without 1) or 2) it doesn't work. Tony On 20/12/06, Chris Fields wrote: > > Just to clarify: does it work it you don't have any proxy env. settings? > One thing I didn't point out previously is that Bio::DB::GenericWebDBI > inherits LWP::UserAgent. 
You should be able to use $eutil->no_proxy() > instead of setting it in your env. > chris > > ------------------------------ > *From:* Anthony Ferrari [mailto:ferraria at gmail.com] > *Sent:* Wednesday, December 20, 2006 9:41 AM > *To:* Chris Fields > *Cc:* bioperl-l List; Torsten Seemann > *Subject:* Re: [Bioperl-l] Problem with : EUtilities - Proxy > > Defining a "no_proxy" environment variable in my '.bashrc' file solved my > problem. I set it to "localhost". > > It indeed corresponds to the line... [ ...if (@{ > $self->{'no_proxy'} }) ... ] (I guess!) > > > I really don't know why we are compelled to do this, but let's say that's > the way it is. > > It works now ! > > Thanks a lot. > > Tony > > > > > On 20/12/06, Chris Fields wrote: > > > > > > On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > > > > > You might check out this bug report, which relates directly to your > > > issue: > > > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > > > > > After I worked out the proxy issue Torsten got it working. Let me > > > know if this doesn't help or fix the problem. > > > > > > chris > > > > > > > > > I carefully read this bug but that doesn't help because this has > > > already been modified in the now given GenericWebDBI.pm > > > So my problem does not come from a deep recursion loop. > > > > > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > > > EUtilities.t " to see what's really happening. > > > And actually, all tests are skipped because of the same message error > > > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > > > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > > > *** > > > I tried the same command with the modified LWP::UserAgent package > > > (which means I comment the line 779 and the corresponding '}') and > > > all 453 tests passed. > > > But not always. I made the tests several times and it often > > > failed. 
And always on a test called "eXXX->cookie->cookie() query > > > key" (ending with query key). In those cases, I got back a html > > > message indicating that the error was thrown by the internal sever > > > of NCBI. So I guess that sometimes it is just NCBI server fault > > > (internal problem), and BioPerl is not implied.. > > > But once more, I comment a line from a basic package so it is a bit > > > hazardous. > > > *** > > > > > > tony - a little bit lost. > > > > I'm cc'ing Torsten as he has a bit more experience with proxies. > > > > EUtilities is set up to check for an env. proxy and also take a set > > proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy > > to say this was a bug in LWP, but I think the problem is that > > something is undefined ( i.e. an env. variable), or username/password. > > > > From the bug report, Torsten set his proxy variables using the > > following: > > > > -------------------------------------- > > "Note: I am behind an _authenticating_ proxy. > > My $http_proxy and $HTTP_PROXY are both set to > > http://USER:PASS at proxy.monash.edu.au:80/" > > -------------------------------------- > > > > Note the lowercase for $http_proxy, which can make a difference. > > After the recursion fix, I'm assuming he made no changes to the env. > > settings, and according to the bug everything was fine (is that > > correct Tortsen?). > > > > Also LWP::UserAgent has this: > > > > -------------------------------------- > > "Load proxy settings from *_proxy environment variables. You might > > specify proxies like this (sh-syntax): > > > > gopher_proxy=http://proxy.my.place/ > > wais_proxy= http://proxy.my.place/ > > no_proxy="localhost,my.domain" > > export gopher_proxy wais_proxy no_proxy > > > > csh or tcsh users should use the setenv command to define these > > environment variables. 
> >
> > On systems with case insensitive environment variables there exists a
> > name clash between the CGI environment variables and the HTTP_PROXY
> > environment variable normally picked up by env_proxy(). Because of
> > this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY
> > environment variable can be used instead."
> > --------------------------------------
> >
> > chris
> >
> >

From cjfields at uiuc.edu Wed Dec 20 13:28:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 12:28:09 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To:
Message-ID: <000301c72464$9a12a070$15327e82@pyrimidine>

> First, I got a $http_proxy env. variable automatically
> defined by the BioPerl installation (I don't define and
> export it in my .bash_profile).
> So when I'm logging in, $http_proxy=http://ip_adress:port/

BioPerl can't permanently set any env. variables out of the box since that would mean modifying your local .bash_profile or the system profile. If you're a user on a system where you're not the sysadmin, then it's more likely the sysadmin has set up user accounts with an already-defined $http_proxy variable in the system .bash_profile (which is passed on to all users).

The problem I can see (going by what you have above) is that there is no username/password defined, only the address (IP:Port). I am assuming LWP is expecting some form of authentication when a proxy is defined in the env. without a username/password included. If so, you'll have to supply those yourself: either by redefining http_proxy to include them in your local .bash_profile,

  export http_proxy='http://USER:PASS at proxy.monash.edu.au:80/'

by using $agent->proxy() to pass in all the proxy information, or by using $agent->authentication() so that a proxy can authorize any outgoing/incoming requests. The first may be preferable if you are able to do so, since you wouldn't have to authenticate every agent.
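For illustration, the in-script route mentioned above ($agent->proxy()) might look roughly like this with a plain LWP::UserAgent. This is only a sketch: the proxy host, port, user name and password are placeholders (assumptions, not values from this thread), and whether your proxy needs credentials at all is something only your sysadmin can confirm.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder proxy URL: host, port and credentials are hypothetical.
my $proxy_url = 'http://USER:PASS@proxy.example.org:80/';

my $ua = LWP::UserAgent->new;

# Route plain HTTP requests through the proxy, credentials carried in the URL.
$ua->proxy('http', $proxy_url);

# Requests to hosts in these domains bypass the proxy entirely.
$ua->no_proxy('localhost');

# Called with only a scheme, proxy() returns the proxy currently set for it.
print "http proxy: ", $ua->proxy('http'), "\n";
```

Since Bio::DB::GenericWebDBI inherits from LWP::UserAgent, the same two calls should in principle be available on a Bio::DB::EUtilities object, but that is untested here.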
Note that this would also explain why you had an LWP problem with an undefined array ref: the LWP agent is likely expecting some form of authentication, probably in the form [username, password], if a proxy env. variable is found. > Next step : two solutions : > 1) defining an $no_proxy env.variable in my .bash_profile. > It can be set to 'whatever'. > > 2) If I do not define '$no_proxy'; to make it work, I must call the > no_proxy() method on each Bio::DB::EUtilities object I create > before I can call the get_response() method on it. > > (The bug is in the 'get_response' call) If you mean when the request is calling proxy_authorization_basic(), that's not a bug. If we didn't authorize then it likely wouldn't work for properly set up proxies (Torsten's worked). Note that it's supposed to be passing a username/password from $self->authentication(). The fact that you can set $no_proxy to anything suggests there is no proxy in place. > And finally without 1) or 2) it doesn't work. > > Tony We can't guarantee that defining no_proxy will always work on your system, either. It's possible a proxy was added systemwide but a firewall hasn't been put in place yet; once it goes up and all requests need to be authorized, then you'll run into problems again. Conversely, maybe this was defined at some point systemwide in the .bash_profile but wasn't removed. The only one who would know is the sysadmin. If you aren't the sysadmin, you can contact them to find out about how to properly set up your proxy, or whether it is even necessary (maybe they neglected to remove the proxy definition from the system .bash_profile). Who knows? 
chris From bix at sendu.me.uk Wed Dec 20 16:03:03 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 21:03:03 +0000 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine> References: <000301c72464$9a12a070$15327e82@pyrimidine> Message-ID: <4589A507.60106@sendu.me.uk> Chris Fields wrote: >> First, I got a $http_proxy env. variable automatically >> defined by the BioPerl installation (I don't define and >> export it in my .bash_profile). >> So when I'm logging in, $http_proxy=http://ip_adress:port/ > > BioPerl can't permanently set any env. variables out of the box since True, and it doesn't try to set one temporarily either. To clarify some of the other points Chris made, the proxy variable certainly doesn't need username and password to be defined (from LWPs point of view), since not all proxies authenticate. Of course accesses won't work if authentication is actually required and these aren't set. There's no reason that no_proxy should have to be set. It is used to say what domains shouldn't be proxied. Either this is a real LWP bug, or somehow EUtilities or one of its bases is doing something wrong. It should be investigated... It would be very informative if Anthony could log in when he hasn't done anything to his environment variables (and so where the original problem manifests) and give us the results of: perl -e 'while (($key, $val) = each %ENV) { print "$key => $val\n" }' From avilella at gmail.com Wed Dec 20 09:07:17 2006 From: avilella at gmail.com (Albert Vilella) Date: Wed, 20 Dec 2006 14:07:17 +0000 Subject: [Bioperl-l] Using Muscle parameter within bioperl In-Reply-To: References: Message-ID: <358f4d650612200607m4324b8f1r91d2d917cd4951bd@mail.gmail.com> Try something like: my @params =('verbose'=>0, 'quiet'=>1, 'log'=>'/tmp/inv.log'); my $factory = Bio::Tools::Run::Alignment::Muscle->new(@params); it works for me with muscle 3.6. 
The log only gives me a start, commandstring and end. I dunno if that is what muscle is supposed to spit out. Albert. On 12/19/06, Shrinivasrao P. Mane wrote: > Hi, > I need to run muscle using bioperl. This is how I do it in command line. > > muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet > > I used the following in perl script > > my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format => > 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log'); > > The program runs and produces the result file but it doesn't create a > log file nor does it stop sending output to STDOUT (-quiet). > Could anybody help me with this? > Thanks > Mane > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Dec 20 17:46:35 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 16:46:35 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <4589A507.60106@sendu.me.uk> Message-ID: <000c01c72488$b6a690b0$15327e82@pyrimidine> > Chris Fields wrote: > >> First, I got a $http_proxy env. variable automatically > defined by the > >> BioPerl installation (I don't define and export it in my > >> .bash_profile). > >> So when I'm logging in, > $http_proxy=http://ip_adress:port/ > > > > BioPerl can't permanently set any env. variables out of the > box since > > True, and it doesn't try to set one temporarily either. > > To clarify some of the other points Chris made, the proxy > variable certainly doesn't need username and password to be > defined (from LWPs point of view), since not all proxies > authenticate. Of course accesses won't work if authentication > is actually required and these aren't set. > > There's no reason that no_proxy should have to be set. It is > used to say what domains shouldn't be proxied. 
Either this is > a real LWP bug, or somehow EUtilities or one of its bases is > doing something wrong. It should be investigated... Actually, after some investigation I repeated the error and committed a fix. If I set (on WinXP) HTTP_PROXY to a dummy variable I get the same error: Can't use an undefined value as an ARRAY reference at C:/Perl/lib/LWP/UserAgent.pm line 787. It's EUtilities-specific as other WebAgents that have proxy settings do not have the same problem, though I haven't checked any WebAgent-based classes. I think this may also partly be an LWP bug as setting env_proxy to TRUE/FALSE doesn't seem to have an effect, but instantiating with it (env_proxy => 1) in the constructor fixes the problem. Anthony, I have committed a fix to CVS to GenericWebDBI and EUtilities. Could you try it out? -chris From cjfields at uiuc.edu Wed Dec 20 18:19:59 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 17:19:59 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine> Message-ID: <000001c7248d$5e7df450$15327e82@pyrimidine> > > First, I got a $http_proxy env. variable automatically > defined by the > > BioPerl installation (I don't define and export it in my > > .bash_profile). > > So when I'm logging in, > $http_proxy=http://ip_adress:port/ Anthony, Sorry about the prior long-winded response. I managed to reproduce the error about five minutes after I responded and managed to trace the problem back to GenericWebDBI. The issue seems to be with the LWP::UserAgent env_proxy method not setting correctly post-instantiation; setting to 0 or 1 doesn't seem to do anything. If I add it to the list of args for chained instantiation in the constructor: my $self = $class->SUPER::new(@args, env_proxy => 1); it suddenly works like a charm. Hard to know why it's being fussy... I'm going to try reproducing this on a few platforms and check to see if it has been reported as an LWP bug. 
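Outside of BioPerl, the constructor change described above can be sketched as follows. My::Agent is an invented class name (not a BioPerl class) and the proxy URL is an example value; the only point is that env_proxy => 1 is handed to LWP::UserAgent->new at construction time rather than being set afterwards. This is not the actual GenericWebDBI code.

```perl
use strict;
use warnings;

package My::Agent;    # hypothetical class name, for illustration only
use parent 'LWP::UserAgent';

sub new {
    my ($class, @args) = @_;
    # Ask LWP to load *_proxy settings from %ENV while the object is built.
    my $self = $class->SUPER::new(@args, env_proxy => 1);
    return $self;
}

package main;

# Example value only; normally this comes from the login environment.
local $ENV{http_proxy} = 'http://proxy.example.org:8080/';

my $agent = My::Agent->new(timeout => 30);
print "proxy for http: ", ($agent->proxy('http') // 'none'), "\n";
```

If an upper-case HTTP_PROXY is also present in the environment, which of the two variables wins may depend on the LWP version, so the printed value is not guaranteed to be the example URL.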
I have also committed a fix to CVS if you want to test it out.

Chris

From jnewcomer at jhu.edu Wed Dec 20 20:56:10 2006
From: jnewcomer at jhu.edu (Joe Newcomer)
Date: Wed, 20 Dec 2006 20:56:10 -0500
Subject: [Bioperl-l] a stupid question
Message-ID: <002101c724a3$2ff80100$bd59dc80@aap.jhu.edu>

Hello Paul Leo,

I am with Johns Hopkins University Advanced Academic Programs. I am trying to contact a student named Paul Leo who has registered for Protein Bioinformatics. If this is you please email me. I would like to send you information about the spring course.

Respectfully,
Joe Newcomer
(410) 516-5047
Online Education

From anhthu.tieu at gsf.de Thu Dec 21 05:12:36 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 11:12:36 +0100
Subject: [Bioperl-l] imagemaps with heterogeneous_segments
Message-ID: <458A5E14.8060409@gsf.de>

Hi,

I use bioperl 1.5.2 and have been wondering whether it is possible to apply the image_and_map function with the glyph option "heterogeneous_segments". Up to now I can successfully create an underlying imagemap for the entire track. However, what I want is to create an imagemap for each single segment on my track/glyph. Does anyone know how to realise this?

Any help is appreciated. Thanks a lot.
Anh Thu From somil.sharma1 at gmail.com Thu Dec 21 01:22:24 2006 From: somil.sharma1 at gmail.com (Somil Sharma) Date: Thu, 21 Dec 2006 14:22:24 +0800 Subject: [Bioperl-l] problem Message-ID: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com> hello *i run this program* *#!/use/bin/perl* *use Bio::DB::GenBank;* *$gb = new Bio::DB::GenBank; $seq1 = $gb->get_Seq_by_id('MUSIGHBA1'); print $seq1; * *and got this error on cmd line--* ---------- *EXCEPTION ------------- MSG: WebDBSeqI Request Error: 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error) Content-Type: text/plain Client-Date: Thu, 21 Dec 2006 06:28:33 GMT Client-Warning: Internal response* *500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)* *STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685 STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491 STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27 STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145 STACK toplevel C:\Perl\a2.pl:5* plz see if u can help me out. my ppm is also not able to install Bioperl so i did that also manually. waiting for ur reply From granjeau at tagc.univ-mrs.fr Thu Dec 21 06:14:25 2006 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Thu, 21 Dec 2006 12:14:25 +0100 Subject: [Bioperl-l] BioFetch: Adding databases Message-ID: <458A6C91.7090000@tagc.univ-mrs.fr> Hello! I needed to query the Unisave database at EBI. Up to date, the only way to access it is the dbfetch web service (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined in the BioFetch package (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote these few lines to make it work, but I don't think it fits a good programming practice. May be it makes sense to defined a method to add databases to FORMATMAP, in order to follow the dbfetch service evolutions. 
Cheers,
--Samuel

use Bio::DB::BioFetch;
$Bio::DB::BioFetch::FORMATMAP{unisave} = {
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
    namespace => 'unisave',
};
my $bf = new Bio::DB::BioFetch(-db=>'unisave');
my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');

print $seq->display_id();
print $seq->seq();

From cain at cshl.edu Thu Dec 21 08:56:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 08:56:21 -0500
Subject: [Bioperl-l] problem
In-Reply-To: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
References: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
Message-ID: <1166709381.3739.47.camel@localhost.localdomain>

Hello,

It looks to me like you have a networking problem that doesn't have anything to do with BioPerl. When I run your script, I get:

Bio::Seq::RichSeq=HASH(0x97013e0)

Fairly quickly, too. Can you get to http://eutils.ncbi.nlm.nih.gov/ in a browser without proxy settings?

As an aside, you probably don't really want the HASH stuff above, so I modified your script to look like this, with warnings and strict to make future debugging easier:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;

my $gb = new Bio::DB::GenBank;
my $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1->seq;

Scott

On Thu, 2006-12-21 at 14:22 +0800, Somil Sharma wrote:
> hello
>
> *i run this program*
>
> *#!/use/bin/perl*
>
> *use Bio::DB::GenBank;*
>
> *$gb = new Bio::DB::GenBank;
> $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
> print $seq1;
> *
>
> *and got this error on cmd line--*
>
> ---------- *EXCEPTION -------------
> MSG: WebDBSeqI Request Error:
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
> Content-Type: text/plain
> Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
> Client-Warning: Internal response*
>
> *500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)*
>
> *STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
> STACK Bio::DB::WebDBSeqI::get_seq_stream
C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491 > STACK Bio::DB::WebDBSeqI::get_Stream_by_id > C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27 > STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145 > STACK toplevel C:\Perl\a2.pl:5* > > plz see if u can help me out. > > my ppm is also not able to install Bioperl so i did that also manually. > > waiting for ur reply > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Thu Dec 21 09:28:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 21 Dec 2006 08:28:07 -0600 Subject: [Bioperl-l] BioFetch: Adding databases In-Reply-To: <458A6C91.7090000@tagc.univ-mrs.fr> References: <458A6C91.7090000@tagc.univ-mrs.fr> Message-ID: <193C6D1C-6374-4A86-9FBD-7FA994D5FDDF@uiuc.edu> I've added this to the BioFetch FORMATMAP as 'unisave' and committed to CVS. Thanks! chris On Dec 21, 2006, at 5:14 AM, Samuel GRANJEAUD - IR/IFR137 wrote: > Hello! > > I needed to query the Unisave database at EBI. Up to date, the only > way > to access it is the dbfetch web service > (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet > defined > in the BioFetch package > (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote > these few lines to make it work, but I don't think it fits a good > programming practice. May be it makes sense to defined a method to add > databases to FORMATMAP, in order to follow the dbfetch service > evolutions. 
> > Cheers, > --Samuel > > use Bio::DB::BioFetch; > $Bio::DB::BioFetch::FORMATMAP{unisave} = { > default => 'swiss', > swissprot => 'swiss', > fasta => 'fasta', > namespace => 'unisave', > }; > my $bf = new Bio::DB::BioFetch(-db=>'unisave'); > my $seq = $bf->get_Seq_by_id('LAM1_MOUSE'); > > print $seq->display_id(); > print $seq->seq(); > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From anhthu.tieu at gsf.de Thu Dec 21 09:31:45 2006 From: anhthu.tieu at gsf.de (Anh-Thu Tieu) Date: Thu, 21 Dec 2006 15:31:45 +0100 Subject: [Bioperl-l] multiple glyph elements in one track Message-ID: <458A9AD1.50907@gsf.de> Hello, I use bioperl 1.5.2. Does anyone know how I could create two seperate glyph elements on the same track with the Bio::Graphics::Panel module? My aim is to have two (e.g. two different) clickable imagemap elements on the same track. Until now I can merely create two glyph elements (transcript2 or generic options) per track with only one imagemap element (e.g. the same imagemap element is used for the entire track as the entire (=both elements) glyph's coordinates are returned to the image_and_map function as one set of coordinate). Thank you for your help. Best regards, Anh Thu From cain at cshl.edu Thu Dec 21 09:47:32 2006 From: cain at cshl.edu (Scott Cain) Date: Thu, 21 Dec 2006 09:47:32 -0500 Subject: [Bioperl-l] multiple glyph elements in one track In-Reply-To: <458A9AD1.50907@gsf.de> References: <458A9AD1.50907@gsf.de> Message-ID: <1166712453.3739.53.camel@localhost.localdomain> Hello Anh Thu, You can provide a callback for the glyph argument that returns different glyphs depending on what you want to do (ie, how you've coded your callback). 
This example is from the perldoc for Bio::Graphics::Panel:

  $panel->add_track(\@exons,
                    -glyph => sub { my $feature = shift;
                                    $feature->source_tag eq 'curated'
                                        ? 'ellipse' : 'generic'; }
                    );

Scott

On Thu, 2006-12-21 at 15:31 +0100, Anh-Thu Tieu wrote:
> Hello,
>
> I use bioperl 1.5.2. Does anyone know how I could create two seperate
> glyph elements on the same track with the Bio::Graphics::Panel module?
> My aim is to have two (e.g. two different) clickable imagemap elements
> on the same track. Until now I can merely create two glyph elements
> (transcript2 or generic options) per track with only one imagemap
> element (e.g. the same imagemap element is used for the entire track as
> the entire (=both elements) glyph's coordinates are returned to the
> image_and_map function as one set of coordinate).
>
> Thank you for your help.
>
> Best regards,
>
> Anh Thu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cain.cshl at gmail.com Thu Dec 21 15:03:48 2006 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 21 Dec 2006 15:03:48 -0500 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> <1166604008.4588f6e87cccc@www.studentmail.otago.ac.nz> <1166621113.3739.11.camel@localhost.localdomain> <1166642653.45898dddbd8cf@www.studentmail.otago.ac.nz> <1166643051.3739.28.camel@localhost.localdomain> <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz> Message-ID: <1166731428.3739.71.camel@localhost.localdomain> Hi Stephan, About your bioperl mail: did you cancel it, or did it just disappear? If the latter, I might have accidentally deleted it, sorry :-/ So 'GBrowse is running' means that you can see the sample yeast chr1 database, browse around, etc, right? I still don't know what is up with the warning but my guess is that everything is working there. As for your question about writing a callback, the reason it's not working is that the attributes method returns a list (typically but not always with only one element), so what you are really doing in your test is this "number of elements in the list > 1200", which is almost always going to be false. You should change it to this: my $feature = shift; my ($score) = $feature->attributes('score'); if ($score > 1200) { ...etc... 
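To spell out the scalar-versus-list point, here is a small self-contained sketch. MockFeature is a made-up stand-in (not a BioPerl class) whose attributes() simply returns a list, as described above for the real method; the 1200 threshold and the two colours are taken from the original callback. How the real feature class behaves in scalar context may differ in detail, so treat this as an illustration of the Perl context rule only.

```perl
use strict;
use warnings;

package MockFeature;    # hypothetical stand-in for a feature object

sub new {
    my ($class, %attr) = @_;
    return bless { attr => \%attr }, $class;
}

# Returns the list of values stored under a tag (usually just one).
sub attributes {
    my ($self, $tag) = @_;
    return @{ $self->{attr}{$tag} || [] };
}

package main;

# Callback in the style of a GBrowse colour option.
sub colour_by_score {
    my $feature = shift;
    # The parentheses force list context, so $score gets the first value
    # rather than the number of values.
    my ($score) = $feature->attributes('score');
    return 'pink' unless defined $score;
    return $score > 1200 ? 'blue' : 'pink';
}

my $feature = MockFeature->new(score => [1500]);

my $count   = $feature->attributes('score');    # scalar context: a count
my ($score) = $feature->attributes('score');    # list context: the value

print "scalar context gives $count, list context gives $score\n";
print "colour: ", colour_by_score($feature), "\n";
```

With the mock above, the scalar-context call yields 1 (the number of values), which is why a comparison against 1200 never succeeds.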

Finally, if you really want to test that you are using the correct
bioperl, you can put this simple cgi in your cgi-bin directory as
test_biographics.pl, set it as world executable and go to
http://localhost/cgi-bin/test_biographics.pl (and, yes, I use strict and
warnings even when the script is only 10 lines long :-) :

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Bio::Graphics::Panel;
  use CGI qw/:standard/;

  print header(),
        start_html,
        p("Bio::Graphics::Panel api_version is ".Bio::Graphics::Panel->api_version),
        p("It should be 1.654 for BioPerl 1.5.2"),
        end_html;

Scott

On Fri, 2006-12-22 at 08:27 +1300, Stephan Roessner wrote:
> Hi Scott,
>
> responded to group but it didn't get through.
> So I reply back to you.
>
> I installed Class-Base-0.03 using CPAN.
>
> Reinstalling GBrowse still gives me a warning like:
> Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> Writing Makefile for Bio::Graphics::Browser::CAlign
> Writing Makefile for Generic-Genome-Browser.
>
> GBrowse is running but I cannot access attributes and/or the score column
> of .gff files. Is this related to the warning?
>
> To get an attribute I use
>
> my $feature = shift;
> if ($feature->attributes('score') > 1200) {
> return 'blue';
> } else {
> return 'pink';
> }
> But I retrieve no data using $feature->
>
> Can I somehow verify what bioperl version GBrowse is using?
>
> Stephan,
>
>
>
> Quoting Scott Cain :
>
> > Stephan,
> >
> > Yes, it is in cpan:
> >
> > http://search.cpan.org/~abw/Class-Base-0.03/lib/Class/Base.pm
> >
> > The cpan shell should be able to install it.
> >
> > Whether or not that works, please respond to the mailing list so that
> > the rest of the conversation can be archived.
> >
> > Scott
> >
> >
> > On Thu, 2006-12-21 at 08:24 +1300, Stephan Roessner wrote:
> > > Hi Scott,
> > >
> > > No I didn't.
> > > I had a look and couldn't find it.
> > > It is not part of CPAN?
> > > > > > Stephan > > > > > > > > > Quoting Scott Cain : > > > > > > > Stephan, > > > > > > > > Did you install Class::Base? It was inadvertantly left out the > > > > install > > > > document, but is required. > > > > > > > > Scott > > > > > > > > > > > > On Wed, 2006-12-20 at 21:40 +1300, Stephan Roessner wrote: > > > > > Hi all, > > > > > > > > > > I did sudo ./Build install --uninst 1 and got the error > > > > > * ERROR: Confiduration was initially created with MOdule::Build > > > > version > > > > > '0.2805', but we are now using '0.2806'. ... > > > > > > > > > > So I ran perl Build.PL and got the message > > > > > Creating new 'Buid' script for 'bioperl' verion '1.0050021'. > > > > > > > > > > I did run sudo ./Build install --uninst 1 again. > > > > > Seems to be fine with no error messages. > > > > > > > > > > When I run perl Makefile.PL for GBrowse 1.66-RC2 it results in > > > > > > > > > > Warning: prerequisite Bio::Perl 1.52 not found. We have > > 1.0050021. > > > > > Warning: prerequisite Class::Base 0 not found. > > > > > Writing Makefile for Bio::Graphocs::Browser::CAlign > > > > > Writing Makefile for Generic-Genome-Browser > > > > > > > > > > GBrowse is running but I have really troubles with aggregators > > trying > > > > to > > > > > use xyplot. It does not plot anything. So I thought the bioperl > > could > > > > be > > > > > the problem. > > > > > > > > > > Stephan > > > > > > > > > > > > > > > > > > > > Quoting Scott Cain : > > > > > > > > > > > I really don't think the BioPerl version detection is wrong. > > I > > > > > > actually > > > > > > don't check Bio::Root::Version::VERSION in Makefile.PL, I > > check > > > > > > Bio::Graphics::Panel->api_version. When it doesn't find the > > > > correct > > > > > > api_version, it gives a warning the BioPerl 1.5.2 is not > > installed. > > > > I > > > > > > have seen this happen when more than one BioPerl instance is > > > > installed > > > > > > and `perl Makefile.PL` finds the wrong one first. 
My > > suggestion is > > > > to > > > > > > try reinstalling BioPerl and providing the --uninst 1 argument > > to > > > > > > remove > > > > > > older versions of BioPerl: > > > > > > > > > > > > sudo ./Build install --uninst 1 > > > > > > > > > > > > Scott > > > > > > > > > > > > > > > > > > On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote: > > > > > > > Stephan Roessner wrote: > > > > > > > > Dear support team, > > > > > > > > > > > > > > > > I installed bioperl 1.5.2_100 on a ferdora machine to be > > able > > > > to > > > > > > use > > > > > > > > gbrowse. > > > > > > > > The installation seems to work (except of the test > > failures) > > > > but > > > > > > the > > > > > > > > gbrowse installation tells me that BIO::pERL 1.0050021 is > > > > > > installed, but > > > > > > > > of course it requires 1.52. > > > > > > > > > > > > > > > > Is there a chance to find out what went wrong? > > > > > > > > > > > > > > Nothing went wrong with the Bioperl installation (well, > > expect > > > > there > > > > > > > shouldn't have been any test failures - can you post those > > > > please?). > > > > > > > gbrowse simply defined its Bioperl requirement incorrectly. > > If > > > > you > > > > > > tell > > > > > > > me exactly where you downloaded gbrowse from and how you > > went > > > > about > > > > > > > installing it, and provide the exact, complete error message > > you > > > > got > > > > > > > from it, I might be able help the authors fix the problem. > > > > > > > > > > > > > > Or I'm pretty sure they can figure it our for themselves :) > > > > > > > _______________________________________________ > > > > > > > Bioperl-l mailing list > > > > > > > Bioperl-l at lists.open-bio.org > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > Scott Cain, Ph. D. 
> > > > > > cain at cshl.edu > > > > > > GMOD Coordinator (http://www.gmod.org/) > > > > > > 216-392-3087 > > > > > > Cold Spring Harbor Laboratory > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > ------------------------------------------------------------------------ > > > > Scott Cain, Ph. D. > > > > cain.cshl at gmail.com > > > > GMOD Coordinator (http://www.gmod.org/) > > > > 216-392-3087 > > > > Cold Spring Harbor Laboratory > > > > > > > > > > > > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. > > cain.cshl at gmail.com > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rvosa at sfu.ca Sat Dec 23 17:17:37 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Sat, 23 Dec 2006 14:17:37 -0800 Subject: [Bioperl-l] [Summary] Re: suggestions for suitable 'taxon' object In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> Message-ID: <458DAB01.6080200@sfu.ca> The replies I've received so far indicate I should look into Bio::Taxon. I will probably come back with further questions/discussions as to how to link and cross reference taxa, sequences and nodes, but for now I should first look at the Bio::Taxon api (and unpack my other Christmas gifts). Thank you for all comments and suggestions. Happy holidays! 
Rutger Rutger Vos wrote: > Hi all, > > I am looking for a bioperl object that can be abused to function as a > suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS > file format (i.e. not strictly an entity from a taxonomy, but more loosely > an OTU). > > The object would primarily function as a way to relate nodes in trees to > sequences in an alignment (a foreign key that both nodes and sequences refer > to), and secondarily as the keeper of the canonical name of the OTU, such > that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo > sapiens (constrained monophyly)' can still be understood to refer to the > same thing - the OTU 'Homo sapiens sapiens' (for example). > > I was thinking that a (possibly expanded) Bio::Species might work if there > was some sensible way of appending references to node and sequence objects > to it (or otherwise associate them with each other), but I am writing *to > solicit any and all suggestions*. I am looking for something similar to > Bio::Phylo::Taxa::Taxon. > > Any and all comments and suggestions greatly appreciated! > > Best wishes, > > Rutger Vos > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger A. Vos Postdoctoral research fellow University of British Columbia Personal site: http://www.sfu.ca/~rvosa CIPRES: http://www.phylo.org Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From paul.boutros at utoronto.ca Sat Dec 23 22:36:59 2006 From: paul.boutros at utoronto.ca (Paul Boutros) Date: Sat, 23 Dec 2006 22:36:59 -0500 Subject: [Bioperl-l] Bio::Graphics::Glyph::dna Message-ID: <20061223223659.7sgfofa44mw4okks@webmail.utoronto.ca> Hi, I've been trying to get the dna glyph working and have had some problems. 
This is ActiveState perl 5.8.8 (build 819) and BioPerl 1.5.2 on WinXP.

I'm starting with a FASTA file, so I've tried:

  $panel->add_track(
      $seq,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

where $seq is a Bio::Seq object, and I've also tried it using a GFF
$segment:

  my $db = Bio::DB::GFF->new(
      -adaptor => 'berkeleydb',
      -create  => 1,
      -dsn     => 'temp'
  );

  $db->load_sequence_string(
      $seq->primary_id(),
      $seq->seq()
  );

  my $segment = Bio::DB::GFF::Segment->new(
      $db,
      $seq->primary_id(),
      $seq->primary_id(),
      1,
      $seq->length()
  );

  $panel->add_track(
      $segment,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

From paul.boutros at utoronto.ca Sat Dec 23 22:46:27 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:46:27 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
Message-ID: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>

Hello,

I'm trying to get the dna glyph of Bio::Graphics to work and am having
some problems. I'm starting with a fasta file, and I am running perl
5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2

If I try simply using a Bio::Seq object like this:

  $panel->add_track(
      $segment,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

I get the error:
Can't locate object method "start" via package "Bio::Seq" at
C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164.

And if I try creating a Bio::DB::GFF::Segment object like this:

  my $db = Bio::DB::GFF->new(
      -adaptor => 'berkeleydb',
      -create  => 1,
      -dsn     => '/usr/local/share/gff/dmel'
  );

  $db->initialize(1);

  $db->load_sequence_string(
      $seq->primary_id(),
      $seq->seq()
  );

  my $segment = Bio::DB::GFF::Segment->new(
      $db,
      $seq->primary_id(),
      $seq->primary_id(),
      1,
      $seq->length()
  );

  $panel->add_track(
      $segment,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

I get the error:
------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented by package Bio::DB::GFF::Segment.
This is not your fault - author of Bio::DB::GFF::Segment should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522
STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101
STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186
STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388
STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360
STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288
STACK: create_figure.pl:147
----------------------------------------------------------------

I'm really unsure what to try next; any suggestions much appreciated!
Paul From lstein at cshl.edu Sun Dec 24 12:23:18 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Sun, 24 Dec 2006 12:23:18 -0500 Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna? In-Reply-To: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca> References: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca> Message-ID: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com> Hi, You need to use either a Bio::SeqFeature::Generic object (with an attached Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to create Bio::DB::GFF::Segment objects directly. e.g. my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000); my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800); $feature->attach_seq($dna); Best, Lincoln On 12/23/06, Paul Boutros wrote: > > Hello, > > I'm trying to get the dna glyph of Bio::Graphics to work and am having > some problems. I'm starting with a fasta file, and I am running perl > 5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2 > > If I try simply using a Bio::Seq object like this: > $panel->add_track( > $segment, > -glyph => 'dna', > -do_gc => 'true', > -gc_window => 'auto' > ); > > I get the error: > Can't locate object method "start" via package "Bio::Seq" at > C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164. 
> > > And if I try creating a Bio::DB::GFFSegment object like this: > my $db = Bio::DB::GFF->new( > -adaptor => 'berkeleydb', > -create => 1, > -dsn => '/usr/local/share/gff/dmel' > ); > > $db->initialize(1); > > $db->load_sequence_string( > $seq->primary_id(), > $seq->seq() > ); > > my $segment = Bio::DB::GFF::Segment->new( > $db, > $seq->primary_id(), > $seq->primary_id(), > 1, > $seq->length() > ); > > $panel->add_track( > $segment, > -glyph => 'dna', > -do_gc => 'true', > -gc_window => 'auto' > ); > > I get the error: > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not > implemented b > y package Bio::DB::GFF::Segment. > This is not your fault - author of Bio::DB::GFF::Segment should be blamed! > > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Root::RootI::throw_not_implemented > C:/Perl/site/lib/Bio/Root/RootI.pm:522 > STACK: Bio::FeatureHolderI::get_SeqFeatures > C:/Perl/site/lib/Bio/FeatureHolderI.pm:101 > STACK: Bio::Graphics::Glyph::_subfeat > C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186 > STACK: Bio::Graphics::Glyph::subfeat > C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167 > STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56 > STACK: Bio::Graphics::Glyph::Factory::make_glyph > C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 > STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81 > STACK: Bio::Graphics::Glyph::Factory::make_glyph > C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 > STACK: Bio::Graphics::Panel::_add_track > C:/Perl/site/lib/Bio/Graphics/Panel.pm:388 > STACK: Bio::Graphics::Panel::_do_add_track > C:/Perl/site/lib/Bio/Graphics/Panel.pm:360 > STACK: Bio::Graphics::Panel::add_track > C:/Perl/site/lib/Bio/Graphics/Panel.pm:288 > STACK: create_figure.pl:147 > ---------------------------------------------------------------- > > I'm really unsure what to try next, 
any suggestions much appreciated! > Paul > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From tgenahmet at gmail.com Wed Dec 27 16:38:43 2006 From: tgenahmet at gmail.com (Ahmet Kurdoglu) Date: Wed, 27 Dec 2006 14:38:43 -0700 Subject: [Bioperl-l] get mRNA details for a gene Message-ID: <9d8d0e2a0612271338t7cb15a63v5a08f624888b3f7b@mail.gmail.com> Hi, This is my first message to the list. I hope I get it right. Here is what I'm trying to accomplish: Get the mRNA details for a given gene (ex. DNASE2B) from its GenBank file. Using the web-interface I can search with this query: DNASE2B [sym] AND homo sapiens [ORGN] (returns only one result if you search 'gene' database) and get the GenBank file by clicking on NC_000001.9 and I can see the details for its two mRNAs. (I eventually need to get exon locations for both of its transcripts) However trying to do this in Perl has proved to be very difficult for me. I've tried various methods, including get_Seq_by_id, get_Seq_by_gi, and get_Stream_by_query. Before I explain in detail what I did I'd like to hear your ideas on how to accomplish this. Thank you. From sdavis2 at mail.nih.gov Thu Dec 28 16:57:03 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 28 Dec 2006 16:57:03 -0500 Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers In-Reply-To: References: Message-ID: <45943DAF.70100@mail.nih.gov> Michael Muratet US-Huntsville wrote: > Sean > > Thanks. I did consider the bioconductor package and downloaded your > write-up after it was recommended by the GEO folks. I've looked at R a > few times, but I never got proficient at it. 
I'm a lot better with perl. > > I've been looking at MINiML, too. It looked like it might be easier to > parse the SOFT file since the data is in-line with the attributes and > I'd have to use a SAX parser (not enough memory for DOM) for MINiML. > > NCBI must have parsers to get the data into their databases. Do you know > what they use? > Michael, You might want to look more specifically at the MINiML format specs. The data tables are stored as separate tab-delimited files with an external reference in the XML, so DOM parsing is possible with just a few kB of memory. Of course, to read in all of the data into memory at once will take a large amount of memory for some datasets. If you are going to load into a database, I would suggest reading the MINiML using DOM and then stepping through the data files one at a time, loading as you go. As for their parsers, I'm not sure what language they use, but writing a parser for either SOFT or MINiML isn't at all difficult. GEO uses a very simplified MAGE schema. As for R vs. perl, if you are planning on doing analyses of microarray data, I would highly suggest looking again at the R/bioconductor project. It will save you reinventing many wheels, such as getting annotation like gene ontology and pathways, doing stats, plotting, maintaining MIAME-compliant data structures, converting from multiple microarray formats, etc. Sean From allenday at ucla.edu Thu Dec 28 18:21:07 2006 From: allenday at ucla.edu (Allen Day) Date: Thu, 28 Dec 2006 15:21:07 -0800 Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers In-Reply-To: <45943DAF.70100@mail.nih.gov> References: <45943DAF.70100@mail.nih.gov> Message-ID: <5c24dcc30612281521o58b9f256sfa36c403f4c30bfa@mail.gmail.com> > As for R vs. perl, if you are planning on doing analyses of microarray > data, I would highly suggest looking again at the R/bioconductor > project. 
It will save you reinventing many wheels, such as getting > annotation like gene ontology and pathways, doing stats, plotting, > maintaining MIAME-compliant data structures, converting from multiple > microarray formats, etc. I'll second this statement WRT the data analysis. I'm doing all my analysis in R, Perl is just not good at dealing with large matrices or plotting. OTOH, I have also found that R is particularly weak when it comes to pipelining data and system interfacing. If your goal is to do ETL to a local database you're better off using Perl. I've found they're both about equally clunky for dealing with the experimental metadata, with a slight preference for Perl. That's more a reflection of the baroque MAGE model though than the programming languages themselves. -Allen > > Sean > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Paul.Boutros at utoronto.ca Sat Dec 30 02:43:32 2006 From: Paul.Boutros at utoronto.ca (Paul Boutros) Date: Sat, 30 Dec 2006 02:43:32 -0500 Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna? In-Reply-To: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com> Message-ID: <000c01c72be6$34d07e60$ec02a8c0@main> Hi Lincoln, Thanks, that worked like a charm! Can I suggest adding the example/explanation you gave me to the pod for Bio::Graphics::Glyph::dna? Here's a patch against the 1.5.2 version of dna.pm to do that. Paul 266c266,274 < in response to the dna() method. --- > in response to the dna() method. For example, you can use a > Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq > like this: > my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 ); > my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 ); > $feature->attach_seq($dna); > $panel->add_track( $feature, -glyph => 'dna' ); > > A Bio::Graphics::Feature object may also be used. 
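
For readers who want to see the whole round trip, the pieces discussed in this thread can be combined into one short script. This is a sketch under stated assumptions, not a tested recipe for every BioPerl release: it assumes Bio::Graphics and GD are installed, the 1000 bp sequence is synthetic, and the panel width, padding, and output file name (dna_track.png) are arbitrary choices made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::PrimarySeq;
use Bio::SeqFeature::Generic;
use Bio::Graphics::Panel;

# Synthetic 1000 bp sequence (assumption: stands in for a FASTA record).
my $dna = Bio::PrimarySeq->new(
    -seq => 'ACGT' x 250,
    -id  => 'demo',
);

# The dna glyph wants a feature with coordinates and an attached sequence,
# not a bare Bio::Seq.
my $feature = Bio::SeqFeature::Generic->new(
    -start => 1,
    -end   => $dna->length,
);
$feature->attach_seq($dna);

# Panel dimensions are arbitrary illustration values.
my $panel = Bio::Graphics::Panel->new(
    -length    => $dna->length,
    -width     => 800,
    -pad_left  => 10,
    -pad_right => 10,
);

$panel->add_track(
    $feature,
    -glyph     => 'dna',
    -do_gc     => 1,
    -gc_window => 'auto',
);

# Render to PNG; binmode matters on Windows.
my $png = $panel->png;
open my $out, '>', 'dna_track.png' or die "Cannot write dna_track.png: $!";
binmode $out;
print {$out} $png;
close $out;
```

At this zoom level the glyph should draw a GC-content plot rather than individual bases; a much shorter -length is needed before the letters themselves are shown.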
_____ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Sunday, December 24, 2006 12:23 PM To: Paul.Boutros at utoronto.ca Cc: BioPerl Mailing List Subject: Re: [Bioperl-l] How to use Bio::Graphics::Glyph::dna? Hi, You need to use either a Bio::SeqFeature::Generic object (with an attached Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to create Bio::DB::GFF::Segment objects directly. e.g. my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000); my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800); $feature->attach_seq($dna); Best, Lincoln On 12/23/06, Paul Boutros wrote: Hello, I'm trying to get the dna glyph of Bio::Graphics to work and am having some problems. I'm starting with a fasta file, and I am running perl 5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2 If I try simply using a Bio::Seq object like this: $panel->add_track( $segment, -glyph => 'dna', -do_gc => 'true', -gc_window => 'auto' ); I get the error: Can't locate object method "start" via package "Bio::Seq" at C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164. And if I try creating a Bio::DB::GFFSegment object like this: my $db = Bio::DB::GFF->new( -adaptor => 'berkeleydb', -create => 1, -dsn => '/usr/local/share/gff/dmel' ); $db->initialize(1); $db->load_sequence_string( $seq->primary_id(), $seq->seq() ); my $segment = Bio::DB::GFF::Segment->new( $db, $seq->primary_id(), $seq->primary_id(), 1, $seq->length() ); $panel->add_track( $segment, -glyph => 'dna', -do_gc => 'true', -gc_window => 'auto' ); I get the error: ------------- EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented b y package Bio::DB::GFF::Segment. This is not your fault - author of Bio::DB::GFF::Segment should be blamed! 
STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522 STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101 STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186 STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167 STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56 STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81 STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388 STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360 STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288 STACK: create_figure.pl:147 ---------------------------------------------------------------- I'm really unsure what to try next, any suggestions much appreciated! Paul _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From er at xs4all.nl Sat Dec 30 19:05:16 2006 From: er at xs4all.nl (Erik) Date: Sun, 31 Dec 2006 01:05:16 +0100 (CET) Subject: [Bioperl-l] acquiring a local refseq + index Message-ID: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> Hi all, I downloaded the refseq files (.gbff) and want to index the lot with Bio::DB::Flat. 

It turns out that there are many cases where the SOURCE and ORGANISM lines
are messed up, sometimes to a degree where the indexing fails on a
Bio::SeqIO::genbank error.

I'd like to change Bio::SeqIO::genbank to let this parsing go at least so
far as to make the indexing of the refseq files possible, and hopefully
improve the taxonomic output ($seq->species->binomial is often mutilated
at the moment).

Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank?
Is anyone already working on a rewrite? If so, I may be better off writing
my own indexing scheme.

Below is an outline of my indexing program, which uses Bio::DB::Flat::DBD.
If anyone knows of a better way to get a locally searchable refseq flat
file index, I would be very interested.

Thanks for your help,

Erikjan


-------------
use Bio::DB::Flat;

my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
my $db=Bio::DB::Flat->new(
    -directory  => $refseq_dir,
    -dbname     => 'refseq',
    -format     => 'genbank',
    -index      => 'bdb',
    -write_flag => 1,
);
# getfiles() is not shown in this outline
my @files = getfiles($refseq_dir);
for my $f (@files) {
    $db->build_index($f);
}

From hlapp at gmx.net Sat Dec 30 20:48:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 30 Dec 2006 20:48:33 -0500
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
Message-ID:

Can you send examples and the resulting error messages? Also, I'm
assuming you're running the 1.5.2 release of Bioperl; if not that's what
I would try first.

-hilmar

On Dec 30, 2006, at 7:05 PM, Erik wrote:

> Hi all,
>
> I downloaded the refseq files (.gbff) and want to index the lot with
> Bio::DB::Flat.
>
> It turns out that there are many cases where the SOURCE and
> ORGANISM lines
> are messed up, sometimes to a degree where the indexing fails on a
> Bio::SeqIO::genbank error.
> > I'd like to change Bio::SeqIO::genbank to let this parsing go at > least so > far as to make the indexing of the refseq files possible, and > hopefully > improving the taxonomic output ($seq->species->binomial is often > mutilated > at the moment). > > Is it still worthwhile to change parsing modules like > Bio::SeqIO::genbank? > Is anyone already working on a rewrite? Because if this is the > case I may > be better off writing my own indexing scheme? > > Below is (outline of) my indexing program, which uses > Bio::DB::Flat::DBD. > If anyone knows of a better way to get a locally searchable refseq > flat > file index, I would be very interested. > > Thanks for your help, > > Erikjan > > > ------------- > use Bio::DB::Flat; > > my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; > my $db=Bio::DB::Flat->new( > -directory => $refseq_dir, > -dbname => 'refseq', > -format => 'genbank', > -index => 'bdb', > -write_flag => 1, > ); > my @files = getfiles($refseq_dir); > for my $f (@files) { > db->build_index($f); > } > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat Dec 30 21:33:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 30 Dec 2006 20:33:23 -0600 Subject: [Bioperl-l] acquiring a local refseq + index In-Reply-To: References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> Message-ID: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> Agree with Hilmar, in that we need examples. If you are referring to your submitted bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2167 we could add this in as long as it passes (I'll try giving it a workout with my local bacterial seqs tonight or tomorrow). 
However, in the not-too-distant future your patch would likely be rendered obsolete, as any parsing in Bio::SeqIO modules pertaining to Bio::Species-related matters will be deprecated in favor of simple parsing (more foolproof, less uncertainty) and Bio::Taxon (which has optional db lookups using NCBI Taxonomy). Bio::Species and anything related to it are considered marked for deprecation. Fair warning... chris On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote: > Can you send examples and the resulting error messages? Also, I'm > assuming you running the 1.5.2 release of Bioperl; if not that's what > I would try first. > > -hilmar > > On Dec 30, 2006, at 7:05 PM, Erik wrote: > >> Hi all, >> >> I downloaded the refseq files (.gbff) and want to index the lot with >> Bio::DB::Flat. >> >> It turns out that there are many cases where the SOURCE and >> ORGANISM lines >> are messed up, sometimes to a degree where the indexing fails on a >> Bio::SeqIO::genbank error. >> >> I'd like to change Bio::SeqIO::genbank to let this parsing go at >> least so >> far as to make the indexing of the refseq files possible, and >> hopefully >> improving the taxonomic output ($seq->species->binomial is often >> mutilated >> at the moment). >> >> Is it still worthwhile to change parsing modules like >> Bio::SeqIO::genbank? >> Is anyone already working on a rewrite? Because if this is the >> case I may >> be better off writing my own indexing scheme? >> >> Below is (outline of) my indexing program, which uses >> Bio::DB::Flat::DBD. >> If anyone knows of a better way to get a locally searchable refseq >> flat >> file index, I would be very interested. 
>> >> Thanks for your help, >> >> Erikjan >> >> >> ------------- >> use Bio::DB::Flat; >> >> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; >> my $db=Bio::DB::Flat->new( >> -directory => $refseq_dir, >> -dbname => 'refseq', >> -format => 'genbank', >> -index => 'bdb', >> -write_flag => 1, >> ); >> my @files = getfiles($refseq_dir); >> for my $f (@files) { >> db->build_index($f); >> } >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Dec 31 14:36:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 31 Dec 2006 13:36:47 -0600 Subject: [Bioperl-l] acquiring a local refseq + index In-Reply-To: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> Message-ID: <37FB5BDF-25A9-44F0-9E82-964684A73A58@uiuc.edu> As a followup, I have committed the fix Erik had in Bugzilla. I don't know if this helps with the below issue Erik describes (they sound unrelated). chris On Dec 30, 2006, at 8:33 PM, Chris Fields wrote: > Agree with Hilmar, in that we need examples. If you are referring to > your submitted bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2167 > > we could add this in as long as it passes (I'll try giving it a > workout with my local bacterial seqs tonight or tomorrow). 
However, > in the not-too-distant future your patch would likely be rendered > obsolete, as any parsing in Bio::SeqIO modules pertaining to > Bio::Species-related matters will be deprecated in favor of simple > parsing (more foolproof, less uncertainty) and Bio::Taxon (which has > optional db lookups using NCBI Taxonomy). Bio::Species and anything > related to it are considered marked for deprecation. Fair warning... > > chris > > On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote: > >> Can you send examples and the resulting error messages? Also, I'm >> assuming you running the 1.5.2 release of Bioperl; if not that's what >> I would try first. >> >> -hilmar >> >> On Dec 30, 2006, at 7:05 PM, Erik wrote: >> >>> Hi all, >>> >>> I downloaded the refseq files (.gbff) and want to index the lot with >>> Bio::DB::Flat. >>> >>> It turns out that there are many cases where the SOURCE and >>> ORGANISM lines >>> are messed up, sometimes to a degree where the indexing fails on a >>> Bio::SeqIO::genbank error. >>> >>> I'd like to change Bio::SeqIO::genbank to let this parsing go at >>> least so >>> far as to make the indexing of the refseq files possible, and >>> hopefully >>> improving the taxonomic output ($seq->species->binomial is often >>> mutilated >>> at the moment). >>> >>> Is it still worthwhile to change parsing modules like >>> Bio::SeqIO::genbank? >>> Is anyone already working on a rewrite? Because if this is the >>> case I may >>> be better off writing my own indexing scheme? >>> >>> Below is (outline of) my indexing program, which uses >>> Bio::DB::Flat::DBD. >>> If anyone knows of a better way to get a locally searchable refseq >>> flat >>> file index, I would be very interested. 
>>> >>> Thanks for your help, >>> >>> Erikjan >>> >>> >>> ------------- >>> use Bio::DB::Flat; >>> >>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; >>> my $db=Bio::DB::Flat->new( >>> -directory => $refseq_dir, >>> -dbname => 'refseq', >>> -format => 'genbank', >>> -index => 'bdb', >>> -write_flag => 1, >>> ); >>> my @files = getfiles($refseq_dir); >>> for my $f (@files) { >>> db->build_index($f); >>> } >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Fri Dec 1 02:47:03 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 07:47:03 +0000 Subject: [Bioperl-l] Upgrading my BioPerl RC via ppm? In-Reply-To: <519167.29410.qm@web50804.mail.yahoo.com> References: <519167.29410.qm@web50804.mail.yahoo.com> Message-ID: <456FDDF7.1080403@sheffield.ac.uk> Caitlin wrote: > Hi all. > > I'm currently using BioPerl 1.5.2 RC2 but I've seen multiple references > to 1.5.2 RC5. Can anyone tell me how to upgrade to the latest version? 
> The ppm GUI (ActivePerl Build 819) doesn't include any BioPerl packages > among those deemed upgradable. > > Thanks, > > ~Katie > > > Hi Katie, Currently there is not an RC5 PPM package available - we are hoping to have the official 1.5.2 release out pretty soon and there will definitely be a PPM package for that! Are you experiencing any problems with your current version of bioperl? If not, there is no need to worry, once we've released an updated PPM package your PPM GUI should then be able to see it as an upgrade - hopefully! :o) Sendu, I know you were working on automatically generating PPM packages - what is the current situation with regards to this? Nath From bix at sendu.me.uk Fri Dec 1 04:00:18 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:00:18 +0000 Subject: [Bioperl-l] BLASTing with a seqio/seq object... In-Reply-To: <456F27E9.70205@york.ac.uk> References: <01ba01c714a2$b9659c10$15327e82@pyrimidine> <456F27E9.70205@york.ac.uk> Message-ID: <456FEF22.4090004@sendu.me.uk> Samantha Thompson wrote: You missed a step...
> use strict; > use Bio::Perl; > use Bio::Seq; > use Bio::SeqIO; > > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > > #seq bit > > #$seq_obj = Bio::Seq->new(-format => 'fasta'); > > my $seqio_obj = Bio::SeqIO->new(-file => > "/biol/people/mres/st537/MalEfasta.txt", -format => 'fasta'); > > my $seq_obj = $seqio_obj->next_seq; > > > > #blast bit > > my $remote_blast = Bio::Tools::Run::RemoteBlast->new ( > -prog => 'blastp', -db => 'nr', -expect => '1e-15' ); > > my $blast_report = $remote_blast->submit_blast($seq_obj); Go back to the Bptutorial: http://www.bioperl.org/wiki/Bptutorial.pl#Running_BLAST_.28using_RemoteBlast.pm.29 And you'll see that submit_blast doesn't return a SearchIO object. For a complete working example see the synopsis for RemoteBlast: http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/RemoteBlast.html > #new part for SearchIO... > > while( my $result = $blast_report->next_result ) { > while( my $hit = $result->next_hit ) { > while( my $hsp = $hit->next_hsp ) { > if( $hsp->length('total') > 100 ) { > if ( $hsp->percent_identity >= 75 ) { > print "Hit= ", $hit->name, > ",Length=", $hsp->length('total'), > ",Percent_id=", $hsp->percent_identity, "\n"; > } > } > } > } > } From bix at sendu.me.uk Fri Dec 1 04:03:13 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:03:13 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> Message-ID: <456FEFD1.4070704@sendu.me.uk> pelikan at cs.pitt.edu wrote: > Hello all, > > I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, > without Cygwin. The "make test"s have all completed without error. This > is my first time dealing with bioperl, so bear with me. > > I've successfully loaded the most recent taxonomy information using the > biosql-schema scripts. 
After this, I attempted to load the most recent > release of the uniprot flat file dataset with the following command: > > load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass > ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat > > I am subsequently greeted by many of the following errors: > > Could not store Q7N3Q6: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: The supplied lineage does not start near 'Photorhabdus luminescens > subsp. laumondii' In your uniprot_sprot.dat file there'll be some kind of entry with that Photorhabdus species. Can you post that entry (sans sequence if it has one) so I can take a look at it? Maybe post a few that cause problems, and a few that don't. From bix at sendu.me.uk Fri Dec 1 04:19:09 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:19:09 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819 In-Reply-To: <000301c714b4$7846e790$15327e82@pyrimidine> References: <000301c714b4$7846e790$15327e82@pyrimidine> Message-ID: <456FF38D.3070508@sendu.me.uk> Chris Fields wrote: >> Nathan S. Haigh wrote: >>> More updates: >>> >>> After the failed install I updating Module::Build, and re-ran the >>> install, I get: >>> >>> -- snip -- >>> Creating new 'Build' script for 'bioperl' version '1.005002005' >>> Warning: while trying to determine prerequisites for >>> S/SE/SENDU/bioperl-1.5.2_005-RCb.tar.gz wi th the help of >>> Module::Build the following error occurred: 'Failed to re-load >>> 'ModuleBuildBiope >>> rl': Can't locate ModuleBuildBioperl.pm in @INC (@INC contains: >>> _build\lib C:\Perl\site\lib C:\ >>> Perl\lib C:\Documents and Settings\test) at (eval 105) line 1. >>> ' >>> >>> Falling back to META.yml for prerequisites 'YAML' not installed, >>> cannot parse 'C:\Perl\cpan\build\bioperl-1.5.2_005-RC\META.yml' >>> -- snip -- >> I had that problem fleetingly and it drove me crazy because >> later I couldn't reproduce it. 
Is it reproducible on your end? > > During Module::Build installation I see this: > > ... > t\metadata........ok > 8/43 skipped: YAML_support feature is not enabled You were pointing out the YAML issue? I think I'm less concerned with that (solution: install YAML) and much more concerned with why it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The module in question is in the same dir as the Build script, so it should be found automatically. The only thing I can think of is that CPAN doesn't manage to chdir to the directory. Hopefully I'll be able to reproduce this and then I can investigate further. From n.haigh at sheffield.ac.uk Fri Dec 1 04:26:22 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 09:26:22 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <456FF233.6040704@sendu.me.uk> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> Message-ID: <456FF53E.90907@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: >> >> I know that setting up the PPM is a pain, but I have to say it is >> much faster, and all required PPMs are available. Which makes me >> curious: why bother with trying out a CPAN installation process at >> this point, especially when you have to use PPM to install some of >> the prereqs properly anyway? > > Firstly, problems discovered and resulting fixes will help all > platforms, not just Windows. So thanks for trying it out and reporting > back. Secondly, the PPM method, like Bundle::BioPerl, is > all-or-nothing. The CPAN installation method allows an interactive > choice of which optional things to install. > > If what you say about DB_File is true, then that's a great shame! 
> > > So I can do further trouble-shooting of my own, what is the sure-fire > way to completely clean-out an ActivePerl install, including any > modules you might have installed with PPMs or CPAN? > > In addition, using CPAN allows you to run the test suite easily without the need to download it separately and run it after a PPM install. I don't know of a way to clean out ActivePerl - I use VMWare Workstation and have a virtual machine with a fresh install of WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas? Nath From bix at sendu.me.uk Fri Dec 1 04:13:23 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:13:23 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> Message-ID: <456FF233.6040704@sendu.me.uk> Chris Fields wrote: > > I know that setting up the PPM is a pain, but I have to say it is much > faster, and all required PPMs are available. Which makes me curious: > why bother with trying out a CPAN installation process at this point, > especially when you have to use PPM to install some of the prereqs > properly anyway? Firstly, problems discovered and resulting fixes will help all platforms, not just Windows. So thanks for trying it out and reporting back. Secondly, the PPM method, like Bundle::BioPerl, is all-or-nothing. The CPAN installation method allows an interactive choice of which optional things to install. If what you say about DB_File is true, then that's a great shame!
So I can do further trouble-shooting of my own, what is the sure-fire way to completely clean-out an ActivePerl install, including any modules you might have installed with PPMs or CPAN? From cjfields at uiuc.edu Fri Dec 1 09:08:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 08:08:55 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <456FF233.6040704@sendu.me.uk> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> Message-ID: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu> On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote: > Chris Fields wrote: >> I know that setting up the PPM is a pain, but I have to say it is >> much faster, and all required PPMs are available. Which makes me >> curious: why bother with trying out a CPAN installation process at >> this point, especially when you have to use PPM to install some of >> the prereqs properly anyway? > > Firstly, problems discovered and resulting fixes will help all > platforms, not just Windows. So thanks for trying it out and > reporting back. Secondly, the PPM method, like Bundle::BioPerl, is > all-or-nothing. The CPAN installation method allows an interactive > choice of which optional things to install. Yes, I understand that. My point is, you are generally forced to use PPM anyway due to several modules not installing properly (all the 'trouble' distributions, like DB_File, are available via PPM). I can see using CPAN as an alternative way of installing Bioperl for a distribution, or as the primary method via CVS or manually, but not for distributions. At least not until the kinks are worked out for Windows users. What are the significant issues for a bioperl PPM installation, based on the last PPM Nathan set up? 
If there is a redirection problem, could we just modify the installation docs to address that ('due to problem X, you must install the following modules prior to installing BioPerl 1.5.2...'). > If what you say about DB_File is true, then that's a great shame! We need to go through the various prereqs to see which ones need PPM vs CPAN. In general, anything that requires C code compilation (and thus needs a recent VC++) will likely be an issue. > So I can do further trouble-shooting of my own, what is the sure- > fire way to completely clean-out an ActivePerl install, including > any modules you might have installed with PPMs or CPAN? Not sure, beyond uninstalling and cleaning out the Perl directory (I think you might be able to delete the site/ directory, but I haven't tried it). ActivePerl comes preloaded with a number of non-core modules which makes it tricky to uninstall them one-by-one. chris From cjfields at uiuc.edu Fri Dec 1 09:10:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 08:10:34 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819 In-Reply-To: <456FF38D.3070508@sendu.me.uk> References: <000301c714b4$7846e790$15327e82@pyrimidine> <456FF38D.3070508@sendu.me.uk> Message-ID: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu> On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote: > You were pointing out the YAML issue? I think I'm less concerned > with that (solution: install YAML) and much more concerned with why > it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The > module in question is in the same dir as the Build script, so it > should be found automatically. > > The only thing I can think of is that CPAN doesn't manage to chdir > to the directory. Hopefully I'll be able to reproduce this and then > I can investigate further. My thought was the two were related in some way. I'm not sure to tell the truth. 
-chris From bix at sendu.me.uk Fri Dec 1 09:17:41 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 14:17:41 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu> Message-ID: <45703985.5050203@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> I know that setting up the PPM is a pain, but I have to say it is >>> much faster, and all required PPMs are available. Which makes me >>> curious: why bother with trying out a CPAN installation process at >>> this point, especially when you have to use PPM to install some of >>> the prereqs properly anyway? >> >> Firstly, problems discovered and resulting fixes will help all >> platforms, not just Windows. So thanks for trying it out and reporting >> back. Secondly, the PPM method, like Bundle::BioPerl, is >> all-or-nothing. The CPAN installation method allows an interactive >> choice of which optional things to install. > > Yes, I understand that. My point is, you are generally forced to use > PPM anyway due to several modules not installing properly (all the > 'trouble' distributions, like DB_File, are available via PPM). I can > see using CPAN as an alternative way of installing Bioperl for a > distribution, or as the primary method via CVS or manually, but not for > distributions. At least not until the kinks are worked out for Windows > users. CPAN isn't being suggested as the primary or preferred installation method for Windows. That will still be PPM. I'm mentioning CPAN / manual installation in the Windows INSTALL docs for the benefit of anyone who wants a simple install and test environment when checking out from CVS. 
> What are the significant issues for a bioperl PPM installation None that I'm aware of - I just need to find the time to start looking into generating an appropriate PPD. Hopefully Nathan's wiki page on the subject will be all I need. From bix at sendu.me.uk Fri Dec 1 09:18:43 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 14:18:43 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819 In-Reply-To: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu> References: <000301c714b4$7846e790$15327e82@pyrimidine> <456FF38D.3070508@sendu.me.uk> <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu> Message-ID: <457039C3.30907@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote: > >> You were pointing out the YAML issue? I think I'm less concerned with >> that (solution: install YAML) and much more concerned with why it >> can't reload ModuleBuildBioperl (claiming it isn't in @INC). The >> module in question is in the same dir as the Build script, so it >> should be found automatically. >> >> The only thing I can think of is that CPAN doesn't manage to chdir to >> the directory. Hopefully I'll be able to reproduce this and then I can >> investigate further. > > My thought was the two were related in some way. I'm not sure to tell > the truth. They weren't; using YAML is the fall-back position in case of an earlier failure. I've fixed it now in any case.
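
For anyone hitting the same "Can't locate ModuleBuildBioperl.pm in @INC" failure, the following is a minimal sketch of the mechanism discussed above, not the fix that was committed. It assumes the helper module sits in the same directory as the running script, as described in the quoted text, and uses the core FindBin and lib modules to put that directory on @INC regardless of the current working directory. The helper name locate_module is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# FindBin resolves the directory holding the running script,
# independent of the current working directory.
use FindBin qw($Bin);

# Prepend that directory to @INC so sibling modules can be found even
# if the caller never chdir'ed into the distribution directory.
use lib $Bin;

# Hypothetical helper: return the path of a module file if it is
# visible on @INC, or undef if it is not.
sub locate_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    for my $dir (grep { !ref } @INC) {   # skip code-ref hooks in @INC
        my $candidate = "$dir/$file";
        return $candidate if -e $candidate;
    }
    return;
}

my $path = locate_module('ModuleBuildBioperl');
print defined $path
    ? "ModuleBuildBioperl found at $path\n"
    : "ModuleBuildBioperl is not on \@INC\n";
```

Run from any directory, the script should give the same answer, since the lookup no longer depends on the working directory being the distribution directory.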
From gwu at molbio.mgh.harvard.edu Fri Dec 1 10:19:42 2006 From: gwu at molbio.mgh.harvard.edu (gang wu) Date: Fri, 01 Dec 2006 10:19:42 -0500 Subject: [Bioperl-l] One more load_seqdatabase.pl question In-Reply-To: <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net> References: <4a9ad8800611270907x64a4a4c0jad92bff6641e300@mail.gmail.com> <53C6D534-6E36-4061-B955-E74537839265@gmx.net> <456CA667.6010609@molbio.mgh.harvard.edu> <456F5648.6070207@molbio.mgh.harvard.edu> <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net> Message-ID: <4570480E.1020701@molbio.mgh.harvard.edu> Thanks Hilmar. I did include the -lookup switch on the command line. The warning messages show that the code attempted an "INSERT" (which failed) instead of an "UPDATE", which suggests that no match was found. But I was just loading the same Genbank file for the second time. To test whether it actually updated the records, I made a minor modification to one of the COMMENT features. Unfortunately, it was not updated. By the way, the test genbank file has four "COMMENT" features but they are different. Any idea what's happening there? I wonder if it's a bad idea to "UPDATE" a sequence. Say I got a new sequence version with 5 features removed, 5 features modified and 5 features new. If only --lookup is included, according to the POD, the 5 new features will be inserted, the 5 modified features will be updated and the 5 removed features will stay in the database untouched. This leaves the new sequence record as a mixture of old and new versions. I don't see why anyone would want a sequence like this. Either include -remove to replace the old version if only one version is needed, or put the new version under a different namespace if multiple versions are needed. Do I have the correct understanding of these issues? I deeply appreciate your help. Gang Hilmar Lapp wrote: > Right.
You need to tell it to lookup sequences first if you know that > you are loading sequences which may be in the database already (see > the POD of load_seqdatabase.pl, switch --lookup; there are several > other command line options that control what will happen if a sequence > entry is already present in the database.). > > The messages in you report are warnings, not errors. It looks like > some of the comments are duplicated for a sequence, it doesn't look > like reason for concern. Is not so good if you get errors thrown. > > -hilmar > > On Nov 30, 2006, at 5:08 PM, gang wu wrote: > >> Thanks Hilmar. Do you mean the NVL() clause will make >> load_seqdatabase.pl not work when update? >> >> I have problem with updating. Seems load_seqdatabase.pl only tries to >> insert instead of update. I used one of the test genbank file coming >> whith bioperl-db. Please take a look at the attached output. >> >> Thanks. >> >> Gang >> >> ========================================= >> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle >> -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank >> -namespace test >> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb >> Loading >> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb >> ... >> >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, >> values were ("This sequence was reannotated via the Ensembl system. >> Please visit the Ensembl web site, http://www.ensembl.org/ for more >> information. 
","1") FKs (389109) >> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated >> (DBD ERROR: OCIStmtExecute) >> --------------------------------------------------- >> >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, >> values were ("The /gene indicates a unique id for a gene, /cds a >> unique id for a translation and a /exon a unique id for an exon. >> These ids are maintained wherever possible between versions. For more >> information on how to interpret the feature table, please visit >> http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109) >> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated >> (DBD ERROR: OCIStmtExecute) >> --------------------------------------------------- >> ... >> ... >> ========================================================== >> Hilmar Lapp wrote: >>> These are the protein translations stored in the feature table as >>> tags of features, right? You can change the type of the column >>> (although there may be some issues when you update the column >>> because the NVL() clause won't work if I recall that correctly), but >>> doing so will deprive you of any 'normal' searches against that >>> column. (You can still use functions >from the DBMS_LOB package, but >>> they will be much slower and are completely non-standard.) It is up >>> to you whether that is too big of a price to pay for having some >>> redundant protein translations (translating the feature's DNA >>> sequence should give you the same) in the database. I always trimmed >>> those feature tags off (using a custom SeqProcessor). An alternative >>> is to convert these feature tags into actual bioentries (i.e., >>> Bio::Seq objects; again, a custom SeqProcessor will allow you to do >>> that). -hilmar On Nov 28, 2006, at 4:13 PM, gang wu wrote: >>>> Hi everyone, I'm using load_seqdatabase.pl to upload some Genbank >>>> genome sequences to my Oracle BioSQL database. 
I saw some >>>> errors(See attached warning message) related to >>>> seqfeature_qualifier_value (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE >>>> column), which has Varchar2 data type of maximum 4000 bytes. Did >>>> anybody mention this issue before? Should I just modify the column >>>> to a type being able store more data such as LONG or CLOB? Thanks. >>>> Gang Log information: ============================================ >>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc >>>> -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace >>>> genbank /genomeseq/arabidopsis//NC_003070.gbk Loading >>>> /genomeseq/arabidopsis//NC_003070.gbk ... -------------------- >>>> WARNING --------------------- MSG: SimpleValueAdaptor::add_assoc: >>>> unexpected failure of statement execution: ORA-01461: can bind a >>>> LONG value only for insert into a LONG column (DBD ERROR: error >>>> possibly near <*> indicator at char 12 in 'INSERT INTO >>>> <*>seqfeature_qualifier_value (fea_oid, trm_oid, value, rank) >>>> VALUES (:p1, :p2, :p3, :p4)') name: INSERT ASSOC [2] >>>> Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue values: >>>> FK[Bio::SeqFeature::Generic]:14898, >>>> FK[Bio::Annotation::SimpleValue]:800, >>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSGFV >>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASADR >>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSSEI >>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNSFP >>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFERA >>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYTRY >>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRTEA >>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIVDI >>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENIDW >>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRFAL >>>> 
DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGSDN >>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPTRY >>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYLYT >>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLRGL >>>> VQATYQASA! >>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGGKV >>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDEDAY >>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSDFV >>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSDSE >>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGTLG >>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQEV >>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLGQL >>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVPTL >>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRTVT >>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEMRL >>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRILAV >>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGIDEY >>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDGVD >>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKLLR >>>> VKLDFNFM! 
>>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNLQS >>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAVSN >>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVYCL >>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSGWD >>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADDSE >>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEASV >>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLDGL >>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQSS >>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDEDDF >>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAKGL >>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKGVA >>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNGCL >>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSMRN >>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPKVE >>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSKPL >>>> WLSVGADAS! 
>>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRPVY >>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWITND >>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQES >>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNMTS >>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPFSV >>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSESW >>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKNIV >>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILYMS >>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDLRV >>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSMAM >>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGVDI >>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEGAK >>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSLLR >>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNIMG >>>> QRKFIPAK! 
>>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAIDQ >>>> AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS", >>>> rank:"1" -------------------------------------------------- >>>> ============================================= >>>> _______________________________________________ Bioperl-l mailing >>>> list Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > --=========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From bosborne11 at verizon.net Fri Dec 1 09:55:18 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 01 Dec 2006 09:55:18 -0500 Subject: [Bioperl-l] An announcement Message-ID: bioperl-l, I would like to call your attention to a job posting and in doing so I realize that I'm probably breaking a rule of this list. I apologize and acknowledge that I've transgressed. The reason I do this is because this is an interesting job that is relevant to a lot of what we do in this mailing list, and some of you might want to consider it. The posting is here: http://www.nescent.org/main/employment.html#gmodhelpdesk I encourage you to pass this on to anyone who you think might be interested. Thanks again, Brian O. From cjfields at uiuc.edu Fri Dec 1 11:49:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 10:49:32 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <456FF53E.90907@sheffield.ac.uk> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> <456FF53E.90907@sheffield.ac.uk> Message-ID: On Dec 1, 2006, at 3:26 AM, Nathan S. Haigh wrote: ...
> In addition, using CPAN allows you to run the test suite easily > without the need to download it separately and run it after a PPM > install. A PPM, by design, is supposed to imply that the distribution passes tests for the specified platform, at that point in time, after all prereqs are installed and any additional postinstall operations (install C libraries, modify config files, etc) are complete. The ActiveState automated PPM building process dictates that; if it fails any test, it will not be made into a PPM. It's sort of a 'stamp of approval' that all tests pass, so you don't need to run them. However, a test may fail (and a PPM may not get generated) for pretty superficial reasons, such as the makefile not specifying that a module is needed, server issues, etc, so the automated process isn't foolproof. That's why Kobes and the other repositories are available, where the PPM/PPD is manually generated and made to work specifically for Windows (or whatever other platform). Saying that, it is completely up to the person packaging the distribution to follow those rules if one were to make a PPM manually. You don't even have to run tests prior to using 'nmake ppd'. We can currently state, though, that all tests pass when all prereqs are installed for this distribution. At least at this point in time! > I don't know of a way to clean out ActivePerl - I use VMWare > Workstation and have a virtual machine with a fresh install of > WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas? I haven't tried it that way. I have Parallels on Mac OS X (I run a SigmaPlot/Excel combo off it). My tests were using a native WinXP installation (i.e. not virtually) on my old Dell. It shouldn't make a difference; VMWare, Parallels, and the like should all run ActivePerl for WinXP since it's a virtual machine. Windows Vista, on the other hand... I think with PPM4 you can install to a custom directory.
It may be possible to install all new modules to that directory, then you would at least have an idea of what was there (though I don't think you can delete it directly w/o screwing up the PPM database). chris From bix at sendu.me.uk Fri Dec 1 12:12:49 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 17:12:49 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> Message-ID: <45706291.80201@sendu.me.uk> pelikan at cs.pitt.edu wrote: > Hello all, > > I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, > without Cygwin. The "make test"s have all completed without error. This > is my first time dealing with bioperl, so bear with me. > > I've successfully loaded the most recent taxonomy information using the > biosql-schema scripts. After this, I attempted to load the most recent > release of the uniprot flat file dataset with the following command: > > load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass > ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat > > I am subsequently greeted by many of the following errors: > > Could not store Q7N3Q6: I extracted just Q7N3Q6 from ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz and was able to load it in using load_seqdatabase.pl under linux with no errors. If you make a file with just that sequence do you still get the error? Is anyone else able to reproduce the problem? From cjfields at uiuc.edu Fri Dec 1 12:57:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 11:57:18 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <45703985.5050203@sendu.me.uk> Message-ID: <006301c71572$24be8830$15327e82@pyrimidine> > Chris Fields wrote: > PPM). 
I can > > see using CPAN as an alternative way of installing Bioperl for a > > distribution, or as the primary method via CVS or manually, but not > > for distributions. At least not until the kinks are worked out for > > Windows users. > > CPAN isn't being suggested as the primary or preferred > installation method for Windows. That will still be PPM. I'm > mentioning CPAN / manual installation in the Windows INSTALL > docs for the benefit of anyone who wants a simple install and > test environment when checking out from CVS. That's fine by me. I think the focus is making sure the PPM works, but that shouldn't hold up the final 1.5.2 release. The PPM for previous releases was never released concurrently with the distribution (if at all); it generally followed by a few weeks to a few months past a final release. > > What are the significant issues for a bioperl PPM installation > > None that I'm aware of - I just need to find the time to > start looking into generating an appropriate PPD. Hopefully > Nathan's wiki page on the subject will be all I need. I'll try testing it out today and next week (the more people we have looking into the issue the better). I'm sure that Module::Build hasn't updated to using PPM4 XML formatting, but the tags are similar enough. I can always create a local PPM database using a similar directory structure to bioperl.org/DIST and test an installation from it. chris From n.haigh at sheffield.ac.uk Fri Dec 1 13:52:55 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 18:52:55 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine> References: <006301c71572$24be8830$15327e82@pyrimidine> Message-ID: <45707A07.7000106@sheffield.ac.uk> Chris Fields wrote: >> Chris Fields wrote: >> PPM). 
I can >> >>> see using CPAN as an alternative way of installing Bioperl for a >>> distribution, or as the primary method via CVS or manually, but not >>> for distributions. At least not until the kinks are worked out for >>> Windows users. >>> >> CPAN isn't being suggested as the primary or preferred >> installation method for Windows. That will still be PPM. I'm >> mentioning CPAN / manual installation in the Windows INSTALL >> docs for the benefit of anyone who wants a simple install and >> test environment when checking out from CVS. >> > > That's fine by me. I think the focus is making sure the PPM works, but that > shouldn't hold up the final 1.5.2 release. The PPM for previous releases > was never released concurrently with the distribution (if at all); it > generally followed by a few weeks to a few months past a final release. > > >>> What are the significant issues for a bioperl PPM installation >>> >> None that I'm aware of - I just need to find the time to >> start looking into generating an appropriate PPD. Hopefully >> Nathan's wiki page on the subject will be all I need. >> > > I'll try testing it out today and next week (the more people we have looking > into the issue the better). I'm sure that Module::Build hasn't updated to > using PPM4 XML formatting, but the tags are similar enough. I can always > create a local PPM database using a similar directory structure to > bioperl.org/DIST and test an installation from it. > > chris > To clarify a few things about PPM4 XML and to highlight the main differences: 1) The use of PROVIDE and REQUIRE tags 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules. 
3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma separated tuples like PPM3 XML 4) the VERSION in PROVIDE and REQUIRE are used internally to do version comparisons for upgrades and solving prereqs etc 5) Module names should all contain '::' either natively according their namespace, if it doesn't have one natively, then one is appended to the end e.g. "GD::" 6) the VERSION in the SOFTPKG key is for human readability only 7) the NAME in SOFTPKG is used to identify which packages are actually the same. Nath --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0652-4, 30/11/2006 Tested on: 01/12/2006 18:52:57 avast! - copyright (c) 1988-2006 ALWIL Software. http://www.avast.com From bix at sendu.me.uk Fri Dec 1 13:52:44 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 18:52:44 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <45706291.80201@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> Message-ID: <457079FC.7010209@sendu.me.uk> Sendu Bala wrote: > pelikan at cs.pitt.edu wrote: [snip] >> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass >> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >> >> I am subsequently greeted by many of the following errors: >> >> Could not store Q7N3Q6: > > I extracted just Q7N3Q6 from > ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz > and was able to load it in using load_seqdatabase.pl under linux with no > errors. If you make a file with just that sequence do you still get the > error? > > Is anyone else able to reproduce the problem? In fact, if I just try and load it again I reproduce the problem. The situation is similar to http://bugzilla.bioperl.org/show_bug.cgi?id=2092 And I have a tentative fix that extends Brian's fix there. Committed to HEAD only atm. 
I don't know anything about bioperl-db and don't have the faintest clue why this is happening, nor the time to figure it out. Can someone please have a proper look at this and decide if my fix is sane? All I can say is that the test suites for bioperl-live and bioperl-db continue to pass, but that isn't really saying much. PS. having used load_seqdatabase.pl to load a sequence, how do I remove it afterwards? From cjfields at uiuc.edu Fri Dec 1 14:00:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:00:13 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <45706291.80201@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> Message-ID: On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote: > pelikan at cs.pitt.edu wrote: >> Hello all, >> >> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, >> without Cygwin. The "make test"s have all completed without error. >> This >> is my first time dealing with bioperl, so bear with me. >> >> I've successfully loaded the most recent taxonomy information >> using the >> biosql-schema scripts. After this, I attempted to load the most >> recent >> release of the uniprot flat file dataset with the following command: >> >> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root - >> dbpass >> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >> >> I am subsequently greeted by many of the following errors: >> >> Could not store Q7N3Q6: > > I extracted just Q7N3Q6 from > ftp://ftp.expasy.org/databases/uniprot/current_release/ > knowledgebase/complete/uniprot_sprot.dat.gz > and was able to load it in using load_seqdatabase.pl under linux > with no > errors. If you make a file with just that sequence do you still get > the > error? > > Is anyone else able to reproduce the problem?
I can reproduce on both WinXP and Mac OS X using the latest bioperl- db/bioperl-live and a BioSQL database preloaded with taxonomy. Notably the bug doesn't show up with a database lacking taxonomy, where no lookup is used (I guess). Here's some overly verbose debugging (apologies): Loading saved.flat ... attempting to load adaptor class for Bio::Seq::RichSeq attempting to load module Bio::DB::BioSQL::RichSeqAdaptor attempting to load adaptor class for Bio::Seq attempting to load module Bio::DB::BioSQL::SeqAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor attempting to load adaptor class for Bio::Species attempting to load module Bio::DB::BioSQL::SpeciesAdaptor instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor attempting to load adaptor class for Bio::Tree::Tree attempting to load module Bio::DB::BioSQL::TreeAdaptor attempting to load adaptor class for Bio::Root::Root attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::Root::RootI attempting to load module Bio::DB::BioSQL::RootIAdaptor attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::Tree::TreeI attempting to load module Bio::DB::BioSQL::TreeIAdaptor attempting to load module Bio::DB::BioSQL::TreeAdaptor attempting to load adaptor class for Bio::Tree::NodeI attempting to load module Bio::DB::BioSQL::NodeIAdaptor attempting to load module Bio::DB::BioSQL::NodeAdaptor attempting to load adaptor class for Bio::Tree::TreeFunctionsI attempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor attempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor no adaptor found for class Bio::Tree::Tree attempting to load adaptor class for Bio::DB::Taxonomy::list attempting to load module Bio::DB::BioSQL::listAdaptor attempting to load adaptor class for Bio::DB::Taxonomy attempting to load module Bio::DB::BioSQL::TaxonomyAdaptor no adaptor found for class Bio::DB::Taxonomy::list attempting to load adaptor 
class for Bio::Annotation::Collection attempting to load module Bio::DB::BioSQL::CollectionAdaptor attempting to load adaptor class for Bio::AnnotationCollectionI attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor attempting to load adaptor class for Bio::Annotation::TypeManager attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for Bio::Annotation::SimpleValue attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor attempting to load adaptor class for Bio::Annotation::Reference attempting to load module Bio::DB::BioSQL::ReferenceAdaptor instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor attempting to load adaptor class for Bio::Annotation::Comment attempting to load module Bio::DB::BioSQL::CommentAdaptor instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor attempting to load adaptor class for Bio::Annotation::DBLink attempting to load module Bio::DB::BioSQL::DBLinkAdaptor instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor attempting to load adaptor class for Bio::PrimarySeq attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor attempting to load adaptor class for Bio::SeqFeature::Generic attempting to load module Bio::DB::BioSQL::GenericAdaptor attempting to load adaptor class for Bio::SeqFeatureI attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor attempting to load adaptor class for Bio::Location::Simple attempting to load module Bio::DB::BioSQL::SimpleAdaptor attempting to load adaptor class for 
Bio::Location::Atomic attempting to load module Bio::DB::BioSQL::AtomicAdaptor attempting to load adaptor class for Bio::LocationI attempting to load module Bio::DB::BioSQL::LocationIAdaptor attempting to load module Bio::DB::BioSQL::LocationAdaptor instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Tree::Tree no adaptor found for class Bio::DB::Taxonomy::list no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for BioNamespace attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor no adaptor found for class Bio::Tree::Tree no adaptor found for class Bio::DB::Taxonomy::list no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ? BioNamespaceAdaptor: binding UK column 1 to "Swiss-Prot" (namespace) preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?) 
BioNamespaceAdaptor::insert: binding column 1 to "Swiss- Prot" (namespace) BioNamespaceAdaptor::insert: binding column 2 to "" (authority) no adaptor found for class Bio::Tree::Tree no adaptor found for class Bio::DB::Taxonomy::list attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ? SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) SpeciesAdaptor: binding UK column 2 to "141679" (ncbi_taxid) prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value BETWEEN node.left_value AND node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor Could not store Q7N3Q6: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: The supplied lineage does not start near 'Photorhabdus luminescens subsp. 
laumondii' STACK: Error::throw STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/ Root/Root.pm:359 STACK: Bio::Species::classification /Users/cjfields/src/bioperl-live/ Bio/Species.pm:166 STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/ 5.8.6/Bio/DB/Persistent/PersistentObject.pm:552 STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /Library/ Perl/5.8.6/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182 STACK: Bio::DB::Persistent::PersistentObject::create /Library/Perl/ 5.8.6/Bio/DB/Persistent/PersistentObject.pm:244 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.6/Bio/DB/Persistent/PersistentObject.pm:271 STACK: load_seqdatabase.pl:620 ----------------------------------------------------------- at load_seqdatabase.pl line 633 chris From cjfields at uiuc.edu Fri Dec 1 14:01:59 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:01:59 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <45707A07.7000106@sheffield.ac.uk> References: <006301c71572$24be8830$15327e82@pyrimidine> <45707A07.7000106@sheffield.ac.uk> Message-ID: On Dec 1, 2006, at 12:52 PM, Nathan S. 
Haigh wrote: > Chris Fields wrote: >>> Chris Fields wrote: >>> PPM). I can >>>> see using CPAN as an alternative way of installing Bioperl for a >>>> distribution, or as the primary method via CVS or manually, but >>>> not for distributions. At least not until the kinks are worked >>>> out for Windows users. >>>> >>> CPAN isn't being suggested as the primary or preferred >>> installation method for Windows. That will still be PPM. I'm >>> mentioning CPAN / manual installation in the Windows INSTALL docs >>> for the benefit of anyone who wants a simple install and test >>> environment when checking out from CVS. >>> >> >> That's fine by me. I think the focus is making sure the PPM >> works, but that >> shouldn't hold up the final 1.5.2 release. The PPM for previous >> releases >> was never released concurrently with the distribution (if at all); it >> generally followed by a few weeks to a few months past a final >> release. >> >> >>>> What are the significant issues for a bioperl PPM installation >>>> >>> None that I'm aware of - I just need to find the time to start >>> looking into generating an appropriate PPD. Hopefully Nathan's >>> wiki page on the subject will be all I need. >>> >> >> I'll try testing it out today and next week (the more people we >> have looking >> into the issue the better). I'm sure that Module::Build hasn't >> updated to >> using PPM4 XML formatting, but the tags are similar enough. I can >> always >> create a local PPM database using a similar directory structure to >> bioperl.org/DIST and test an installation from it. >> >> chris >> > > To clarify a few things about PPM4 XML and to highlight the main > differences: > > 1) The use of PROVIDE and REQUIRE tags > 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules. 
> 3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma > separated tuples like PPM3 XML > 4) the VERSION in PROVIDE and REQUIRE are used internally to do > version comparisons for upgrades and solving prereqs etc > 5) Module names should all contain '::' either natively according > their namespace, if it doesn't have one natively, then one is > appended to the end e.g. "GD::" > 6) the VERSION in the SOFTPKG key is for human readability only > 7) the NAME in SOFTPKG is used to identify which packages are > actually the same. > > Nath Okay. Maybe place this in the wiki (PPM page) for future reference? Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Fri Dec 1 14:05:38 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 19:05:38 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine> References: <006301c71572$24be8830$15327e82@pyrimidine> Message-ID: <45707D02.9070504@sheffield.ac.uk> Chris Fields wrote: >> Chris Fields wrote: >> PPM). I can >> >>> see using CPAN as an alternative way of installing Bioperl for a >>> distribution, or as the primary method via CVS or manually, but not >>> for distributions. At least not until the kinks are worked out for >>> Windows users. >>> >> CPAN isn't being suggested as the primary or preferred >> installation method for Windows. That will still be PPM. I'm >> mentioning CPAN / manual installation in the Windows INSTALL >> docs for the benefit of anyone who wants a simple install and >> test environment when checking out from CVS. >> > > That's fine by me. I think the focus is making sure the PPM works, but that > shouldn't hold up the final 1.5.2 release. 
The PPM for previous releases > was never released concurrently with the distribution (if at all); it > generally followed by a few weeks to a few months past a final release. > > Forgot to say, one really annoying thing about PPM is that it seems to display all the versions of Bioperl defined in the XML file. In addition, I think a bug in PPM4 means that if a package is available in ActiveState's repo, PPM4 always wants to install it rather than a more recent version in a different repo (this includes upgrades). This results in this annoying behaviour: 1) If activestate and bioperl repos are active, searching for bioperl lists several versions 2) If you are using PPM4 GUI, and have installed a non activestate version, then it says you can upgrade to the version in ActiveState's repo (even if it's actually a downgrade). 3) Using ppm-shell, if you choose "install bioperl" or "upgrade bioperl" it will always install the version in the activestate repo. 4) I'm sure there are also some other annoyances. In the end, it means the best way to install and upgrade bioperl is to search for bioperl packages and install the latest version by eye rather than relying on the "upgrade feature" (at least for the time being). You may also need to remove an old version of bioperl before installing a more recent version. NOTE: using "upgrade" runs the risk of installing bioperl 1.2.3 from activestate and not the latest version in any other repo! I'll update the wiki when I have time. Nath >>> What are the significant issues for a bioperl PPM installation >>> >> None that I'm aware of - I just need to find the time to >> start looking into generating an appropriate PPD. Hopefully >> Nathan's wiki page on the subject will be all I need. >> > > I'll try testing it out today and next week (the more people we have looking > into the issue the better). I'm sure that Module::Build hasn't updated to > using PPM4 XML formatting, but the tags are similar enough.
I can always > create a local PPM database using a similar directory structure to > bioperl.org/DIST and test an installation from it. > > chris From cjfields at uiuc.edu Fri Dec 1 14:06:53 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:06:53 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <45706291.80201@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> Message-ID: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote: > pelikan at cs.pitt.edu wrote: >> Hello all, >> >> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, >> without Cygwin. The "make test"s have all completed without error. >> This >> is my first time dealing with bioperl, so bear with me. >> >> I've successfully loaded the most recent taxonomy information >> using the >> biosql-schema scripts.
After this, I attempted to load the most >> recent >> release of the uniprot flat file dataset with the following command: >> >> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root - >> dbpass >> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >> >> I am subsequently greeted by many of the following errors: >> >> Could not store Q7N3Q6: > > I extracted just Q7N3Q6 from > ftp://ftp.expasy.org/databases/uniprot/current_release/ > knowledgebase/complete/uniprot_sprot.dat.gz > and was able to load it in using load_seqdatabase.pl under linux > with no > errors. If you make a file with just that sequence do you still get > the > error? > > Is anyone else able to reproduce the problem? Okay, just updated to get your latest CVS fixes for bioperl-live and it passes now for both Mac OS X and WinXP. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Dec 1 14:09:15 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:09:15 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <457079FC.7010209@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk> Message-ID: On Dec 1, 2006, at 12:52 PM, Sendu Bala wrote: > > PS. having used load_seqdatabase.pl to load a sequence, how do I > remove > it afterwards? There's not much documentation on it, but it is demonstrated several times in the test suite. Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Dec 1 14:39:17 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 19:39:17 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> Message-ID: <457084E5.2050300@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote: > >> pelikan at cs.pitt.edu wrote: >>> Hello all, >>> >>> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, >>> without Cygwin. The "make test"s have all completed without error. This >>> is my first time dealing with bioperl, so bear with me. >>> >>> I've successfully loaded the most recent taxonomy information >>> using the >>> biosql-schema scripts. After this, I attempted to load the most recent >>> release of the uniprot flat file dataset with the following command: >>> >>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass >>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >>> >>> I am subsequently greeted by many of the following errors: >>> >>> Could not store Q7N3Q6: >> >> I extracted just Q7N3Q6 from >> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz >> >> and was able to load it in using load_seqdatabase.pl under linux with no >> errors. If you make a file with just that sequence do you still get the >> error? >> >> Is anyone else able to reproduce the problem? > > Okay, just updated to get your latest CVS fixes for bioperl-live and it > passes now for both Mac OS X and WinXP. Can you confirm if it is actually working correctly though? Like, having stored a previously-problem sequence, can you get it back out from the database and is its ->species() correct? 
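A minimal sketch of the round-trip check asked about above: fetch the stored entry back from BioSQL and print its species. It assumes the MySQL database is named bioseqdb (as in the load command) and that the entry sits in the "Swiss-Prot" namespace seen in the debug output. The host, user and password values are placeholders, and the calls follow the bioperl-db synopsis as I remember it rather than anything confirmed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::Seq;
use Bio::DB::BioDB;

# Connection details are placeholders (assumptions); adjust to your setup.
my $db = Bio::DB::BioDB->new(
    -database => 'biosql',
    -driver   => 'mysql',
    -host     => 'localhost',
    -dbname   => 'bioseqdb',
    -user     => 'root',
    -pass     => 'your_password',
);

# Template object carrying the unique-key fields used for the lookup.
# The namespace is an assumption taken from the debug log in this thread.
my $template = Bio::Seq->new(
    -accession_number => 'Q7N3Q6',
    -namespace        => 'Swiss-Prot',
);

# Look the entry up through the sequence adaptor.
my $adaptor = $db->get_object_adaptor('Bio::SeqI');
my $dbseq   = $adaptor->find_by_unique_key($template);
die "Q7N3Q6 was not found in the database\n" unless $dbseq;

# Inspect the species that came back with the stored sequence.
my $species = $dbseq->species;
die "Q7N3Q6 has no species attached\n" unless $species;

my $taxid = $species->ncbi_taxid;
print "binomial: ", $species->binomial, "\n";
print "taxid:    ", (defined $taxid ? $taxid : 'unknown'), "\n";

# classification() lists names from the species upwards, so reverse it
# to print the lineage from the root down.
print "lineage:  ", join('; ', reverse $species->classification), "\n";
```

If the lineage printed here ends at the expected organism (taxid 141679 in the log above), the stored species is at least self-consistent; it does not prove the fix is right for every record.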
From cjfields at uiuc.edu Fri Dec 1 14:52:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:52:13 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <457084E5.2050300@sendu.me.uk> Message-ID: <000001c71582$329d4d50$15327e82@pyrimidine> > > > > Okay, just updated to get your latest CVS fixes for > bioperl-live and > > it passes now for both Mac OS X and WinXP. > > Can you confirm if it is actually working correctly though? > Like, having stored a previously-problem sequence, can you > get it back out from the database and is its ->species() correct? I would assume so, if we can trust the species tests. I will have to try it again over the weekend. I planned on loading a ton of protein sequences in anyway, most of which are bacterial; if anything breaks it will probably be with those. I think Jason and Hilmar were going to get together about the BioSQL paper at the hackathon. That may be a good place to bring some of the species issues, if they persist. 
chris From hlapp at gmx.net Fri Dec 1 20:42:05 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 1 Dec 2006 20:42:05 -0500 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <457079FC.7010209@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk> Message-ID: <8414723F-BA02-4936-8F53-781276C3B526@gmx.net>

Either using SQL:

    -- theoretically you should convince yourself first that there
    -- is only one such record (the UK is over acc,version,namespace)
    SQL> DELETE FROM bioentry WHERE accession = 'Q7N3Q6';

or through bioperl-db (see the delete test for examples):

    my $db = Bio::DB::BioDB->new(....);
    my $seq = Bio::PrimarySeq->new(-accession_number=>'Q7N3Q6',
                                   -namespace=>'whatever you used when loading');
    my $adp = $db->get_persistence_adaptor($seq);
    my $pseq = $adp->find_by_unique_key($seq);
    $pseq->remove();
    $pseq->commit();

-hilmar

On Dec 1, 2006, at 1:52 PM, Sendu Bala wrote:
> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From chhalling at verizon.net Sun Dec 3 20:56:51 2006 From: chhalling at verizon.net (Conrad Halling) Date: Sun, 03 Dec 2006 20:56:51 -0500 Subject: [Bioperl-l] BioPerl Wiki is down Message-ID: <45738063.1070504@verizon.net> When I attempted to navigate to http://www.bioperl.org/, I got the following message: A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function "MediaWikiBagOStuff::_doquery". MySQL returned error "1205: Lock wait timeout exceeded; try restarting transaction (localhost)".
-- Conrad Halling chhalling at verizon.net From rbirnie at totalise.co.uk Sun Dec 3 16:38:02 2006 From: rbirnie at totalise.co.uk (richard) Date: Sun, 3 Dec 2006 21:38:02 +0000 Subject: [Bioperl-l] confused by Bio::Graphics Message-ID: <200612032138.02522.rbirnie@totalise.co.uk> Hi all, I'm having a little trouble getting Bio::Graphics to give me the correct output and I'm looking for some help. I am trying to extend from example 5 of the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. Eventually I intend the script to follow example 6 but I thought I'd try the simpler version first. The basic aim of the script is that it takes as input a file containing a list of GenBank IDs plus some other info for alternative transcripts of a gene. This information is stored in a hash and the GenBank IDs are used to retrieve the appropriate entries from GenBank. I then want to use Bio::Graphics to generate a figure from the feature tables showing the CDSs from the alternative transcripts. So far I have managed to retrieve the GenBank entries extract the feature tables and store a reference to these in the hash mentioned above. I've also got Bio::Graphics to draw a basic image but some of the details aren't right and I don't understand why. I have attached the code I have so far, the input file and the output image to this mail. I didn't want to display it all in the main message but I'm not actually sure which bit is causing the problem. The code is very rough and in need of polishing but I need to get it to work correctly first. These are the problems: 1) As I understand it this: my $wholeseq = Bio::SeqFeature::Generic->new ( -start => 1, -end => $refseq->length, -display_name =>$refseq->display_name ); should display the name of the gene (CD133/Prominin1) near the top of image. It doesn't, am I misunderstanding or is there an error in the code? 2) In the quoted example the CDS is broken up into smaller regions which are then linked together in example 6. 
This isn't happening in my code and I think it should be, I get one solid block for the CDS. I don't understand why this is because I'm not clear which parts of the feature table are used to define where the CDS should be split. I think this is the relevant bit of code: foreach my $alt_trans (keys %main) { foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) { my $feature = $main{$alt_trans}{'features'}{$tag}; $panel->add_track($feature, -glyph => 'generic', -bgcolor => $colors[$idx++ % @colors], -fgcolor => 'black', -font2color => 'black', -key => $alt_trans, -bump => +1, -height => 8, -label => 1, -description => 1, ) if ($tag eq 'CDS'); } } Can anyone tell me what I am doing wrong? RefSeq entry for the gene of interest is here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386 If I understand correctly the example file used in the HOWTO is this gene: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=116805320 Final question, does bioperl come with example scripts and is so where whould they normally be found on a Linux system? If anyone is still reading this thanks for your patience. Any clarification will be appreciated. regards, Richard -------------- next part -------------- A non-text attachment was scrubbed... Name: CD133_graphic_code Type: application/x-perl Size: 2702 bytes Desc: not available URL: -------------- next part -------------- sequence_ID Exon_Boundary Assay_location Amplicon_length NM_006017 9 - 10 1118 106 AF027208.1 9 - 10 1118 106 AK027420.1 9 - 10 1312 106 AK027422.1 9 - 10 1334 106 BC012089.1 9 - 10 1289 106 AY449689.1 8 - 9 1054 106 AY449690.1 8 - 9 1054 106 AY449691.1 8 - 9 1054 106 AY449692.1 9 - 10 1081 106 AY449693.1 9 - 10 1081 106 AF507034.1 8 - 9 1091 106 AK075411.1 9 - 10 1289 106 AF117225.1 9 - 10 1334 106 AK226033.1 - 1312 106 DQ895452.1 - 1054 106 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: CD133.png Type: image/png Size: 4322 bytes Desc: not available URL: From cjfields at uiuc.edu Sun Dec 3 22:35:17 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 3 Dec 2006 21:35:17 -0600 Subject: [Bioperl-l] BioPerl Wiki is down In-Reply-To: <45738063.1070504@verizon.net> References: <45738063.1070504@verizon.net> Message-ID: <41422FC7-B579-4B45-B8CC-341B8F462BCB@uiuc.edu> On Dec 3, 2006, at 7:56 PM, Conrad Halling wrote: > When I attempted to navigate to http://www.bioperl.org/, I got the > following message: > > A database query syntax error has occurred. This may indicate a bug in > the software. The last attempted database query was: > > (SQL query hidden) > > from within function "MediaWikiBagOStuff::_doquery". MySQL returned > error "1205: Lock wait timeout exceeded; try restarting transaction > (localhost)". > > -- Conrad Halling > chhalling at verizon.net This has been an ongoing problem with the server; I have reported it previously to open-bio support. There have been a few attempts to fix it which seem to work short-term but something else must be wrong. Jason? Chris D? For my part, Googling found the following link, which indicates that this error may be due to heavy server load: http://tibia.erig.net/TibiaWiki:Bug_reports Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Derek.Fairley at bll.n-i.nhs.uk Mon Dec 4 05:18:37 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Mon, 4 Dec 2006 10:18:37 -0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk> Message-ID: Richard, You can find instructions for installing the example scripts directory here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_SCRIPTS or you can get individual scripts from here: http://www.bioperl.org/wiki/Bioperl_scripts11 Derek.
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of richard Sent: 03 December 2006 21:38 To: Bioperl list Subject: [Bioperl-l] confused by Bio::Graphics Hi all, I'm having a little trouble getting Bio::Graphics to give me the correct output and I'm looking for some help. I am trying to extend from example 5 of the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. Eventually I intend the script to follow example 6 but I thought I'd try the simpler version first. The basic aim of the script is that it takes as input a file containing a list of GenBank IDs plus some other info for alternative transcripts of a gene. This information is stored in a hash and the GenBank IDs are used to retrieve the appropriate entries from GenBank. I then want to use Bio::Graphics to generate a figure from the feature tables showing the CDSs from the alternative transcripts. So far I have managed to retrieve the GenBank entries extract the feature tables and store a reference to these in the hash mentioned above. I've also got Bio::Graphics to draw a basic image but some of the details aren't right and I don't understand why. I have attached the code I have so far, the input file and the output image to this mail. I didn't want to display it all in the main message but I'm not actually sure which bit is causing the problem. The code is very rough and in need of polishing but I need to get it to work correctly first. These are the problems: 1) As I understand it this: my $wholeseq = Bio::SeqFeature::Generic->new ( -start => 1, -end => $refseq->length, -display_name =>$refseq->display_name ); should display the name of the gene (CD133/Prominin1) near the top of image. It doesn't, am I misunderstanding or is there an error in the code? 2) In the quoted example the CDS is broken up into smaller regions which are then linked together in example 6. 
This isn't happening in my code and I think it should be, I get one solid block for the CDS. I don't understand why this is because I'm not clear which parts of the feature table are used to define where the CDS should be split. I think this is the relevant bit of code: foreach my $alt_trans (keys %main) { foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) { my $feature = $main{$alt_trans}{'features'}{$tag}; $panel->add_track($feature, -glyph => 'generic', -bgcolor => $colors[$idx++ % @colors], -fgcolor => 'black', -font2color => 'black', -key => $alt_trans, -bump => +1, -height => 8, -label => 1, -description => 1, ) if ($tag eq 'CDS'); } } Can anyone tell me what I am doing wrong? RefSeq entry for the gene of interest is here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386 If I understand correctly the example file used in the HOWTO is this gene: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=1168053 20 Final question, does bioperl come with example scripts and is so where whould they normally be found on a Linux system? If anyone is still reading this thanks for your patience. Any clarification will be appreciated. regards, Richard From rbirnie at totalise.co.uk Mon Dec 4 04:30:36 2006 From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk) Date: 04 Dec 2006 09:30:36 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From bix at sendu.me.uk Mon Dec 4 09:37:16 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Dec 2006 14:37:16 +0000 Subject: [Bioperl-l] BLASTing with a seqio/seq object... 
In-Reply-To: <45706671.9000201@york.ac.uk> References: <01ba01c714a2$b9659c10$15327e82@pyrimidine> <456F27E9.70205@york.ac.uk> <456FEF22.4090004@sendu.me.uk> <45706671.9000201@york.ac.uk> Message-ID: <4574329C.2030905@sendu.me.uk> Samantha Thompson wrote: > Hi, > Thanks for all your help so far, I am still trying to understand a > couple of things... You should make sure your replies are sent to the list, as you're likely to get a faster response. [where $blast_report is the value returned by Bio::Tools::Run::RemoteBlast->submit_blast($seq_object)] > when I run this line.. > > $searchio = Bio::SearchIO->new(-format => 'blast', > -file => $blast_report); > > between submitting the blast search and trying to to process the searchio object like I was attempting before I get the following errors back: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Could not open 1: No such file or directory [snip] > Does this mean that my BLAST is failing when I submit it? No, the -file option of SearchIO->new() takes, unsurprisingly, a filename. I'd tell you to pay careful attention to the docs, but sadly the RemoteBlast docs are currently wrong. submit_blast() claims to return 'Blast report object' (which in any case certainly wouldn't be a filename) when in fact it returns, as you discovered, a (for our purposes) meaningless number. As I suggested before, you need to look at the synopsis for Bio::Tools::Run::RemoteBlast instead. (having called submit_blast you must do the each_rid loop) Does anyone care to go through the POD for RemoteBlast and update it to an accurate state? 
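For reference, a minimal sketch of the each_rid loop mentioned above, following the pattern in the Bio::Tools::Run::RemoteBlast synopsis. The input file name and the 5-second polling interval are assumptions for illustration, and the script needs network access to NCBI. Note that the synopsis passes the database with -data rather than -db.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;

# Hypothetical input file; substitute your own FASTA file.
my $seqio   = Bio::SeqIO->new(-file => 'MalE.fasta', -format => 'fasta');
my $seq_obj = $seqio->next_seq;

my $factory = Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastp',
    -data       => 'nr',
    -expect     => '1e-15',
    -readmethod => 'SearchIO',
);

# submit_blast() only queues the job; its return value is not a report.
$factory->submit_blast($seq_obj);

# Poll until every request ID (RID) has either produced a report or failed.
while (my @rids = $factory->each_rid) {
    foreach my $rid (@rids) {
        my $rc = $factory->retrieve_blast($rid);

        if (!ref $rc) {
            # Not an object yet: a negative value means the request failed,
            # anything else means the search is still running.
            $factory->remove_rid($rid) if $rc < 0;
            sleep 5;
            next;
        }

        # $rc is now a Bio::SearchIO object holding the finished report.
        $factory->remove_rid($rid);
        while (my $result = $rc->next_result) {
            while (my $hit = $result->next_hit) {
                while (my $hsp = $hit->next_hsp) {
                    next unless $hsp->length('total') > 100
                            and $hsp->percent_identity >= 75;
                    print "Hit= ", $hit->name,
                          ",Length=", $hsp->length('total'),
                          ",Percent_id=", $hsp->percent_identity, "\n";
                }
            }
        }
    }
}
```

The filtering inside the innermost loop is the same as in the original script; only the retrieval step between submit_blast() and the SearchIO loop is new.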
From bix at sendu.me.uk Mon Dec 4 09:40:27 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Dec 2006 14:40:27 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: References: Message-ID: <4574335B.805@sendu.me.uk> rbirnie at totalise.co.uk wrote: > Hi all, > > I've just seen my previous mail come through on the digest and I noticed > that the code I attached has been scrubbed which means that the message > won't make much sense. If I've contravened list rules by posting > attachments then apologies, I did look for a posting guide but couldn't > see one on the wiki. I deliberatley didn't put the whole code in the > main message because it's quite long. I'm not sure which part is wrong > so I don't know which part to post I'm just not seeing the output I > would expect from the example. What is the best thing for me to do? I saw a few attachments on your post (including your code example), so I think what you did was fine. From cjfields at uiuc.edu Mon Dec 4 10:40:20 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 09:40:20 -0600 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <4574335B.805@sendu.me.uk> Message-ID: <002001c717ba$823c1500$15327e82@pyrimidine> > rbirnie at totalise.co.uk wrote: > > Hi all, > > > > I've just seen my previous mail come through on the digest and I > > noticed that the code I attached has been scrubbed which means that > > the message won't make much sense. If I've contravened list > rules by > > posting attachments then apologies, I did look for a > posting guide but > > couldn't see one on the wiki. I deliberatley didn't put the > whole code > > in the main message because it's quite long. I'm not sure > which part > > is wrong so I don't know which part to post I'm just not seeing the > > output I would expect from the example. What is the best > thing for me to do? > > I saw a few attachments on your post (including your code > example), so I think what you did was fine. Same here. 
I received a PNG file and two text files (a script and a data file). chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From rbirnie at totalise.co.uk Mon Dec 4 11:06:51 2006 From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk) Date: 04 Dec 2006 16:06:51 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <002001c717ba$823c1500$15327e82@pyrimidine> References: <002001c717ba$823c1500$15327e82@pyrimidine> Message-ID: An HTML attachment was scrubbed... URL: From dmessina at wustl.edu Mon Dec 4 11:46:16 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 4 Dec 2006 10:46:16 -0600 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk> References: <200612032138.02522.rbirnie@totalise.co.uk> Message-ID: Hi Richard, > [richard] > > These are the problems: > 1) As I understand it this: > > my $wholeseq = Bio::SeqFeature::Generic->new ( > -start => 1, > -end => $refseq->length, > -display_name =>$refseq->display_name > ); > > should display the name of the gene (CD133/Prominin1) near the top > of image. > It doesn't, am I misunderstanding or is there an error in the code? The contents of a sequence object's display_name varies depending on the type of sequence record; for a sequence object created from a Genbank record, it's the value of the LOCUS field on the first line of the record. If you want the gene name, you'll have to dig it out of the feature table. If you look at the Genbank record for your first sequence, you'll see that under both the gene and CDS primary features, the HUGO gene abbreviation is stored under the "gene" secondary tag, and various synonyms are under the "note" and "product" secondary tags. LOCUS NM_006017 3794 bp mRNA linear PRI 17-NOV-2006 DEFINITION Homo sapiens prominin 1 (PROM1), mRNA. ACCESSION NM_006017 VERSION NM_006017.1 GI:5174386 [...skipping irrelevant part of the Genbank record...] 
FEATURES             Location/Qualifiers
     source          1..3794
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="4"
                     /map="4p15.32"
     gene            1..3794
                     /gene="PROM1"
                     /note="prominin 1; synonyms: AC133, CD133, PROML1, MSTP061"
                     /db_xref="GeneID:8842"
                     /db_xref="HGNC:9454"
                     /db_xref="HPRD:HPRD_05079"
                     /db_xref="MIM:604365"
     CDS             38..2635
                     /gene="PROM1"
                     /go_component="integral to plasma membrane [pmid 9389720]; membrane"
                     /go_process="response to stimulus; visual perception"
                     /note="hProminin; prominin (mouse)-like 1; hematopoietic stem cell antigen"
                     /codon_start=1
                     /product="prominin 1"
                     /protein_id="NP_006008.1"
                     /db_xref="GI:5174387"
                     /db_xref="GeneID:8842"
                     /db_xref="HGNC:9454"
                     /db_xref="HPRD:HPRD_05079"
                     /db_xref="MIM:604365"
[....more...]

In your script, you grab the primary features between lines 34-60. You can grab the secondary feature you want with something like:

[cribbed from the Feature-Annotation HOWTO]

    for my $feat_object ($seq_object->get_SeqFeatures) {
        push @ids, $feat_object->get_tag_values("gene")
            if ($feat_object->has_tag("gene"));
    }

> 2) In the quoted example the CDS is broken up into smaller regions which are
> then linked together in example 6. This isn't happening in my code and I
> think it should be, I get one solid block for the CDS. I don't understand why
> this is because I'm not clear which parts of the feature table are used to
> define where the CDS should be split. I think this is the relevant bit of
> code:
>
> foreach my $alt_trans (keys %main) {
>     foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {
>
>         my $feature = $main{$alt_trans}{'features'}{$tag};
>
>         $panel->add_track($feature,
>             -glyph => 'generic',
>             -bgcolor => $colors[$idx++ % @colors],
>             -fgcolor => 'black',
>             -font2color => 'black',
>             -key => $alt_trans,
>             -bump => +1,
>             -height => 8,
>             -label => 1,
>             -description => 1,
>         ) if ($tag eq 'CDS');
>
>     }
> }

The problem here is that RefSeq mRNA records don't contain intron-exon boundary information.
I think you'll have to get that from an assembly record. From the Entrez gene page for PROM1, I obtained a Genbank record for the PROM1 genomic locus: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000004.10&from=15578955&to=15686664&strand=2&dopt=gb Saving that as 'PROM1.gb' (the suffix is important), and running the bp_embl2picture.pl script on it, I got an image similar to Figure 6 (attached). Hope this helps, Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PROM1.png Type: image/png Size: 8646 bytes Desc: not available URL: From bix at sendu.me.uk Mon Dec 4 14:37:13 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Dec 2006 19:37:13 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <000001c717db$3ca7b910$15327e82@pyrimidine> References: <000001c717db$3ca7b910$15327e82@pyrimidine> Message-ID: <457478E9.3060405@sendu.me.uk> Chris Fields wrote: > Sendu, > > Are current plans to still try getting the final 1.5.2 release out > before the hackathon next week? Yes, I seriously hope so. I was kind of hoping to see test results from you and Nathan on the wiki though... > There are a few commits I want to make, but I may wait until after > 1.5.2 is out before I add them. But don't let the release stop you. As long as you don't commit to the 1.5.2 branch it will be fine. From cjfields at uiuc.edu Mon Dec 4 14:34:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 13:34:34 -0600 Subject: [Bioperl-l] Timeline on the 1.5.2 release? Message-ID: <000001c717db$3ca7b910$15327e82@pyrimidine> Sendu, Are current plans to still try getting the final 1.5.2 release out before the hackathon next week? There are a few commits I want to make, but I may wait until after 1.5.2 is out before I add them. chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon Dec 4 15:23:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 14:23:45 -0600 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> Message-ID: <000001c717e2$19d18e00$15327e82@pyrimidine> > Chris Fields wrote: > > Sendu, > > > > Are current plans to still try getting the final 1.5.2 release out > > before the hackathon next week? > > Yes, I seriously hope so. I was kind of hoping to see test > results from you and Nathan on the wiki though... Ah, forgot to post those! Working on that now... > > There are a few commits I want to make, but I may wait until after > > 1.5.2 is out before I add them. > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. There are a few things I plan on adding over the next few weeks, including some things for Bio::Location::SplitLocation. However I'm sure some of the latter will break tests, so I'll be adding it in a bit at a time. It all depends when I can squeeze time in to work on them! chris From pelikan at cs.pitt.edu Mon Dec 4 17:34:59 2006 From: pelikan at cs.pitt.edu (pelikan at cs.pitt.edu) Date: Mon, 4 Dec 2006 17:34:59 -0500 (EST) Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries Message-ID: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu> Hello, My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the latest MySQL under Windows, ActivePerl, without Cygwin. I have 4 GB memory. The "make test"s pass fine. The problem is that I'm not getting similar numbers of anything when I load datasets using load_seqdatabase.pl.
For instance, if I want to load only proteins from Homo sapiens, I go to UniProt, use the database search function, do a text search for Homo Sapiens (returns 70914 hits), export the hits to flat file format (--format swiss) using the data set manager, and load the file using load_seqdatabase.pl. Running "select count(*) from bioentry;" afterwards returns only 1003 entries. Moreover it seems like the entries don't go past the B's in the alphabet - I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%', but I can find apolipoproteins, for example. I know this is an annoying question, but if someone has more experience in dealing with this issue, I would be grateful for any assistance. I don't get any error messages, so it's difficult for me to tell what's going on. -Richard From n.haigh at sheffield.ac.uk Tue Dec 5 01:53:34 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 05 Dec 2006 06:53:34 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> Message-ID: <4575176E.3020906@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: > >> Sendu, >> >> Are current plans to still try getting the final 1.5.2 release out >> before the hackathon next week? >> > > Yes, I seriously hope so. I was kind of hoping to see test results from > you and Nathan on the wiki though... > > > OK, I'll get onto this today. >> There are a few commits I want to make, but I may wait until after >> 1.5.2 is out before I add them. >> > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > A: Yes. >> Q: Are you sure? >> >>> A: Because it reverses the logical flow of conversation.
>>> >>>> Q: Why is top posting frowned upon? >>>> Get Thunderbird From n.haigh at sheffield.ac.uk Tue Dec 5 06:43:16 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 05 Dec 2006 11:43:16 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> Message-ID: <45755B54.7080902@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: > >> Sendu, >> >> Are current plans to still try getting the final 1.5.2 release out >> before the hackathon next week? >> > > Yes, I seriously hope so. I was kind of hoping to see test results from > you and Nathan on the wiki though... > > > I've added my test results for Debian to the wiki. Nath >> There are a few commits I want to make, but I may wait until after >> 1.5.2 is out before I add them. >> > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > A: Yes. >> Q: Are you sure? >> >>> A: Because it reverses the logical flow of conversation. >>> >>>> Q: Why is top posting frowned upon? >>>> Get Thunderbird From bix at sendu.me.uk Tue Dec 5 06:47:06 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 05 Dec 2006 11:47:06 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <45755B54.7080902@sheffield.ac.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> <45755B54.7080902@sheffield.ac.uk> Message-ID: <45755C3A.9050903@sendu.me.uk> Nathan S. Haigh wrote: > Sendu Bala wrote: >> Chris Fields wrote: >> >>> Sendu, >>> >>> Are current plans to still try getting the final 1.5.2 release out >>> before the hackathon next week? >>> >> Yes, I seriously hope so. 
I was kind of hoping to see test results from >> you and Nathan on the wiki though... > > I've added my test results for Debian to the wiki. Thanks (and to Chris as well). I can't tell you how much I loath and despise TCoffee and Tmhmm now ;) From cjfields at uiuc.edu Tue Dec 5 11:04:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 5 Dec 2006 10:04:38 -0600 Subject: [Bioperl-l] Build.PL changes Message-ID: <001b01c71887$10be3160$15327e82@pyrimidine> Sendu, I think the Build.PL commits which force installation of XML::SAX::Expat should be rolled back. XML::Simple works with any XML::SAX backend, not just XML::SAX::Expat, which hasn't been actively maintained since 2003 and is deprecated in favor of XML::SAX::ExpatXS. In fact, forcing XML::SAX::Expat to install as the default XML::SAX backend currently breaks blastxml parsing. Note that forcing this also forces one to install the Expat library (now at v 2), which now has some compatibility problems with XML::SAX::Expat (but not ExpatXS). Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From qetzal at tutopia.com.br Wed Dec 6 10:21:20 2006 From: qetzal at tutopia.com.br (giovani) Date: Wed, 06 Dec 2006 10:21:20 -0500 Subject: [Bioperl-l] Biodiversity graphic Message-ID: An HTML attachment was scrubbed... URL: From benoit at ebi.ac.uk Wed Dec 6 12:30:12 2006 From: benoit at ebi.ac.uk (Benoit Ballester) Date: Wed, 06 Dec 2006 17:30:12 +0000 Subject: [Bioperl-l] Biodiversity graphic In-Reply-To: References: Message-ID: <4576FE24.1030807@ebi.ac.uk> giovani wrote: > > Hello there. I'm trying to write a programa to set a graphic with two > axis and two data sets to each axis. Anyone know some tool similar to > the GD module to set this graphic, because with GD I'm having troubles. > here is an example of what I want to do: > http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm > using with GD module. 
It looks to me that the graph you pointing too has been made by gnuplot. Why don't you use gnuplot or R instead ? Ben > > #!/usr/bin/perl -w > > use GD::Graph::mixed; > @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 3, 4, 14, 30, 12, 8, 7, 20, 15], > [ 2, 8, 2, 5, 3, 1, 3, 4, 1], > [ 5, 12, 24, 33, 19, 8, 6, 15, 21], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > ); > > $my_graph = new GD::Graph::mixed( ); > $my_graph->set( > x_label => 'X Label', > y1_label => 'Y1 label', > y2_label => 'Y2 label', > title => 'Using two axes', > y1_max_value => 40, > y2_max_value => 8, > y_tick_number => 8, > y_label_skip => 2, > long_ticks => 1, > two_axes => 1, > use_axis => [1,2,1,2], > legend_placement => 'BR', > x_labels_vertical => 1, > x_label_position => 1/2, > ); > > $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY'); > my $gd = $my_graph->plot(\@data) or die $my_graph->error; > open(IMG, '>graphTest.gif') or die "N o posso abrir arquivo$!\n"; > binmode IMG; > print IMG $gd->gif; > close IMG; > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gwu at molbio.mgh.harvard.edu Wed Dec 6 16:12:57 2006 From: gwu at molbio.mgh.harvard.edu (gang wu) Date: Wed, 06 Dec 2006 16:12:57 -0500 Subject: [Bioperl-l] Biodiversity graphic In-Reply-To: References: Message-ID: <45773259.3010405@molbio.mgh.harvard.edu> Do you mean the GD code can not run or it does not generate image as you wanted? Gang giovani wrote: > > > Hello there. I'm trying to write a programa to set a graphic with two > axis and two data sets to each axis. Anyone know some tool similar to > the GD module to set this graphic, because with GD I'm having > troubles. here is an example of what I want to do: > http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm > using with GD module. 
> > #!/usr/bin/perl -w > > use GD::Graph::mixed; > @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 3, 4, 14, 30, 12, 8, 7, 20, 15], > [ 2, 8, 2, 5, 3, 1, 3, 4, 1], > [ 5, 12, 24, 33, 19, 8, 6, 15, 21], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > ); > > $my_graph = new GD::Graph::mixed( ); > $my_graph->set( > x_label => 'X Label', > y1_label => 'Y1 label', > y2_label => 'Y2 label', > title => 'Using two axes', > y1_max_value => 40, > y2_max_value => 8, > y_tick_number => 8, > y_label_skip => 2, > long_ticks => 1, > two_axes => 1, > use_axis => [1,2,1,2], > legend_placement => 'BR', > x_labels_vertical => 1, > x_label_position => 1/2, > ); > > $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY'); > my $gd = $my_graph->plot(\@data) or die $my_graph->error; > open(IMG, '>graphTest.gif') or die "N o posso abrir arquivo$!\n"; > binmode IMG; > print IMG $gd->gif; > close IMG; > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Wed Dec 6 17:39:49 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 06 Dec 2006 22:39:49 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 Release Message-ID: <457746B5.2020006@sendu.me.uk> I am proud to announce the final release of Bioperl 1.5.2. 
http://www.bioperl.org/wiki/Release_1.5.2 bioperl (core): cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-1.5.2_100.zip bioperl-run: cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip bioperl-db: cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip bioperl-network: cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip http://bioperl.org/DIST/SIGNATURES.md5 (all are also available via CVS, and for Windows users, using the Perl Package Manager - see the wiki for details) The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree and bioperl-pipeline) did not see a unified release for 1.5.2. This release represents a developer release which has been thoroughly tested. We consider it the most stable (in terms of bugs) version of Bioperl and believe it to be suitable for most people. It is marked 'developer' or even 'unstable' because its API may change on short notice. It will also not be maintained or supported beyond the next bioperl release. 1.5.2 introduces the following new (core) features: * Taxonomy (Bio::Species) overhaul * Bio::Map improvements * Bio::SearchIO speedup * Build.PL installation For details, and a complete change log, see the wiki. 

API documentation is available here: http://doc.bioperl.org/

Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields
(bug fixing, testing, documentation, discussion), Nathan Haigh
(Windows & prerequisite issues, testing) and Mauricio Herrera Cuadra
(testing, documentation, support). Feedback and ideas provided by Hilmar
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and
elsewhere proved invaluable. None of this would have been possible
without the behind-the-scenes work of the open-bio support team. I'd
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and
especially those that took the time to file bug reports or report problems.

Remember, Bioperl can only go from strength to strength with /your/
help. If you'd like to experience the fame and fortune that naturally
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.

From cjfields at uiuc.edu Wed Dec 6 21:30:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Dec 2006 20:30:44 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c719a7$b48beb90$15327e82@pyrimidine>

Great job Sendu!

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed w/o a hitch following normal instructions using PPM4 (GUI and
command line shell) using clean ActiveState installations. Looks like all
the correct prereqs were installed with shell (only XML::SAX::ExpatXS was
left out in the GUI installation for reasons outlined before).

I'll run more tests tomorrow to see if tests pass with the installed
bioperl (this should catch any prereq issues with PPM installation we
missed).

chris

> I am proud to announce the final release of Bioperl 1.5.2.

From hlapp at gmx.net Wed Dec 6 22:20:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:20:14 -0500
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
In-Reply-To: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
References: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
Message-ID: <8E15592D-6475-4A4D-BA6D-BD669C4233C3@gmx.net>

I seriously doubt that load_seqdatabase.pl would have deliberately
stopped loading the file. Either there was an error in loading an
entry (which you should see, and you can also ask the script to just
keep going by providing the --safe option), or the file only
contained 1003 entries.

Note that you can get progress logging by using the --logchunk
option, which will also give you a final count of the number of
sequences loaded.
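
A cross-check on the file itself can be sketched as follows, assuming the export is a Swiss-Prot style flat file in which every entry starts with an ID line and ends with a // line. The sub name count_swiss_entries and the script name count_entries.pl are made up for illustration. A script file also sidesteps the quoting differences between Unix shells and the Windows command prompt, where single-quoted one-liners generally need requoting. If the number it reports matches the final count from --logchunk, the loader saw every entry in the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the entries readable from a filehandle.
# Assumes Swiss-Prot flat-file layout: each entry begins with a line
# starting "ID" followed by whitespace and ends with a line starting "//".
sub count_swiss_entries {
    my ($fh) = @_;
    my ($ids, $ends) = (0, 0);
    while (my $line = <$fh>) {
        $ids++  if $line =~ /^ID\s/;
        $ends++ if $line =~ m{^//};
    }
    return ($ids, $ends);
}

# Usage: perl count_entries.pl exported_file [more_files ...]
# Nothing is read unless at least one file name is given.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my ($ids, $ends) = count_swiss_entries($fh);
    close $fh;
    print "$file: $ids entries, $ends terminated by //\n";
    # Fewer terminators than ID lines hints at a truncated download.
    warn "$file: last entry may be incomplete\n" if $ends < $ids;
}
```

Counting the terminators as well as the ID lines is a design choice rather than a requirement: it gives a cheap hint that an export was cut off partway through an entry.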

I'm not sure how you ran your search and your download on Uniprot. If
I try what you describe I get 70491 hits, and if I try to export them
using the data set manager I get the message:

    This download mechanism only supports 1000 proteins. The first
    1000 proteins have been added from the selected.

Which perfectly explains what you see. Did you convince yourself that
the file contains 70491 entries? If you don't have grep and wc on
your windows machine, you can use perl one-liners directly, e.g.,

    perl -n -e '/^ID / && ++$n; END {print "$n entries\n";}'

-hilmar

On Dec 4, 2006, at 5:34 PM, pelikan at cs.pitt.edu wrote:

> Hello,
>
> My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the
> latest mySQL under Windows, Activeperl, without Cygwin. I have 4 GB
> memory. "make test"s passed fine.
>
> The problem is that I'm not getting similar numbers of anything when I
> load datasets using load_seqdatabase.pl. For instance, if I want to load
> only proteins from Homo Sapiens,
> I go to UniProt,
> use the database search function,
> do a text search for Homo Sapiens (returns 70914 hits),
> export the hits to flat file format (--format swiss) using the data set
> manager,
> and load it using load_seqdatabase.pl.
>
> The result of "select count(*) from bioentry;" results in only
> 1003 entries.
> Moreover it seems like the entries don't go past the B's in the alphabet -
> I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%',
> but I can find apolipoproteins, for example.
>
> I know this is an annoying question, but if someone has more experience in
> dealing with this issue, I would be grateful for any assistance. I don't
> get any error messages, so it's difficult for me to tell what's going on.
>
> -Richard

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From lzhtom at hotmail.com Wed Dec 6 22:13:47 2006
From: lzhtom at hotmail.com (zhihua li)
Date: Thu, 07 Dec 2006 03:13:47 +0000
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
Message-ID:

Hi netters,

Recently I found this:

For constructing a new SeqIO object, I had to write:

$seq_obj=Bio::SeqIO->new(
    -file => '/home/myfile',
    -format => 'Fasta'); #Note the dash before the two arguments.

If I omitted the dash:

$seq_obj=Bio::SeqIO->new(
    file => '/home/myfile',
    format => 'Fasta');

I'd get error:
MSG: Unknown format given or could not determine it []
STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377

So it seems to me that the dashes before the arguments are essential.
However, when I tried to build a factory for StandAloneBlast, I found the
other way around.

If the script had the dash:

$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
    -program => 'blastn',
    -database => '/home/mydatabase');

I'd get the error message:
MSG: Unallowed parameter: - !
STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400

If I left out the dash by saying:

$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
    program => 'blastn',
    database => '/home/mydatabase');

Everything is fine.

Now I'm confused. Why do I sometimes have to add the dash, while at other
times I'm not allowed to?

Thanks in advance!

From hlapp at gmx.net Wed Dec 6 22:56:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:56:44 -0500
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID:

Congrats! Great work, Sendu! Don't forget to celebrate.

-hilmar

On Dec 6, 2006, at 5:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From arareko at campus.iztacala.unam.mx Wed Dec 6 22:53:21 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 06 Dec 2006 21:53:21 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <45779031.3050202@campus.iztacala.unam.mx>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

Sendu Bala wrote:
> I am proud to announce the final release of Bioperl 1.5.2.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From jason at bioperl.org Thu Dec 7 00:06:36 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Dec 2006 21:06:36 -0800
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <41A863C9-1B69-4C7B-9271-C577EDD011BB@bioperl.org>

hear! hear! Excellent work. Thanks for leading the effort on this
release and all of the behind the scenes work, attention to detail,
and cat herding work it took to make this possible.

-jason

On Dec 6, 2006, at 2:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From n.haigh at sheffield.ac.uk Thu Dec 7 02:23:47 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 07 Dec 2006 07:23:47 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <4577C183.7010501@sheffield.ac.uk>

I know I'm very new to Bioperl development and don't know very much yet,
so I'm probably not the best person to express the views of the Bioperl
developers or users. However, I'm sure I'm safe in saying that on behalf
of everyone associated with Bioperl a *huge* thank you must go out to
Sendu for the gargantuan effort he has put into this release. Just
looking over some of the e-mails he's sent over the past few weeks
alone, it's clear that he has devoted a huge amount of time to the
effort and in some cases with little sleep.

Since there is very little (or should I say no) monetary recognition in
such an important and time consuming role as "Release Pumpkin", I hope
Sendu has a warm glow, safe in the knowledge that his efforts have
helped enormously and are clearly recognised and fully appreciated by
the Bioperl community.

Therefore, I'd just like to reiterate what others have already
said.....Well done, excellent work!!!

Nath

From valiente at lsi.upc.edu Thu Dec 7 03:25:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 09:25:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To:
References:
Message-ID: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>

The following popped out when I input more than 110 species to the
taxonomy2tree script version 1.4:

(in cleanup)
------------- EXCEPTION -------------
MSG: Must supply a Bio::Taxon
STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
STACK (eval) taxonomy2tree.pl:0
STACK toplevel taxonomy2tree.pl:0

Any clues? Thanks,

Gabriel

From bix at sendu.me.uk Thu Dec 7 04:24:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:24:39 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DDD7.7060208@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
>
> (in cleanup)
> ------------- EXCEPTION -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
>
> Any clues? Thanks,

Are you able to narrow the problem down? What was your command line,
what species were you using? Does it work with the first 110 species you
tried? Is there anything special about the 111th?

Do I understand correctly that this was a problem during cleanup only,
and didn't affect the correctness and completeness of the result?

From bix at sendu.me.uk Thu Dec 7 04:33:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:33:18 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DFDE.6000500@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
>
> (in cleanup)
> ------------- EXCEPTION -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
>
> Any clues? Thanks,

Oh, does it work with option -e? Or does it work if you delete your old
indexes of the nodes and names files and let it re-create them?

From valiente at lsi.upc.edu Thu Dec 7 04:38:03 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 10:38:03 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4577DDD7.7060208@sendu.me.uk>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu> <4577DDD7.7060208@sendu.me.uk>
Message-ID:

Hi,

If you run the attached shell script you should be able to reproduce
the problem. It is not about any species in particular, but about the
total number of species: it crashes with more than 120 species. The
resulting tree is not correct, I'm checking it further now. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
URL:
-------------- next part --------------

On Dec 7, 2006, at 10:24 AM, Sendu Bala wrote:

> Are you able to narrow the problem down? What was your command
> line, what species were you using? Does it work with the first 110
> species you tried? Is there anything special about the 111th?
>
> Do I understand correctly that this was a problem during cleanup
> only, and didn't affect the correctness and completeness of the
> result?

From cjfields at uiuc.edu Thu Dec 7 10:22:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 09:22:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110species
In-Reply-To:
Message-ID: <000001c71a13$8feec840$15327e82@pyrimidine>

> Hi,
>
> If you run the attached shell script you should be able to
> reproduce the problem. It is not about any species in
> particular, but about the total number of species: it crashes
> with more than 120 species. The resulting tree is not
> correct, I'm checking it further now. Thanks,
>
> Gabriel

Gabriel,

My guess is this may have to do with using an old taxonomy dump file. I got
this to work on winXP using the latest NCBI taxonomy. I had to modify
taxonomy2tree and your shell script to get it to play nice with Windows, but
I didn't get the error and I did get a tree (abbreviated for brevity):

(((((("Agrobacterium tumefaciens str. C58","Sinorhizobium meliloti")Rhizobiaceae,...

chris

From cjfields at uiuc.edu Thu Dec 7 13:44:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 12:44:32 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
In-Reply-To:
References:
Message-ID: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>

On Dec 6, 2006, at 9:13 PM, zhihua li wrote:

> Now I'm confused. Why sometimes I have to add the dash, while
> sometimes I'm not allowed to?
>
> Thanks in advance!

I agree that this should be more consistent. Does anyone know the
reasoning for this?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bosborne11 at verizon.net Thu Dec 7 14:32:21 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Dec 2006 14:32:21 -0500
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
In-Reply-To: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>
Message-ID:

Chris,

The latest StandAloneBlast takes "dashed parameters", as in:

@params = (-database => 'swissprot',-outfile => 'blast1.out');
$factory = Bio::Tools::Run::StandAloneBlast->new(@params);

Or

my $factory = Bio::Tools::Run::StandAloneBlast->new(-program  => "wublastp",
                                                    -database => "swissprot",
                                                    -e        => 1e-20);

So that's why I asked "what version?"

Someone made the change to allow dashes in @params a few months ago and I
believe that that someone was you!

Brian O.
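
The inconsistency discussed in this thread comes down to how a constructor matches the names in its argument list. One way to accept both spellings is sketched below. This is not BioPerl's code: normalize_args is a hypothetical helper written only for illustration, and the parameter names simply echo the ones used in the messages above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn a name => value list
# into a hash, accepting each name with or without a leading dash and
# in any letter case, and rejecting names that are not in the allowed list.
sub normalize_args {
    my ($allowed, @args) = @_;
    die "Odd number of arguments\n" if @args % 2;
    my %ok = map { lc($_) => 1 } @$allowed;
    my %out;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = lc $key) =~ s/^-//;    # "-Program" and "program" both become "program"
        die "Unallowed parameter: $key\n" unless $ok{$name};
        $out{$name} = $value;
    }
    return %out;
}

# Both spellings from the thread produce the same parameter hash.
my @allowed      = qw(program database);
my %with_dash    = normalize_args(\@allowed, -program => 'blastn', -database => '/home/mydatabase');
my %without_dash = normalize_args(\@allowed,  program => 'blastn',  database => '/home/mydatabase');

for my $params (\%with_dash, \%without_dash) {
    print join(', ', map { "$_ => $params->{$_}" } sort keys %$params), "\n";
}
```

Normalizing once at the top of a constructor keeps the rest of the class indifferent to which style the caller prefers.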

On 12/7/06 1:44 PM, "Chris Fields" wrote:

> I agree that this should be more consistent. Does anyone know the
> reasoning for this?

From cjfields at uiuc.edu Thu Dec 7 14:44:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 13:44:19 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
In-Reply-To:
References:
Message-ID:

On Dec 7, 2006, at 1:32 PM, Brian Osborne wrote:

> So that's why I asked "what version?"
>
> Someone made the change to allow dashes in @params a few months ago
> and I believe that that someone was you!
>
> Brian O.

Nope, I plead innocent (at least to this!). I haven't made any
commits to StandAloneBlast. These were added in by Torsten (see
commits 1.59, 1.60), so you'll need to blame/thank him...

http://tinyurl.com/y7ym9g

So they're now a bit more consistent. That's not to say
StandAloneBlast doesn't need some major revisions....

BTW, I didn't see a post from you asking about the version.

Chris

From akarger at CGR.Harvard.edu Thu Dec 7 16:32:51 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 7 Dec 2006 16:32:51 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID:

I need to know how to get the frame information in exon features (created
by Bio::Tools::GFF) into a whole-gene feature that will be translated
into a protein.

I'm reading in some fungal GFFs generated by Jason Stajich.
I - Use Bio::Tools::GFF to create a feature for each exon in a gene - Create a Bio::Location::Split object containing each feature's location - Create a Bio::SeqFeature::Generic object whose location is the above BL::Split - Attach my contig Bio::Seq to the feature - get the protein with feature->spliced_seq->translate->seq (Code below) Unfortunately, I get the wrong result when the GFF features have frame != 0. This happens for only a few percent of the exons, but when it does, I end up translating in the wrong frame. If I read the docs correctly, Location objects don't have a frame. So how do I get the correct spliced_seq, which skips one or two bp at the beginning of certain exons? I suspect the answer to this is that I'm going about this in completely the wrong way, in which case, please tell me how I ought to be doing it. Thanks, - Amir Karger Research Computing Life Sciences Division Harvard University P.S. In case you want to see actual code, here it is. After using Bio::Tools::GFF to create a sorted list of features for each exon (basically stolen from the module POD), I: # Create a new object representing the exons' gene my $coding_loc_obj = new Bio::Location::Split; foreach my $exon (@sorted_exons) { $coding_loc_obj->add_sub_Location($exon->location); } # Build a spliced feature representing the whole gene my $spliced_feat = new Bio::SeqFeature::Generic( -start => $coding_loc_obj->start, -end => $coding_loc_obj->end, -strand => $strand_num, -primary=> "splicedGene", ); $spliced_feat->location($coding_loc_obj); # Attach a contig object containing the sequence $spliced_feat->attach_seq($contig_obj->bioperl_object); # Get the spliced seq and translate to protein: my $coding_seq = $spliced_feat->spliced_seq->seq; my $protein = $spliced_feat->spliced_seq->translate->seq; From bix at sendu.me.uk Thu Dec 7 17:45:32 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 7 Dec 2006 15:45:32 -0700 Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release 
Message-ID: <000001c71a51$671a79d0$6400a8c0@CodonSolutions.local> I am proud to announce the final release of Bioperl 1.5.2. http://www.bioperl.org/wiki/Release_1.5.2 bioperl (core): cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-1.5.2_100.zip bioperl-run: cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip bioperl-db: cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip bioperl-network: cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip http://bioperl.org/DIST/SIGNATURES.md5 (all are also available via CVS, and for Windows users, using the Perl Package Manager - see the wiki for details) The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree and bioperl-pipeline) did not see a unified release for 1.5.2. This release represents a developer release which has been thoroughly tested. We consider it the most stable (in terms of bugs) version of Bioperl and believe it to be suitable for most people. It is marked 'developer' or even 'unstable' because its API may change on short notice. It will also not be maintained or supported beyond the next bioperl release. 1.5.2 introduces the following new (core) features: * Taxonomy (Bio::Species) overhaul * Bio::Map improvements * Bio::SearchIO speedup * Build.PL installation For details, and a complete change log, see the wiki. 
API documentation is available here: http://doc.bioperl.org/ Acknowledgements: Enumerable thanks are due for the tireless efforts of Christopher Fields (bug fixing, testing, documentation, discussion), Nathan Haigh (Windows&pre-requisite issues, testing) and Mauricio Herrera Cuadra (testing, documentation, support). Feedback and ideas provided by Hilmar Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and elsewhere proved invaluable. None of this would have been possible without the behind-the-scenes work of the open-bio support team. I'd also like to acknowledge Andreas J. Koenig for his help with CPAN matters. Finally, thank you to everyone who tried out the release candidates, and especially those that took the time to file bug reports or report problems. Remember, Bioperl can only go from strength to strength with /your/ help. If you'd like to experience the fame and fortune that naturally follow becoming a Bioperl developer (?!), become one! http://www.bioperl.org/wiki/Becoming_a_developer On behalf of the Bioperl team, Sendu Bala. _______________________________________________ Bioperl-announce-l mailing list Bioperl-announce-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l From cjfields at uiuc.edu Thu Dec 7 18:00:43 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Dec 2006 16:00:43 -0700 Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release In-Reply-To: <457746B5.2020006@sendu.me.uk> Message-ID: <000001c71a53$85cb4f10$6400a8c0@CodonSolutions.local> Great job Sendu! A bit of icing on the cake: all the WinXP PPMs (core, db, network, run) installed w/o a hitch following normal instructions using PPM4 (GUI and command line shell) using clean ActiveState installations. Looks like all the correct prereqs were installed with shell (only XML::SAX::ExpatXS was left out in the GUI installation for reasons outlined before). 
I'll run more tests tomorrow to see if tests pass with the installed bioperl (this should catch any prereq issues with PPM installation we missed). chris > I am proud to announce the final release of Bioperl 1.5.2. > > http://www.bioperl.org/wiki/Release_1.5.2 > > bioperl (core): > cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-1.5.2_100.zip > > bioperl-run: > cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip > > bioperl-db: > cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip > > bioperl-network: > cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip > > http://bioperl.org/DIST/SIGNATURES.md5 > > (all are also available via CVS, and for Windows users, using > the Perl Package Manager - see the wiki for details) > > The other bioperl packages (bioperl-ext, bioperl-gui, > bioperl-pedigree and bioperl-pipeline) did not see a unified > release for 1.5.2. > > > > This release represents a developer release which has been thoroughly > tested. We consider it the most stable (in terms of bugs) version of > Bioperl and believe it to be suitable for most people. It is marked > 'developer' or even 'unstable' because its API may change on short > notice. It will also not be maintained or supported beyond the next > bioperl release. 
> > 1.5.2 introduces the following new (core) features: > > * Taxonomy (Bio::Species) overhaul > * Bio::Map improvements > * Bio::SearchIO speedup > * Build.PL installation > > For details, and a complete change log, see the wiki. > > API documentation is available here: http://doc.bioperl.org/ > > > Acknowledgements: > Enumerable thanks are due for the tireless efforts of > Christopher Fields > (bug fixing, testing, documentation, discussion), Nathan Haigh > (Windows&pre-requisite issues, testing) and Mauricio Herrera Cuadra > (testing, documentation, support). Feedback and ideas > provided by Hilmar > Lapp, Jason Stajich, Torsten Seemann and others on the > mailing list and > elsewhere proved invaluable. None of this would have been possible > without the behind-the-scenes work of the open-bio support team. I'd > also like to acknowledge Andreas J. Koenig for his help with > CPAN matters. > > Finally, thank you to everyone who tried out the release > candidates, and > especially those that took the time to file bug reports or > report problems. > > > Remember, Bioperl can only go from strength to strength with /your/ > help. If you'd like to experience the fame and fortune that naturally > follow becoming a Bioperl developer (?!), become one! > http://www.bioperl.org/wiki/Becoming_a_developer > > On behalf of the Bioperl team, > Sendu Bala. _______________________________________________ Bioperl-announce-l mailing list Bioperl-announce-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l From kaboroev at sfu.ca Thu Dec 7 17:26:35 2006 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Thu, 07 Dec 2006 14:26:35 -0800 Subject: [Bioperl-l] Bio::Graphics xyplot Message-ID: <4578951B.5050206@sfu.ca> Hi everyone, I'm attempting to add an xyplot of the phred quality scores to an Bio::Graphics image, and cannot get it to work. I have the panel with a track for both the scale and the DNA displaying properly. 
When I attempt to add the xyplot I just get a garbled track of what looks
like tiny xyplots for each datapoint. I have the CVS version (updated today)
of bioperl-live running.

I think what I am missing is the creation of a "Sequence Feature Group" to
hold the individual points of the plot. However, I cannot seem to find such
an object. This is what I attempted:

-------BEGIN---CODE-----------
# start panel
my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
                                      -width     => $f_seqlen*10,
                                      -pad_left  => 10,
                                      -pad_right => 10,
                                      -grid      => 1
                                     );

# add scale
$panel->add_track(arrow => Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
                  -double  => 1,
                  -tick    => 2,
                  -fgcolor => 'black');

# add DNA ($feature is of type Bio::SeqFeature::Annotated)
$panel->add_track(dna => $feature);

# get list of quality scores from database
my ($pqs_value) = $dbh->selectrow_array($sql);
my @pqs_value = split(/\s/,$pqs_value);

# create track
my $track = $panel->add_track(-glyph        => 'xyplot',
                              -graph_type   => 'points',
                              -point_symbol => 'point',
                              -max_score    => 100,
                              -min_score    => 0,
                              -scale        => 'none');

# add "subfeatures" to
for (my $i=0;$i<$f_seqlen;$i++) {
    $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
}

print $panel->png();
$panel->finished;
------END---CODE----------

I also attempted to create an array of the point features and pass that by
reference to the panel "add_track" as described in the xyplot documentation,
but that resulted in the exact same image.
keith -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 From arareko at campus.iztacala.unam.mx Thu Dec 7 18:15:53 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 7 Dec 2006 16:15:53 -0700 Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release In-Reply-To: <457746B5.2020006@sendu.me.uk> References: <457746B5.2020006@sendu.me.uk> Message-ID: <000001c71a55$a479da60$6400a8c0@CodonSolutions.local> This has been a great effort. Congrats and thanks to everyone involved! Mauricio. Sendu Bala wrote: > I am proud to announce the final release of Bioperl 1.5.2. > > http://www.bioperl.org/wiki/Release_1.5.2 > > bioperl (core): > cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-1.5.2_100.zip > > bioperl-run: > cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip > > bioperl-db: > cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip > > bioperl-network: > cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz > http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2 > http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip > > http://bioperl.org/DIST/SIGNATURES.md5 > > (all are also available via CVS, and for Windows users, using the Perl > Package Manager - see the wiki for details) > > The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree > and bioperl-pipeline) did not see a unified release for 1.5.2. 
> > > > This release represents a developer release which has been thoroughly > tested. We consider it the most stable (in terms of bugs) version of > Bioperl and believe it to be suitable for most people. It is marked > 'developer' or even 'unstable' because its API may change on short > notice. It will also not be maintained or supported beyond the next > bioperl release. > > 1.5.2 introduces the following new (core) features: > > * Taxonomy (Bio::Species) overhaul > * Bio::Map improvements > * Bio::SearchIO speedup > * Build.PL installation > > For details, and a complete change log, see the wiki. > > API documentation is available here: http://doc.bioperl.org/ > > > Acknowledgements: > Enumerable thanks are due for the tireless efforts of Christopher Fields > (bug fixing, testing, documentation, discussion), Nathan Haigh > (Windows&pre-requisite issues, testing) and Mauricio Herrera Cuadra > (testing, documentation, support). Feedback and ideas provided by Hilmar > Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and > elsewhere proved invaluable. None of this would have been possible > without the behind-the-scenes work of the open-bio support team. I'd > also like to acknowledge Andreas J. Koenig for his help with CPAN matters. > > Finally, thank you to everyone who tried out the release candidates, and > especially those that took the time to file bug reports or report problems. > > > Remember, Bioperl can only go from strength to strength with /your/ > help. If you'd like to experience the fame and fortune that naturally > follow becoming a Bioperl developer (?!), become one! > http://www.bioperl.org/wiki/Becoming_a_developer > > On behalf of the Bioperl team, > Sendu Bala. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l

From cain at cshl.edu Thu Dec 7 17:46:09 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 07 Dec 2006 17:46:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
In-Reply-To:
References:
Message-ID: <1165531569.2569.49.camel@localhost.localdomain>

Amir,

I don't know for sure what the problem is, but here is one possibility:
the number in column 8 of a GFF file is not the frame, it is the phase.
See the GFF3 spec for a description of what the phase is:

http://www.sequenceontology.org/gff3.shtml

(It doesn't matter if you are using GFF3 or GFF2, as the phase is the
same in both).

Scott


On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
>
> I'm reading in some fungal GFFs generated by Jason Stajich. I
>
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> BL::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
>
> (Code below)
>
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
> > If I read the docs correctly, Location objects don't have a frame. So > how do I get the correct spliced_seq, which skips one or two bp at the > beginning of certain exons? > > I suspect the answer to this is that I'm going about this in completely > the wrong way, in which case, please tell me how I ought to be doing it. > > Thanks, > - Amir Karger > Research Computing > Life Sciences Division > Harvard University > > P.S. In case you want to see actual code, here it is. After using > Bio::Tools::GFF to create a sorted list of features for each exon > (basically stolen from the module POD), I: > # Create a new object representing the exons' gene > my $coding_loc_obj = new Bio::Location::Split; > foreach my $exon (@sorted_exons) { > $coding_loc_obj->add_sub_Location($exon->location); > } > > # Build a spliced feature representing the whole gene > my $spliced_feat = new Bio::SeqFeature::Generic( > -start => $coding_loc_obj->start, > -end => $coding_loc_obj->end, > -strand => $strand_num, > -primary=> "splicedGene", > ); > $spliced_feat->location($coding_loc_obj); > > # Attach a contig object containing the sequence > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > # Get the spliced seq and translate to protein: > my $coding_seq = $spliced_feat->spliced_seq->seq; > my $protein = $spliced_feat->spliced_seq->translate->seq; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Thu Dec 7 21:52:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Dec 2006 20:52:47 -0600 Subject: [Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq In-Reply-To: <1165531569.2569.49.camel@localhost.localdomain> Message-ID: <002d01c71a73$f16ecc40$15327e82@pyrimidine> Another issue is the splittype() is not defined, though I don't think that would kill anything as currently implemented. However, one thing we have passingly discussed is having Bio::Location::Split objects possibly exhibit different (but expected) behaviors based upon the splittype() (order, join, or bond). It's one of the things I want to work out for the next release. If Scott's fix doesn't work and the problem persists, you should file a bug report with some sample data for us to test out. chris > Amir, > > I don't know for sure what the problem is, but here is one > possibility: > the number in column 8 of a GFF file is not the frame, it is > the phase. > See the GFF3 spec for a description of what the phase is: > > http://www.sequenceontology.org/gff3.shtml > > (It doesn't matter if you are using GFF3 or GFF2, as the > phase is the same in both). > > Scott > > > On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote: > > I need to know how to get the frame information in exon features > > (created by Bio::Tools::GFF) into a whole-gene feature that will be > > translated into a protein. > > > > I'm reading in some fungal GFFs generated by Jason Stajich. 
I > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > - Create a Bio::Location::Split object containing each feature's > > location > > - Create a Bio::SeqFeature::Generic object whose location > is the above > > BL::Split > > - Attach my contig Bio::Seq to the feature > > - get the protein with feature->spliced_seq->translate->seq > > > > (Code below) > > > > Unfortunately, I get the wrong result when the GFF features > have frame > > != 0. This happens for only a few percent of the exons, but when it > > does, I end up translating in the wrong frame. > > > > If I read the docs correctly, Location objects don't have a > frame. So > > how do I get the correct spliced_seq, which skips one or > two bp at the > > beginning of certain exons? > > > > I suspect the answer to this is that I'm going about this in > > completely the wrong way, in which case, please tell me how > I ought to be doing it. > > > > Thanks, > > - Amir Karger > > Research Computing > > Life Sciences Division > > Harvard University > > > > P.S. In case you want to see actual code, here it is. 
After using > > Bio::Tools::GFF to create a sorted list of features for each exon > > (basically stolen from the module POD), I: > > # Create a new object representing the exons' gene > > my $coding_loc_obj = new Bio::Location::Split; > > foreach my $exon (@sorted_exons) { > > $coding_loc_obj->add_sub_Location($exon->location); > > } > > > > # Build a spliced feature representing the whole gene > > my $spliced_feat = new Bio::SeqFeature::Generic( > > -start => $coding_loc_obj->start, > > -end => $coding_loc_obj->end, > > -strand => $strand_num, > > -primary=> "splicedGene", > > ); > > $spliced_feat->location($coding_loc_obj); > > > > # Attach a contig object containing the sequence > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > # Get the spliced seq and translate to protein: > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > my $protein = $spliced_feat->spliced_seq->translate->seq; From jason at bioperl.org Thu Dec 7 21:01:33 2006 From: jason at bioperl.org (Jason Stajich) Date: Thu, 7 Dec 2006 18:01:33 -0800 Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq In-Reply-To: References: Message-ID: <866F6CEE-62BB-4880-9B13-6DDE29EAF94E@bioperl.org> This was a problem in the gene prediction output I suspect, more recent versions of the program should have fixed this. I do not currently have free time to deal with the errors in the small number of ORFs where this has happened. I think you just need to do start -= start- (frame*strand) for 1st exons. You can also probably provide the 1st exon's frame to the translate function as another possibility but you should try and get the CDS correct first depending on your downstream analyses. -jason On Dec 7, 2006, at 1:32 PM, Amir Karger wrote: > I need to know how to get the frame information in exon features > (created by Bio::Tools::GFF) into a whole-gene feature that will be > translated into a protein. > > I'm reading in some fungal GFFs generated by Jason Stajich. 
I > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > - Create a Bio::Location::Split object containing each feature's > location > - Create a Bio::SeqFeature::Generic object whose location is the above > BL::Split > - Attach my contig Bio::Seq to the feature > - get the protein with feature->spliced_seq->translate->seq > > (Code below) > > Unfortunately, I get the wrong result when the GFF features have frame > != 0. This happens for only a few percent of the exons, but when it > does, I end up translating in the wrong frame. > > If I read the docs correctly, Location objects don't have a frame. So > how do I get the correct spliced_seq, which skips one or two bp at the > beginning of certain exons? > > I suspect the answer to this is that I'm going about this in > completely > the wrong way, in which case, please tell me how I ought to be > doing it. > > Thanks, > - Amir Karger > Research Computing > Life Sciences Division > Harvard University > > P.S. In case you want to see actual code, here it is. 
After using > Bio::Tools::GFF to create a sorted list of features for each exon > (basically stolen from the module POD), I: > # Create a new object representing the exons' gene > my $coding_loc_obj = new Bio::Location::Split; > foreach my $exon (@sorted_exons) { > $coding_loc_obj->add_sub_Location($exon->location); > } > > # Build a spliced feature representing the whole gene > my $spliced_feat = new Bio::SeqFeature::Generic( > -start => $coding_loc_obj->start, > -end => $coding_loc_obj->end, > -strand => $strand_num, > -primary=> "splicedGene", > ); > $spliced_feat->location($coding_loc_obj); > > # Attach a contig object containing the sequence > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > # Get the spliced seq and translate to protein: > my $coding_seq = $spliced_feat->spliced_seq->seq; > my $protein = $spliced_feat->spliced_seq->translate->seq; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From neetisomaiya at gmail.com Fri Dec 8 05:21:50 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 8 Dec 2006 15:51:50 +0530 Subject: [Bioperl-l] need help with phrap parser Message-ID: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com> Can anyone point me to a Phrap parser which parses the ace file to extract what reads make up each contig (eg. read_a and read_b make contig1; read_d read_e and read_z make contig2, and other information of the reads (like whether the read is complemented or not with respect to the contig, what region of the contig does each read contribute etc), basically the AF and BS lines of the ACE output. 
-- 
-Neeti
Even my blood says, B positive

From pmiguel at purdue.edu Fri Dec 8 09:17:02 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 08 Dec 2006 09:17:02 -0500
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
References: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
Message-ID: <457973DE.6050900@purdue.edu>

neeti somaiya wrote:
> Can anyone point me to a Phrap parser which parses the ace file to extract
> what reads make up each contig (eg. read_a and read_b make contig1; read_d
> read_e and read_z make contig2, and other information of the reads (like
> whether the read is complemented or not with respect to the contig, what
> region of the contig does each read contribute etc), basically the AF and BS
> lines of the ACE output.
>
>
neeti,
    To find the reads that went into each contig, you do *not* want the
BS tagged records. My understanding is that BS is just what consed uses
to populate its consensus line from the ace file. I write this because
of an email sent to me by David Gordon in 2001, included here without his
permission:

> > Phrap writes BS lines which
> > indicate, for each consensus position, which read phrap uses at that
> > position to become the consensus. These BS ("base segments") are
> > manipulated by Consed when there are changes to the assembly, such as
> > joins, tears, removing reads, or changing the consensus.
>

The simplest way is:

egrep '^(CO|AF|RD) ' acefilename

if you are on a unix system. Or with perl

while (<>) {
    print if (/^(?:CO|AF|RD) /);
}

(The grouping keeps the ^ anchor applied to all three tags, so a line
that merely contains "AF" or "RD" somewhere is not printed.)

But then you would need to parse the fields of interest. You get the
position/strand in the contig from AF, then you get the length of the
read from RD.

It does look like part of bioperl was meant to perform this task,
including Bio::Assembly::IO::ace, but it looks like it was started and
never completed.
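The field parsing described above can be sketched in plain Perl. This is only a rough illustration, not a replacement for a finished Bio::Assembly parser: `parse_ace` is a hypothetical helper, the field order assumed is `CO <contig> <bases> <reads> ...`, `AF <read> <U|C> <padded start>` and `RD <read> <padded bases> ...`, and the small inline record is made up for the demonstration. The end position is simply start plus padded read length, so it includes any clipped read ends.

```perl
use strict;
use warnings;

# Hypothetical helper: collect, per contig, the reads named on AF lines
# together with their strand and padded start, then add the read length
# taken from the matching RD line.
sub parse_ace {
    my @lines = @_;
    my (%contigs, %read_len, $current);

    for my $line (@lines) {
        if ($line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/) {
            $current = $1;
            $contigs{$current} = { bases => $2, n_reads => $3, reads => [] };
        }
        elsif (defined $current && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            push @{ $contigs{$current}{reads} },
                { name => $1, complemented => ($2 eq 'C' ? 1 : 0), start => $3 };
        }
        elsif ($line =~ /^RD\s+(\S+)\s+(\d+)/) {
            $read_len{$1} = $2;
        }
    }

    # Second pass: RD lines come after the AF lines, so fill in lengths now.
    for my $contig (values %contigs) {
        for my $read (@{ $contig->{reads} }) {
            my $len = $read_len{ $read->{name} };
            next unless defined $len;
            $read->{length} = $len;
            $read->{end}    = $read->{start} + $len - 1;
        }
    }
    return \%contigs;
}

# Made-up miniature record, with sequence and quality lines left out.
my @demo = split /\n/, <<'ACE';
CO Contig1 20 2 1 U
AF read_a U 1
AF read_b C 8
BS 1 20 read_a
RD read_a 12 0 0
RD read_b 13 0 0
ACE

my $contigs = parse_ace(@demo);
for my $name (sort keys %$contigs) {
    for my $read (@{ $contigs->{$name}{reads} }) {
        printf "%s\t%s\t%s\t%d..%d\n", $name, $read->{name},
            ($read->{complemented} ? 'complemented' : 'forward'),
            $read->{start}, $read->{end};
    }
}
```

To run it on a real file, the lines could be read with `my @lines = <>; chomp @lines;` and passed to `parse_ace` instead of the inline record.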
From cjfields at uiuc.edu Fri Dec 8 10:17:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:17:31 -0600
Subject: [Bioperl-l] NAR Database Issue Papers
Message-ID: <000601c71adb$fdd60490$15327e82@pyrimidine>

For those interested, the Nucleic Acids Research Database issue papers have
been popping up in the Advance Access section of the NAR website:

http://nar.oxfordjournals.org/papbyrecent.dtl

Ensembl, UCSC Browser, Entrez Gene, and a number of others of possible
interest are represented. Of particular note are a few mentions of
formatting changes to UniProt, EMBL, and other records, which should be
taken care of in the latest BioPerl release (fingers crossed!).

chris

From cjfields at uiuc.edu Fri Dec 8 10:31:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:31:19 -0600
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <457973DE.6050900@purdue.edu>
Message-ID: <000001c71add$ec7147d0$15327e82@pyrimidine>

...
> But then you would need to parse the fields of interest. You get the
> position/strand in the contig from AF, then you get the length of the
> read from RD.
>
> There does look like there is a part of bioperl that meant to perform
> this task--including Bio::Assembly::IO::ace but it looks like it was
> started, but never completed.

...and if anyone wants to chip in and work on it, let us know! The various
Bio::Assembly modules are one of many areas that need some updating.

chris

From akarger at CGR.Harvard.edu Fri Dec 8 13:25:47 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:25:47 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID:

> This was a problem in the gene prediction output I suspect, more
> recent versions of the program should have fixed this. I do not
> currently have free time to deal with the errors in the small number
> of ORFs where this has happened.
> > I think you just need to do > start -= start- (frame*strand) > for 1st exons. I used if (strand==1) {start += exon->frame} else {end -= exon->frame} This took me from 90 translations that had * within the sequence to just 9, out of 5500 CDS in S bayanus. > You can also probably provide the 1st exon's frame to the translate > function as another possibility but you should try and get the CDS > correct first depending on your downstream analyses. Yes, I think. Scott Cain pointed out that GFF column 8 is the "phase", which I had never heard of before. My current, very limited, understanding is that sometimes you'll have an exon with, say, 31 bp, followed by an exon with 29 bp. When the intron gets spliced out, you eventually get an mRNA of 60 bp, which translates to a protein of 20 aa. But the second exon has a phase of 1, not 0, because you can't just start translating at the first bp of the second exon and expect to get nice amino acids. By the way, whether or not phase is the same thing as frame, when I call the frame() method on the features created by Bio::Tools::GFF, I get the phase info. I assume that's a feature (no pun intended), not a bug? I'm still confused as to why you would have a phase in the first exon, though. Why not just say the CDS starts 1 or 2 bp later? (This is probably a bio question, not a bioperl question, but a quick Google didn't get me an answer. "Phase" isn't a very good search term.) I guess the real question here, which Jason alludes to, is whether SeqFeature->spliced_seq ought to take into account the phase information of the first exon. Right now, it doesn't, so when you call SeqFeature->spliced_seq->translate, you get gibberish. Are there cases where you would want spliced_seq to include the first bp or two? Should there be an option to spliced_seq for whether you want to take phase information into account? I can't submit a bug report until we confirm it's a bug. 
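The arithmetic behind the phase column can be sketched in a few lines of plain Perl. This assumes the GFF3 definition of phase (the number of bases to remove from the start of a feature to reach the first base of the next codon); `exon_phases` and `trim_by_phase` are hypothetical helpers for illustration, not BioPerl methods. Under that definition the 31 bp + 29 bp example above gives the second exon a phase of 2; the value 1 is what you get from the other convention sometimes used for this column, the 0-based codon position of the exon's first base.

```perl
use strict;
use warnings;

# Hypothetical helper: given the phase of the first exon and the exon
# lengths in transcript order, return the GFF3-style phase of each exon.
sub exon_phases {
    my ($first_phase, @lengths) = @_;
    my @phases;
    # Bases seen so far, counted from the first base of the first full codon.
    my $coding = -$first_phase;
    for my $len (@lengths) {
        # Perl's % yields a non-negative result for a positive right operand.
        push @phases, (3 - ($coding % 3)) % 3;
        $coding += $len;
    }
    return @phases;
}

# Hypothetical helper: drop the leading partial codon of a spliced CDS
# so that translation starts in frame.
sub trim_by_phase {
    my ($cds, $phase) = @_;
    return substr($cds, $phase);
}

my @phases = exon_phases(0, 31, 29);
print "phases for exons of 31 and 29 bp: @phases\n";

# A made-up spliced sequence whose first exon has phase 1.
print trim_by_phase('GATGAAATAA', 1), "\n";
```

In the same spirit, the string returned by spliced_seq->seq could be passed through something like `trim_by_phase` with the first exon's phase before translating, which matches the start/end adjustment described above.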
Thanks, -Amir Karger > -jason > On Dec 7, 2006, at 1:32 PM, Amir Karger wrote: > > > I need to know how to get the frame information in exon features > > (created by Bio::Tools::GFF) into a whole-gene feature that will be > > translated into a protein. > > > > I'm reading in some fungal GFFs generated by Jason Stajich. I > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > - Create a Bio::Location::Split object containing each feature's > > location > > - Create a Bio::SeqFeature::Generic object whose location > is the above > > BL::Split > > - Attach my contig Bio::Seq to the feature > > - get the protein with feature->spliced_seq->translate->seq > > > > (Code below) > > > > Unfortunately, I get the wrong result when the GFF features > have frame > > != 0. This happens for only a few percent of the exons, but when it > > does, I end up translating in the wrong frame. > > > > If I read the docs correctly, Location objects don't have a > frame. So > > how do I get the correct spliced_seq, which skips one or > two bp at the > > beginning of certain exons? > > > > I suspect the answer to this is that I'm going about this in > > completely > > the wrong way, in which case, please tell me how I ought to be > > doing it. > > > > Thanks, > > - Amir Karger > > Research Computing > > Life Sciences Division > > Harvard University > > > > P.S. In case you want to see actual code, here it is. 
After using > > Bio::Tools::GFF to create a sorted list of features for each exon > > (basically stolen from the module POD), I: > > # Create a new object representing the exons' gene > > my $coding_loc_obj = new Bio::Location::Split; > > foreach my $exon (@sorted_exons) { > > $coding_loc_obj->add_sub_Location($exon->location); > > } > > > > # Build a spliced feature representing the whole gene > > my $spliced_feat = new Bio::SeqFeature::Generic( > > -start => $coding_loc_obj->start, > > -end => $coding_loc_obj->end, > > -strand => $strand_num, > > -primary=> "splicedGene", > > ); > > $spliced_feat->location($coding_loc_obj); > > > > # Attach a contig object containing the sequence > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > # Get the spliced seq and translate to protein: > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > my $protein = $spliced_feat->spliced_seq->translate->seq; > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From akarger at CGR.Harvard.edu Fri Dec 8 13:33:09 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 8 Dec 2006 13:33:09 -0500 Subject: [Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq Message-ID: > Another issue is the splittype() is not defined, though I > don't think that > would kill anything as currently implemented. However, one > thing we have > passingly discussed is having Bio::Location::Split objects > possibly exhibit > different (but expected) behaviors based upon the splittype() > (order, join, > or bond). It's one of the things I want to work out for the > next release. Should I be writing -splittype => "JOIN" or some such in my new()? -Amir Karger > > chris > > > Amir, > > > > I don't know for sure what the problem is, but here is one > > possibility: > > the number in column 8 of a GFF file is not the frame, it is > > the phase. 
> > See the GFF3 spec for a description of what the phase is: > > > > http://www.sequenceontology.org/gff3.shtml > > > > (It doesn't matter if you are using GFF3 or GFF2, as the > > phase is the same in both). > > > > Scott > > > > > > On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote: > > > I need to know how to get the frame information in exon features > > > (created by Bio::Tools::GFF) into a whole-gene feature > that will be > > > translated into a protein. > > > > > > I'm reading in some fungal GFFs generated by Jason Stajich. I > > > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > > - Create a Bio::Location::Split object containing each feature's > > > location > > > - Create a Bio::SeqFeature::Generic object whose location > > is the above > > > BL::Split > > > - Attach my contig Bio::Seq to the feature > > > - get the protein with feature->spliced_seq->translate->seq > > > > > > (Code below) > > > > > > Unfortunately, I get the wrong result when the GFF features > > have frame > > > != 0. This happens for only a few percent of the exons, > but when it > > > does, I end up translating in the wrong frame. > > > > > > If I read the docs correctly, Location objects don't have a > > frame. So > > > how do I get the correct spliced_seq, which skips one or > > two bp at the > > > beginning of certain exons? > > > > > > I suspect the answer to this is that I'm going about this in > > > completely the wrong way, in which case, please tell me how > > I ought to be doing it. > > > > > > Thanks, > > > - Amir Karger > > > Research Computing > > > Life Sciences Division > > > Harvard University > > > > > > P.S. In case you want to see actual code, here it is. 
After using > > > Bio::Tools::GFF to create a sorted list of features for each exon > > > (basically stolen from the module POD), I: > > > # Create a new object representing the exons' gene > > > my $coding_loc_obj = new Bio::Location::Split; > > > foreach my $exon (@sorted_exons) { > > > $coding_loc_obj->add_sub_Location($exon->location); > > > } > > > > > > # Build a spliced feature representing the whole gene > > > my $spliced_feat = new Bio::SeqFeature::Generic( > > > -start => $coding_loc_obj->start, > > > -end => $coding_loc_obj->end, > > > -strand => $strand_num, > > > -primary=> "splicedGene", > > > ); > > > $spliced_feat->location($coding_loc_obj); > > > > > > # Attach a contig object containing the sequence > > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > > > # Get the spliced seq and translate to protein: > > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > > my $protein = $spliced_feat->spliced_seq->translate->seq; > > > > From cjfields at uiuc.edu Fri Dec 8 14:04:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 13:04:55 -0600 Subject: [Bioperl-l] Using frame info from GFF ingettinga Seq->spliced_seq In-Reply-To: Message-ID: <000901c71afb$bf504210$15327e82@pyrimidine> > > Another issue is the splittype() is not defined, though I > don't think > > that would kill anything as currently implemented. > However, one thing > > we have passingly discussed is having Bio::Location::Split objects > > possibly exhibit different (but expected) behaviors based upon the > > splittype() (order, join, or bond). It's one of the things > I want to > > work out for the next release. > > Should I be writing -splittype => "JOIN" or some such in my new()? > > -Amir Karger I missed the fact that 'JOIN' is the default splittype() from looking at the constructor in Location::Split, so you actually don't have to explicitly set it; apologies for that. 
If we make any changes that affect how Location::Split behaves we'll likely leave the default splittype() as 'JOIN' as it's by far the most common join operator. chris From cjfields at uiuc.edu Fri Dec 8 15:03:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 14:03:16 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: Message-ID: <000001c71b03$e6741e90$15327e82@pyrimidine> > Yes, I think. Scott Cain pointed out that GFF column 8 is the > "phase", which I had never heard of before. My current, very > limited, understanding is that sometimes you'll have an exon > with, say, 31 bp, followed by an exon with 29 bp. When the > intron gets spliced out, you eventually get an mRNA of 60 bp, > which translates to a protein of 20 aa. > But the second exon has a phase of 1, not 0, because you > can't just start translating at the first bp of the second > exon and expect to get nice amino acids. I think the use of 'frame' here is meant relative to the DNA sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e. translation, three frames). At least I think that's what is meant! > By the way, whether or not phase is the same thing as frame, > when I call the frame() method on the features created by > Bio::Tools::GFF, I get the phase info. I assume that's a > feature (no pun intended), not a bug? > > I'm still confused as to why you would have a phase in the > first exon, though. Why not just say the CDS starts 1 or 2 bp > later? (This is probably a bio question, not a bioperl > question, but a quick Google didn't get me an answer. "Phase" > isn't a very good search term.) It could be b/c the location coordinates delineate the exon coding boundary. It's conceivable the first exon in a sequence record is not the first exon of the mRNA (i.e. there may be one or more exons prior to or past the exon of interest that are in 'remote' sequence records). 
Like this admittedly extreme example (GB acc AF130134):

join(AF130124.1:2563..2964,AF130125.1:21..157,AF130126.1:12..174,
AF130127.1:21..112,AF130128.1:21..162,AF130128.1:281..595,
AF130128.1:661..842,AF130128.1:916..1030,AF130129.1:21..115,
AF130130.1:21..165,AF130131.1:21..125,AF130132.1:21..428,
AF130132.1:492..746,AF130133.1:21..168,AF130133.1:232..401,
AF130133.1:475..906,AF130133.1:970..1107,AF130133.1:1176..1367,21..128)

Also, the ends of the location may be uncertain ('fuzzy'):

join(complement(1009..>1260),complement(AF081827.1:<1..177))

> I guess the real question here, which Jason alludes to, is whether
> SeqFeature->spliced_seq ought to take into account the phase
> information
> of the first exon. Right now, it doesn't, so when you call
> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
> where you would want spliced_seq to include the first bp or
> two? Should there be an option to spliced_seq for whether you
> want to take phase information into account?
>
> I can't submit a bug report until we confirm it's a bug.
>
> Thanks,
> -Amir Karger

You can already pass the frame or an offset to PrimarySeqI::translate(). Here are the args:

Args : -terminator    - character for terminator          default is *
       -unknown       - character for unknown             default is X
       -frame         - frame                             default is 0
       -codontable_id - codon table id                    default is 1
       -complete      - complete CDS expected             default is 0
       -throw         - throw exception if not complete   default is 0
       -orf           - find 1st ORF                      default is 0
       -start         - alternative initiation codon
       -codontable    - Bio::Tools::CodonTable object
       -offset        - offset for fuzzy locations        default is 0

The offset comes from some GenBank seqfeatures which have a '/codon_start' tag indicating which nucleotide to start translation from (1,2,3). This is essentially just the phase+1. We could add a '-phase' argument for convenience which accepts 0,1,2.
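As a minimal sketch of the -frame argument listed above (not code from this thread): the toy sequence and its identifier are made up for illustration, and the sketch assumes an installed BioPerl whose translate() accepts the named arguments shown in the Args list. The single leading base plays the role of a first exon with phase 1.

```perl
use strict;
use warnings;
use Bio::Seq;

# Toy "spliced" coding sequence (hypothetical): one stray base, then
# three complete codons ATG GCC AAA.
my $spliced = Bio::Seq->new(
    -id       => 'toy_cds',      # illustrative identifier only
    -seq      => 'AATGGCCAAA',
    -alphabet => 'dna',
);

# Phase of the first exon as read from GFF column 8 (assumed value).
my $phase = 1;

# -frame tells translate() how many leading bases to skip (0, 1 or 2).
my $protein = $spliced->translate(-frame => $phase)->seq;
print "protein: $protein\n";
```

With real data, the object returned by $feature->spliced_seq would take the place of $spliced, and the phase would come from the first exon's frame() value.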
chris From bobfreemanma at speakeasy.net Fri Dec 8 15:47:15 2006 From: bobfreemanma at speakeasy.net (Bob Freeman) Date: Fri, 8 Dec 2006 15:47:15 -0500 Subject: [Bioperl-l] writing blastxml In-Reply-To: <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com> <000301c6f846$d6227760$15327e82@pyrimidine> <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> Message-ID: Can't seem to find a good post on this to answer my question: Does anyone know a good way to (re)write BLAST reports in XML format? I've got about 30,000 reports I need to rewrite for a (good!) piece of java software that will only import xml formatted BLAST reports. Right now, all mine are plain text. I don't think bioperl can do this yet, correct? If not, any suggestions, besides reblasting all 30,000? I'd like to save a few trees and lumps of coal. TIA, Bob -- ----------------------------------------------------- Bob Freeman, Ph.D. Bioinformatics consultant 51 Downer Avenue, #2 Dorchester, MA 02125 617/699.7057, vox If brains were taxed, he'd get a refund. -- Anonymous From camp_boot at hotmail.com Sun Dec 10 05:00:55 2006 From: camp_boot at hotmail.com (synapse) Date: Sun, 10 Dec 2006 10:00:55 +0000 (UTC) Subject: [Bioperl-l] Driver program for PestFind.pm Message-ID: Dear All, I apologize in advance for my almost total lack of knowledge of perl as a programming language. I need to use PestFind program, part of the biop_run package of bioperl. My understanding is that I will need a simple wrapper program that will read arguments from the command line, and pass them to that module. - Is there such program available that I can just use? - Does anyone know if pestfind can work on multiple sequence files (in fasta format), or does it only process single sequence files? Thanks a lot for the feedback. 
From cjfields at uiuc.edu Sun Dec 10 13:45:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 12:45:26 -0600 Subject: [Bioperl-l] writing blastxml In-Reply-To: References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com> <000301c6f846$d6227760$15327e82@pyrimidine> <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> Message-ID: <7FB4EBB9-BEDC-4250-BE2F-3F695D36F350@uiuc.edu> On Dec 8, 2006, at 2:47 PM, Bob Freeman wrote: > Can't seem to find a good post on this to answer my question: > > Does anyone know a good way to (re)write BLAST reports in XML format? > I've got about 30,000 reports I need to rewrite for a (good!) piece > of java software that will only import xml formatted BLAST reports. > Right now, all mine are plain text. > > I don't think bioperl can do this yet, correct? If not, any > suggestions, besides reblasting all 30,000? I'd like to save a few > trees and lumps of coal. > > TIA, > Bob The only BioPerl writers for BLAST reports are in BSML and HTML, not BLAST XML. I don't think there have been any requests for it, and no one has really stepped forward to submit one. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Dec 10 13:55:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 12:55:16 -0600 Subject: [Bioperl-l] Driver program for PestFind.pm In-Reply-To: References: Message-ID: <32B0F15D-4144-43B6-AA81-5ED9BA848F45@uiuc.edu> On Dec 10, 2006, at 4:00 AM, synapse wrote: > Dear All, > > I apologize in advance for my almost total lack of knowledge of > perl as a > programming language. > > I need to use PestFind program, part of the biop_run package of > bioperl. My > understanding is that I will need a simple wrapper program that > will read > arguments from the command line, and pass them to that module.
PestFind is part of the EMBOSS suite of programs: http://emboss.sourceforge.net/ The PestFind module in bioperl-run is actually used via Pise. > - Is there such program available that I can just use? See above > - Does anyone know if pestfind can work on multiple sequence > files (in fasta > format), or does it only process single sequence files? > > Thanks a lot for the feedback. No idea there, but the EMBOSS docs should tell you. chris From cjfields at uiuc.edu Mon Dec 11 00:38:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 23:38:32 -0600 Subject: [Bioperl-l] bioperl-run parameter question Message-ID: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> I am writing up a few bioperl-run modules and have a simple question, though I don't know if anyone knows the answer. I was curious as to why parameters for most (all?) bioperl-run modules lack the '-' preceding them. This came up re: StandAloneBlast last week (something Torsten fixed), but I noticed just about every bioperl-run module uses the dashless parameters. chris From n.haigh at sheffield.ac.uk Mon Dec 11 01:44:25 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Mon, 11 Dec 2006 06:44:25 +0000 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> Message-ID: <457CFE49.5010201@sheffield.ac.uk> Chris Fields wrote: > I am writing up a few bioperl-run modules and have a simple question, > though I don't know if anyone knows the answer. I was curious as to > why parameters for most (all?) bioperl-run modules lack the '-' > preceding them. This came up re: StandAloneBlast last week > (something Torsten fixed), but I noticed just about every bioperl-run > module uses the dashless parameters. 
> > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > No idea! Is there any reason for/against using dashed/dashless parameters? I suppose dshed parameters allow you to easy see which tokens on the command line are parameters and which are values. Should modules be able to accept both? Should dashed be preferred? Nath From cjfields at uiuc.edu Mon Dec 11 08:06:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 07:06:32 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457CFE49.5010201@sheffield.ac.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457CFE49.5010201@sheffield.ac.uk> Message-ID: On Dec 11, 2006, at 12:44 AM, Nathan S. Haigh wrote: > Chris Fields wrote: >> I am writing up a few bioperl-run modules and have a simple question, >> though I don't know if anyone knows the answer. I was curious as to >> why parameters for most (all?) bioperl-run modules lack the '-' >> preceding them. This came up re: StandAloneBlast last week >> (something Torsten fixed), but I noticed just about every bioperl-run >> module uses the dashless parameters. >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > No idea! > > Is there any reason for/against using dashed/dashless parameters? I > suppose dshed parameters allow you to easy see which tokens on the > command line are parameters and which are values. Should modules be > able > to accept both? Should dashed be preferred? > > Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l I'm thinking about it from the point of consistency. 
When using a mix of core and run modules it can be a bit confusing, particularly when (as pointed out in the previous thread on StandAloneBlast) you can use only dashed parameters with core modules, while most (all?) run modules only accept dashless ones (in most cases some exception is thrown). Torsten fixed this in StandAloneBlast so it accepts both, but shouldn't this rule also apply to all run modules? Much of this is probably due to the donated nature of much of the bioperl-run code and Jason's 'cat-herding', and I understand that it would be a lot of work to change this for all run modules. However, we could at least try to start enforcing some loose rules with new bioperl-run wrappers (e.g. implement WrapperBase, use core-like parameters, etc). chris From akarger at CGR.Harvard.edu Mon Dec 11 11:20:03 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 11 Dec 2006 11:20:03 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq Message-ID: Chris Fields wrote: > > > Yes, I think. Scott Cain pointed out that GFF column 8 is the > > "phase", which I had never heard of before. My current, very > > limited, understanding is that sometimes you'll have an exon > > with, say, 31 bp, followed by an exon with 29 bp. When the > > intron gets spliced out, you eventually get an mRNA of 60 bp, > > which translates to a protein of 20 aa. > > But the second exon has a phase of 1, not 0, because you > > can't just start translating at the first bp of the second > > exon and expect to get nice amino acids. > > I think the use of 'frame' here is meant relative to the DNA > sequence (i.e. > ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e. > translation, three frames). At least I think that's what is meant! I agree. By the way, I'd love a reference to a simple bio-explanation of what's happening here. Google searches for "coding sequence phase" are not all that relevant.
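The 31 bp / 29 bp example quoted above can be checked with a few lines of plain Perl. This is only a sketch: cds_phases is a hypothetical helper, not a BioPerl function, and it assumes the GFF3 definition of phase (the number of bases to remove from the 5' end of a CDS segment to reach the first base of the next codon). Under that definition the 29 bp exon comes out with phase 2, because one base of the split codon lies in the first exon and the remaining two lie in the second.

```perl
use strict;
use warnings;

# Hypothetical helper: given the phase of the first CDS segment and the
# segment lengths in transcription order, return the GFF3 phase of each
# segment.
sub cds_phases {
    my ($first_phase, @lengths) = @_;
    my @phases;
    my $phase = $first_phase;
    for my $len (@lengths) {
        push @phases, $phase;
        # Bases left after the skipped ones that do not fill a whole codon
        # start a split codon; the next segment supplies the rest of it.
        $phase = (3 - (($len - $phase) % 3)) % 3;
    }
    return @phases;
}

my @phases = cds_phases(0, 31, 29);
print "phases: @phases\n";    # prints "phases: 0 2"
```

The two segments together still give 60 coding bases, i.e. 20 codons, as in the quoted description.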
> > I'm still confused as to why you would have a phase in the > > first exon, though. Why not just say the CDS starts 1 or 2 bp > > later? (This is probably a bio question, not a bioperl > > question, but a quick Google didn't get me an answer. "Phase" > > isn't a very good search term.) > > It could be b/c the location coordinates delineate the exon > coding boundary. > It's conceivable the first exon in a sequence record is not > the first exon > of the mRNA (i.e. there may be one or more exons prior to or > past the exon > of interest that are in 'remote' sequence records). That's certainly not the case here, because the files have the entire genomes in them. > Also, the ends of the lcoation may be uncertain ('fuzzy'): > > join(complement(1009..>1260),complement(AF081827.1:<1..177)) Also not the case here. These locations aren't listed as fuzzy. Any other thoughts? > > I guess the real question here, which Jason alludes to, is whether > > SeqFeature->spliced_seq ought to take into account the phase > > information > > of the first exon. Right now, it doesn't, so when you call > > SeqFeature->spliced_seq->translate, you get gibberish. Are > there cases > > where you would want spliced_seq to include the first bp or > > two? Should there be an option to spliced_seq for whether you > > want to take phase information into account? > > You can already pass the frame or an offset to > PrimarySeqI::translate(). > We could add a '-phase' argument for > convenience which accepts 0,1,2. But as Jason pointed out, you should find the problem earlier. What if I want to get the RNA sequence that will become the protein? then having a phase arg to translate() doesn't help. Should there be a phase arg to spliced_seq? Which raises another bio question: at what point are the first 1 or 2 bp dropped when you have a phase of 1 or 2? Do they appear in the mRNA? 
-Amir Karger From bix at sendu.me.uk Mon Dec 11 13:21:42 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 11 Dec 2006 13:21:42 -0500 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> Message-ID: <457DA1B6.1060706@sendu.me.uk> Chris Fields wrote: > I am writing up a few bioperl-run modules and have a simple question, > though I don't know if anyone knows the answer. I was curious as to > why parameters for most (all?) bioperl-run modules lack the '-' > preceding them. This came up re: StandAloneBlast last week > (something Torsten fixed), but I noticed just about every bioperl-run > module uses the dashless parameters. I didn't follow that particular thread, but from my experience there is a useful distinction between bioperl options using the - as normal for full consistency with core (eg. -verbose), whilst the options that belong to the program the run module is a wrapper for do not take dashes. Again, this seems consistent within the run package. I'd suggest sticking to the current pattern. Cheers, Sendu. From cjfields at uiuc.edu Mon Dec 11 15:07:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 14:07:16 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457DA1B6.1060706@sendu.me.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> Message-ID: On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote: > Chris Fields wrote: >> I am writing up a few bioperl-run modules and have a simple >> question, though I don't know if anyone knows the answer. I was >> curious as to why parameters for most (all?) bioperl-run modules >> lack the '-' preceding them. This came up re: StandAloneBlast >> last week (something Torsten fixed), but I noticed just about >> every bioperl-run module uses the dashless parameters. 
> > I didn't follow that particular thread, but from my experience > there is a useful distinction between bioperl options using the - > as normal for full consistency with core (eg. -verbose), whilst the > options that belong to the program the run module is a wrapper for > do not take dashes. Again, this seems consistent within the run > package. I respectfully disagree that this is a 'useful' distinction. My main point is consistency. To me, it's counterintuitive to have two Bioperl classes, both which inherit Bio::Root::Root, use two different syntaxes for any parameters passed to the constructor, even if some are 'program' parameters. It's also not consistent with StandAloneBlast or RemoteBlast, both which are considered bioperl-run modules even though they are in core, and both or which use dashed parameters (StandAloneBlast actually allows both). In fact, it isn't consistent within bioperl-run itself. Bio::Tools::Run::EMBOSSApplication uses dashes for parameters in a hashref! Okay, judging by the previous examples, 'consistency' isn't a word I would use to describe bioperl-run as a whole (back to Jason's 'cat- herding' analogy). It would be easier to let it slide for now, especially since changing them would be a serious pain, not to mention an API issue. But shouldn't there be some consistency? And what about new modules? Do we follow the historical (possibly confusing) 'dashless' route, or use the core-like dashed approach (thus breaking from the other run modules)? > I'd suggest sticking to the current pattern. > > > Cheers, > Sendu. I'll allow for both, ala StandAloneBlast. Doesn't hurt to be safe. ; > Have fun at the hackathon! 
chris From bix at sendu.me.uk Mon Dec 11 16:19:55 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 11 Dec 2006 16:19:55 -0500 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> Message-ID: <457DCB7B.8050500@sendu.me.uk> Chris Fields wrote: > > On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> I am writing up a few bioperl-run modules and have a simple >>> question, though I don't know if anyone knows the answer. I was >>> curious as to why parameters for most (all?) bioperl-run modules >>> lack the '-' preceding them. This came up re: StandAloneBlast last >>> week (something Torsten fixed), but I noticed just about every >>> bioperl-run module uses the dashless parameters. >> >> I didn't follow that particular thread, but from my experience there >> is a useful distinction between bioperl options using the - as normal >> for full consistency with core (eg. -verbose), whilst the options that >> belong to the program the run module is a wrapper for do not take >> dashes. Again, this seems consistent within the run package. > > I respectfully disagree that this is a 'useful' distinction. My main > point is consistency. [snip] We're on the same page in terms of what we think would be a Good Thing, and allowing both ways (dashed and dashless) sounds reasonable. I was just suggesting why bioperl-run might be the way it was. Further to that, there is the practical aspect that it is a lot simpler to figure out which are the program options so they can be farmed out to the AUTOLOAD methods - again something that isn't done in core. If you come up with some generic way of dealing with options and farming to AUTOLOAD, perhaps there's scope for applying it to all the run wrappers (ideally via one of their base classes), so they all instantly gain dashed-mode capability. 
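One possible shape for such a generic option handler is sketched below in plain Perl. The normalize_params helper is hypothetical and not part of bioperl-run or WrapperBase; it only illustrates accepting both '-param' and 'param' keys before the values are handed on to whatever sets the program options.

```perl
use strict;
use warnings;

# Hypothetical helper: accept dashed or dashless keys and return a hash
# keyed by lower-case, dashless names.
sub normalize_params {
    my @args = @_;
    die "Odd number of arguments\n" if @args % 2;
    my %params;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = lc $key) =~ s/^-+//;   # strip any leading dashes
        $params{$name} = $value;
    }
    return %params;
}

my %p = normalize_params(-verbose => 1, outfile => 'out.txt', -db => 'nr');
print join(', ', map { "$_=$p{$_}" } sort keys %p), "\n";
```

A wrapper base class could call something like this in its constructor, so that every subclass gains dashed-mode support without changes to the individual modules; whether that fits the existing AUTOLOAD-based wrappers is an open question.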
From cjfields at uiuc.edu Mon Dec 11 17:05:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 16:05:56 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457DCB7B.8050500@sendu.me.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> <457DCB7B.8050500@sendu.me.uk> Message-ID: On Dec 11, 2006, at 3:19 PM, Sendu Bala wrote: ... >> >> I respectfully disagree that this is a 'useful' distinction. My main >> point is consistency. > [snip] > > We're on the same page in terms of what we think would be a Good > Thing, > and allowing both ways (dashed and dashless) sounds reasonable. I was > just suggesting why bioperl-run might be the way it was. Further to > that, there is the practical aspect that it is a lot simpler to figure > out which are the program options so they can be farmed out to the > AUTOLOAD methods - again something that isn't done in core. Maybe b/c AUTOLOAD is frowned upon for a number of reasons, mainly code maintenance. I'm somewhat neutral on the idea of using AUTOLOAD as a short-term solution, though using heredoc and an eval{} block works well for me (and shows up when using $self->can('method') or when checking for methods via Class::Inspector). > If you come up with some generic way of dealing with options and > farming > to AUTOLOAD, perhaps there's scope for applying it to all the run > wrappers (ideally via one of their base classes), so they all > instantly > gain dashed-mode capability. I think that's the crux of the problem; they do not all have the same base class (except Bio::Root::Root). Most use WrapperBase. I thought at one point a Run-specific root module would be a good idea, but WrapperBase already works well. I'll go ahead with my modules and think about it some more. You could ask the powers-that-be (jason, hilmar, etc) what they think as well. 
chris From bosborne11 at verizon.net Mon Dec 11 17:24:54 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 11 Dec 2006 17:24:54 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: Message-ID: Amir, Google "intron phase", you will see a number of useful links. Brian O. On 12/11/06 11:20 AM, "Amir Karger" wrote: > I agree. By the way, I'd love a reference to a simple bio-explanation of > what's happening here. Google searches for "coding sequence phase" are > not all that relevant. From cjfields at uiuc.edu Mon Dec 11 22:20:06 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 21:20:06 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: On Dec 11, 2006, at 10:20 AM, Amir Karger wrote: >> I think the use of 'frame' here is meant relative to the DNA >> sequence (i.e. >> ORF searching, 6 frames) and the 'phase' is relative to the mRNA >> (i.e. >> translation, three frames). At least I think that's what is meant! > > I agree. By the way, I'd love a reference to a simple bio- > explanation of > what's happening here. Google searches for "coding sequence phase" are > not all that relevant. Ah, Brian found some links I see... >> It could be b/c the location coordinates delineate the exon >> coding boundary. >> It's conceivable the first exon in a sequence record is not >> the first exon >> of the mRNA (i.e. there may be one or more exons prior to or >> past the exon >> of interest that are in 'remote' sequence records). > > That's certainly not the case here, because the files have the entire > genomes in them. > >> Also, the ends of the lcoation may be uncertain ('fuzzy'): >> >> join(complement(1009..>1260),complement(AF081827.1:<1..177)) > > Also not the case here. These locations aren't listed as fuzzy. > > Any other thoughts? Which GFF files did you use? More specifically, which genes in which GFF file? I saw a reference to S. 
bayanus, but it's hard to work out what could be the problem unless we know a bit more. >>> I guess the real question here, which Jason alludes to, is whether >>> SeqFeature->spliced_seq ought to take into account the phase >>> information >>> of the first exon. Right now, it doesn't, so when you call >>> SeqFeature->spliced_seq->translate, you get gibberish. Are >> there cases >>> where you would want spliced_seq to include the first bp or >>> two? Should there be an option to spliced_seq for whether you >>> want to take phase information into account? >> >> You can already pass the frame or an offset to >> PrimarySeqI::translate(). >> We could add a '-phase' argument for >> convenience which accepts 0,1,2. > > But as Jason pointed out, you should find the problem earlier. What > if I > want to get the RNA sequence that will become the protein? then > having a > phase arg to translate() doesn't help. Should there be a phase arg to > spliced_seq? You'll also note Jason mentioned there were possible errors in the gene prediction programs which produced the output. spliced_seq() is supposed to return the DNA sequence of a split location by splicing together the sublocation sequences in their 'join' order. So, if the first exon was out of phase, once spliced they should all be out of phase to the same degree, assuming all exons are joined together correctly. Translating this using the phase should produce the correct amino acid sequence. Note that Jason suggested passing the frame/phase of the first exon to translate(), not spliced_seq(). I also suggested translate(). > Which raises another bio question: at what point are the first 1 or > 2 bp > dropped when you have a phase of 1 or 2? Do they appear in the mRNA? > > -Amir Karger Any sequence present in the sublocations (exons) would be in the spliced sequence. This would have to include those nucleotides in exons skipped b/c of the phase since they are part of the coding region.
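For the case where the in-frame nucleotide sequence itself is wanted rather than only the protein, the coordinate adjustment described earlier in the thread (move the start forward on the plus strand, move the end back on the minus strand) can be sketched in plain Perl. The helper name and the coordinates are illustrative assumptions, not BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical helper: shrink the first coding exon by its phase so the
# adjusted feature begins on a codon boundary.
sub trim_first_exon {
    my ($start, $end, $strand, $phase) = @_;
    if ($strand >= 0) {
        $start += $phase;    # plus strand: the 5' end is the start coordinate
    }
    else {
        $end -= $phase;      # minus strand: the 5' end is the end coordinate
    }
    return ($start, $end);
}

my @plus  = trim_first_exon(101, 160,  1, 1);
my @minus = trim_first_exon(101, 160, -1, 2);
print "plus strand:  @plus\n";     # prints "plus strand:  102 160"
print "minus strand: @minus\n";    # prints "minus strand: 101 158"
```

The adjusted coordinates would then be used when building the first sublocation, so that the spliced sequence starts in frame; the skipped bases are no longer part of the returned sequence, which may or may not be what a given analysis wants.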
chris From neetisomaiya at gmail.com Tue Dec 12 07:06:20 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:36:20 +0530 Subject: [Bioperl-l] need help in phredPhrap Message-ID: <764978cf0612120406m796b116dncd3a9e6c82ffe682@mail.gmail.com> Hi, I am running phredPharp, which runs phred, phrap and polyphred. Please refer to the "Using a reference sequence" section of this link http://droog.mbt.washington.edu/poly_doc50.html#REFER. I am using the reference sequence as described in the link above. With this I am getting the SNP positions on the contig sequence as well as on the reference sequence. Does anyone know if there is some output file which can also give me mapping between contig sequence and reference sequence? -- -Neeti Even my blood says, B positive From akarger at CGR.Harvard.edu Tue Dec 12 11:05:43 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 12 Dec 2006 11:05:43 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq Message-ID: (sorry if this thread is boring people) Chris Fields wrote: > > I agree. By the way, I'd love a reference to a simple bio- > > explanation of > > what's happening here. Google searches for "coding sequence > phase" are > > not all that relevant. > > Ah, Brian found some links I see... Thanks, Brian! Amazing how "coding sequence phase" finds nothing but "intron phase" finds a ton. This is why you need to actually learn biology, rather than Googling it. > Which GFF files did you use? More specifically, which genes > in which > GFF file? I saw a reference to S. bayanus, but it's hard to > work out > what could be the problem unless we know a bit more. http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus .20031001.AUGUSTUS.gff3.gz (Thanks for a Really Useful site, Jason!) c127 (for example) has two lines in that file: sbay_c127 AUGUSTUS mRNA 263 723 . + . ID=sbay_c127-g1.1 sbay_c127 AUGUSTUS CDS 263 723 . 
+ 1 Parent=sbay_c127-g1.1 Now go to gbrowse page: http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/ Type "sbay_c127:250-300" in the search box. As you can see from the translation track, if you start at bp 263, you hit a stop codon after just a few aas. But if you use frame2/phase 1, you get no stop codons all the way to the end of the contig. > >> You can already pass the frame or an offset to > >> PrimarySeqI::translate(). > >> We could add a '-phase' argument for > >> convenience which accepts 0,1,2. > > > > What if I > > want to get the RNA sequence that will become the protein? then > > having a > > phase arg to translate() doesn't help. Should there be a > phase arg to > > spliced_seq? > > You'll also note Jason mentioned there were possible errors in the > gene prediction programs which produced the output That's certainly possible. No gene prediction program will be perfect. In this case, though, it's clear that it found a large region without stop codons in it, and correctly identified the place to start translating. I guess I'm just surprised that, if it found just one exon in a gene (in the whole contig) why it would say the exon starts at 263 with a phase 1, instead of just saying it starts at 264. > spliced_seq() is supposed to return the DNA sequence of a split > location by splicing together the sublocation sequences in their > 'join' order. So, if the first exon was out of phase, once spliced > they should all be out of phase to the same degree, assuming all > exons are joined together correctly. Translating this using the > phase should produce the correct amino acid sequence. > > Note that Jason suggested passing the frame/phase of the first exon > to translate(), not spliced_seq(). I also suggested translate(). You're right. This brings the number of translated polypeptide sequences that have lots of *s in them to 9 instead of 90. I guess I have two requests here. 
The first is, if a person wants to see exactly which bps are translated to aas -- a nucelotide sequece of exactly 3N bp starting (usually) with ATG -- then they might want an argument to spliced_seq that skips the first one or two bp when necessary. After all, they might want to study the DNA, not the peptides. The second request is for "intelligent objects". If my SeqFeatures know that they're in phase 1, then when I call spliced_seq I want the resulting objects to know that they're phase one, such that when I call translate, Bioperl automatically skips the first bp or two. Admittedly, there might be big ramifications to this. Both requests of course made in the knowledge that Bioperl is open source & developers have a lot to do with their time. -Amir Karger > > Which raises another bio question: at what point are the > first 1 or > > 2 bp > > dropped when you have a phase of 1 or 2? Do they appear in the mRNA? > > > > -Amir Karger > > Any sequence present in the sublocations (exons) would be in the > spliced sequence. This would have to include those nucleotides in > exons skipped b/c of the phase since they are part of the > coding region. > > chris > From neetisomaiya at gmail.com Tue Dec 12 07:14:10 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:44:10 +0530 Subject: [Bioperl-l] needle parser in bioperl? Message-ID: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Hi, Does anyone know of a bioperl parser for needle output, basically I won't where the target sequence aligns on the template (i.e. coordinate on the template where the taget aligns). -- -Neeti Even my blood says, B positive From cjfields at uiuc.edu Tue Dec 12 11:57:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 10:57:27 -0600 Subject: [Bioperl-l] needle parser in bioperl? 
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > Hi, > > Does anyone know of a bioperl parser for needle output, basically I > won't > where the target sequence aligns on the template (i.e. coordinate > on the > template where the taget aligns). > > -- > -Neeti > Even my blood says, B positive I answered this a number of months back: http://tinyurl.com/yzlbx5 Basically, newer versions of EMBOSS have changed the output for the AlignIO::emboss parser (which parses needle). I don't believe the parser has been fixed to deal with that, but Jason has pointed out you can use MSF output when running needle, then parse using AlignIO with the format set to 'msf'. chris From bosborne11 at verizon.net Tue Dec 12 11:51:05 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 12 Dec 2006 11:51:05 -0500 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: Neeti, EMBOSS' needle and water produce alignments in what Bioperl calls 'emboss' format, so you can use AlignIO to get SimpleAlign objects. The best description of how to use SimpleAlign is the documentation in the module. Brian O. On 12/12/06 7:14 AM, "neeti somaiya" wrote: > Hi, > > Does anyone know of a bioperl parser for needle output, basically I won't > where the target sequence aligns on the template (i.e. coordinate on the > template where the taget aligns). From kaboroev at sfu.ca Tue Dec 12 12:14:39 2006 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Tue, 12 Dec 2006 09:14:39 -0800 Subject: [Bioperl-l] BLAST reports Message-ID: <457EE37F.4020000@sfu.ca> Hi everyone, I would like to manipulate my blast results with bioperl but would also like to have the html output of the blast. 
What would be the best way of going about this, as I don't see any write functions in any of the blast modules I have looked at. Would it be better to create my own html layout from the blast data then attempt to recover this from bioperl? keith p.s. - does anyone know what the most informative blast "alignment view" output is? xml i suppose? -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 From cjfields at uiuc.edu Tue Dec 12 13:45:05 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 12:45:05 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: On Dec 12, 2006, at 10:05 AM, Amir Karger wrote: ... > http://fungal.genome.duke.edu/annotations/sbay/gff/ > saccharomyces_bayanus > .20031001.AUGUSTUS.gff3.gz (Thanks for a Really Useful site, Jason!) > > c127 (for example) has two lines in that file: > sbay_c127 AUGUSTUS mRNA 263 723 . + > . ID=sbay_c127-g1.1 > sbay_c127 AUGUSTUS CDS 263 723 . + > 1 Parent=sbay_c127-g1.1 > > Now go to gbrowse page: > http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/ > Type "sbay_c127:250-300" in the search box. > > As you can see from the translation track, if you start at bp 263, you > hit a stop codon after just a few aas. But if you use frame2/phase 1, > you get no stop codons all the way to the end of the contig. Yes, but there are two things. First, there is no distinct start codon. 
Second, this is what the top NCBI BLASTX hit for that particular exon is: >gi|6323195|ref|NP_013267.1| Gene info Essential 100kDa subunit of the exocyst complex (Sec3p, Sec5p, Sec6p, Sec8p, Sec10p, Sec15p, Exo70p, and Exo84p), which has the essential function of mediating polarized targeting of secretory vesicles to active sites of exocytosis; Sec10p [Saccharomyces cerevisiae] gi|2498891|sp|Q06245|SEC10_YEAST Gene info Exocyst complex component SEC10 gi|1234854|gb|AAB67490.1| Gene info L9362.12 gene product gi|1781307|emb|CAA70041.1| Gene info 100 kD exocyst complex component [Saccharomyces cerevisiae] Length=871 Score = 285 bits (728), Expect = 7e-77 Identities = 141/152 (92%), Positives = 149/152 (98%), Gaps = 0/152 (0%) Frame = +2 Query 2 FNDFYSMGKSDIVEQLRLSKNWKFNLKSVILMKNLLILSSKLETNSIPKTINTKLIIEKY 181 +NDFYSMGKSDIVEQLRLSKNWK NLKSV LMKNLLILSSKLET+SIPKTINTKL +IEKY Sbjct 168 YNDFYSMGKSDIVEQLRLSKNWKLNLKSVKLMKNLLILSSKLETSSIPKTINTKLVIEKY 227 Query 182 SEMMENKLLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE 361 SEMMEN +LLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE Sbjct 228 SEMMENELLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE 287 Query 362 NEFENVFIKNVKFKERLVDFESHSVIVEASMQ 457 NEFENVFIKNVKFKE+L+DFE+HSVI+E SMQ Sbjct 288 NEFENVFIKNVKFKEQLIDFENHSVIIETSMQ 319 Note the query start is well into the predicted coding sequence. Both the lack of a start codon and the above BLASTX hit suggest this is not actually the first exon in the coding region. Therefore the sequence retrieved from spliced_seq() is only part of the full coding region (it seems to lack at least one 3' exon as well). >>>> You can already pass the frame or an offset to >>>> PrimarySeqI::translate(). >>>> We could add a '-phase' argument for >>>> convenience which accepts 0,1,2. >>> >>> What if I >>> want to get the RNA sequence that will become the protein? then >>> having a >>> phase arg to translate() doesn't help. Should there be a >> phase arg to >>> spliced_seq? 
>> >> You'll also note Jason mentioned there were possible errors in the >> gene prediction programs which produced the output > > That's certainly possible. No gene prediction program will be perfect. > In this case, though, it's clear that it found a large region without > stop codons in it, and correctly identified the place to start > translating. I guess I'm just surprised that, if it found just one > exon > in a gene (in the whole contig) why it would say the exon starts at > 263 > with a phase 1, instead of just saying it starts at 264. Maybe the gene prediction didn't find the first exon, or didn't tie the predicted exons together. Not unusual considering the number of predictions made. >> spliced_seq() is supposed to return the DNA sequence of a split >> location by splicing together the sublocation sequences in their >> 'join' order. So, if the first exon was out of phase, once spliced >> they should all be out of phase to the same degree, assuming all >> exons are joined together correctly. Translating this using the >> phase should produce the correct amino acid sequence. >> >> Note that Jason suggested passing the frame/phase of the first exon >> to translate(), not spliced_seq(). I also suggested translate(). > > You're right. This brings the number of translated polypeptide > sequences > that have lots of *s in them to 9 instead of 90. > > I guess I have two requests here. The first is, if a person wants > to see > exactly which bps are translated to aas -- a nucelotide sequece of > exactly 3N bp starting (usually) with ATG -- then they might want an > argument to spliced_seq that skips the first one or two bp when > necessary. After all, they might want to study the DNA, not the > peptides. > > The second request is for "intelligent objects". 
If my SeqFeatures > know > that they're in phase 1, then when I call spliced_seq I want the > resulting objects to know that they're phase one, such that when I > call > translate, Bioperl automatically skips the first bp or two. > Admittedly, > there might be big ramifications to this. > > Both requests of course made in the knowledge that Bioperl is open > source & developers have a lot to do with their time. > > -Amir Karger You may want to post these as enhancement requests to Bugzilla just so we can keep track. I think passing a phase parameter to spliced_seq() can be easily accomplished; it's just a matter of returning a subseq of the spliced sequence based on the phase if set. In fact, I am testing it out now. The second may be more problematic, since there may be a time when one would want those extra nucleotides, so I don't think we would want removal of said nucleotides to be the default behavior. Chris From dmessina at wustl.edu Tue Dec 12 13:44:29 2006 From: dmessina at wustl.edu (David Messina) Date: Tue, 12 Dec 2006 12:44:29 -0600 Subject: [Bioperl-l] BLAST reports In-Reply-To: <457EE37F.4020000@sfu.ca> References: <457EE37F.4020000@sfu.ca> Message-ID: <083B4D17-CC7A-406C-9037-4DA5DC31AA05@wustl.edu> Hi Keith, Take a look at: http://www.bioperl.org/wiki/HOWTO:SearchIO You can read in a whole bunch of different blast formats (see Table 1), and it is possible to write out in HTML. See: http://www.bioperl.org/wiki/HOWTO:SearchIO#Writing_and_formatting_output I'm not sure what you mean by the most informative blast output. If you mean which one gives the most information, I'm pretty sure the standard Blast report has everything. 
Dave From neetisomaiya at gmail.com Tue Dec 12 07:09:39 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:39:39 +0530 Subject: [Bioperl-l] problem in running needle Message-ID: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> I am trying to run needle for the attached two sequence files, on a linux machine. It says "Uncaught exception: Assertion failed, raised at ajmem.c :187". Can anyone tell me what this could be coz of? -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... Name: SEQ_1.REF Type: application/octet-stream Size: 44208 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: seq_of_contig11 Type: application/octet-stream Size: 44344 bytes Desc: not available URL: From cjfields at uiuc.edu Tue Dec 12 15:55:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 14:55:07 -0600 Subject: [Bioperl-l] problem in running needle In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> Message-ID: On Dec 12, 2006, at 6:09 AM, neeti somaiya wrote: > I am trying to run needle for the attached two sequence files, on a > linux > machine. It says "Uncaught exception: Assertion failed, raised at > ajmem.c > :187". > Can anyone tell me what this could be coz of? > > -- > -Neeti > Even my blood says, B positive > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l This would be an EMBOSS error, not a BioPerl error. Maybe the emboss list is the best place for this question? http://emboss.open-bio.org/mailman/listinfo/emboss Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Dec 12 16:30:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 15:30:30 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: <093AE0FF-3C88-4F97-B33F-836B295E3DE3@uiuc.edu> On Dec 12, 2006, at 10:05 AM, Amir Karger wrote: >> Note that Jason suggested passing the frame/phase of the first exon >> to translate(), not spliced_seq(). I also suggested translate(). > > You're right. This brings the number of translated polypeptide > sequences > that have lots of *s in them to 9 instead of 90. > > I guess I have two requests here. The first is, if a person wants > to see > exactly which bps are translated to aas -- a nucelotide sequece of > exactly 3N bp starting (usually) with ATG -- then they might want an > argument to spliced_seq that skips the first one or two bp when > necessary. After all, they might want to study the DNA, not the > peptides. > > The second request is for "intelligent objects". If my SeqFeatures > know > that they're in phase 1, then when I call spliced_seq I want the > resulting objects to know that they're phase one, such that when I > call > translate, Bioperl automatically skips the first bp or two. > Admittedly, > there might be big ramifications to this. > > Both requests of course made in the knowledge that Bioperl is open > source & developers have a lot to do with their time. > > -Amir Karger ... Amir, I committed some code to CVS where I added a -phase parameter option to SeqFeatureI::spliced_seq(). I also added some tests to SeqFeature.t. If you run the following after creating the SeqFeature object $sf (the seq object is $seq): $sf->attach_seq($seq); for my $phase (-1..3) { my $spliced = $sf->spliced_seq(-phase => $phase); print $spliced->seq,"\n"; print $spliced->translate->seq,"\n"; } You should get warnings for any other value than 0, 1, or 2. 
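For anyone following along without the CVS version, the effect of a phase offset can be sketched in plain Perl. This is only a minimal illustration and not the code that was committed: trim_phase is a hypothetical helper, the sequence is made up, and the sketch dies on an invalid phase where the real method is described as warning.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop the first $phase bases
# so that the remaining sequence starts on a codon boundary.
sub trim_phase {
    my ($dna, $phase) = @_;
    $phase = 0 unless defined $phase;
    die "phase must be 0, 1, or 2\n" unless $phase =~ /\A[012]\z/;
    return substr($dna, $phase);
}

# Made-up example: one stray leading base followed by ATG GCC TAA.
my $spliced = 'GATGGCCTAA';
for my $phase (0 .. 2) {
    my $trimmed = trim_phase($spliced, $phase);
    my @codons  = $trimmed =~ /(...)/g;    # complete codons only
    print "phase $phase: @codons\n";
}
```

With phase 1 the codons line up as ATG GCC TAA, which is the same reason translating the sbay_c127 CDS from its annotated phase avoids the early stop codon.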
I'll also note that the sequence you are having trouble with (sbay_c127) is 712 bp, so it doesn't contain the complete coding region. I used it in the test case in SeqFeature.t. Chris From boris.steipe at utoronto.ca Tue Dec 12 16:26:14 2006 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Tue, 12 Dec 2006 16:26:14 -0500 Subject: [Bioperl-l] problem in running needle In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> Message-ID: Looks like a memory allocation problem. Your whole sequence is in one single line, throwing a few linebreaks in there every 80th character or so will probably do the trick. HTH Boris On 12-Dec-06, at 7:09 AM, neeti somaiya wrote: > I am trying to run needle for the attached two sequence files, on a > linux > machine. It says "Uncaught exception: Assertion failed, raised at > ajmem.c > :187". > Can anyone tell me what this could be coz of? > > -- > -Neeti > Even my blood says, B positive > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Derek.Fairley at bll.n-i.nhs.uk Wed Dec 13 05:00:16 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Wed, 13 Dec 2006 10:00:16 -0000 Subject: [Bioperl-l] BLAST reports In-Reply-To: <457EE37F.4020000@sfu.ca> Message-ID: Hi Keith, >I would like to manipulate my blast results with bioperl but would also >like to have the html output of the blast. What would be the best way >of going about this, as I don't see any write functions in any of the >blast modules I have looked at. Would it be better to create my own >html layout from the blast data then attempt to recover this from bioperl? 
Take a look at some of the example scripts here: http://www.bioperl.org/wiki/Bioperl_scripts Depending on your Bioperl installation, you may already have these in your /scripts directory or similar. The /examples/searchio/htmlwriter.pl script may be a good starting point. >p.s. - does anyone know what the most informative blast "alignment view" >output is? xml i suppose? Assuming you want to get the HSPs, parsing blastxml reports seems to be the most reliable approach. Again, there's a useful script for this: take a look at /scripts/utilities/search2alnblocks.pls. Derek. -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 From cjfields at uiuc.edu Wed Dec 13 13:02:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 13 Dec 2006 12:02:14 -0600 Subject: [Bioperl-l] Proposal for Meta data Message-ID: I am working on a few RNA-related things related to structure and have a few questions, specifically about Meta data. This is sort of a proposal, but I would like to get everybody's thoughts about this to gauge what everyone thinks. Jason, sorry to bug you but I thought it might be something that would be of use phylohackathon-wise. Heikki has several modules present which add meta data to sequences (Bio::Seq::Meta). In this case, the meta data is stored as a string (Bio::Seq::Meta) or an array (Bio::Seq::Meta::Array). In both cases you can have multiple types of meta data for a sequence based on a particular tag. However, this also assumes that the meta data is somehow attached strictly to sequence data of some type. It also doesn't allow for having mixed meta data types for a single sequence, such as attaching array data and string data to the same sequence.
Hence, I was thinking of having a simple, generic meta data type (Bio::Meta), one which could encompass simple strings (Bio::Meta::Simple), arrays (Bio::Meta::Array), or any other structured type of data. This could be used to annotate any PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, maybe in a collection (similar to AnnotationCollection). I thought something like this may be of general use for any PrimarySeq (quality, structure), alignments like NEXUS and Stockholm, SeqFeatures where structure could be stored (tRNA or riboswitches), etc. However, this also seems to fall into the category of sequence annotation. So, would it be better to have a set of Bio::Annotation classes used for this purpose? Flames and jibes welcome; I'm wearing my asbestos suit today.... chris From stewarta at nmrc.navy.mil Wed Dec 13 20:06:14 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Wed, 13 Dec 2006 20:06:14 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects Message-ID: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> I am trying to StandAloneBlast->blastall an array of Bio::Seq objects. The documentation claims that blastall can be passed a file name, a Bio::Seq object, or an array of Bio::Seq objects, while the usage suggests that a reference to an array of Bio::Seq objects is what must be passed to blastall. (from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Tools/Run/StandAloneBlast.html#POD5) Usage: $seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects $blast_report = $factory->blastall(\@seq_array); Should this be... $report = $factory->blastall(@seq_array); or $report = $factory->blastall(\@seq_array); ??? And if you are blastall'ing an array of Seq objects, then does blastall just return one big blast report or should I be expecting an array of blast reports?
I've tried $report = $factory->blastall(@seq_array); which seems to work ok, except that when I process the results, there are only results for the first Seq object in the array. -Andrew -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From arareko at campus.iztacala.unam.mx Wed Dec 13 20:37:27 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 13 Dec 2006 19:37:27 -0600 Subject: [Bioperl-l] BioPerl page in Wikipedia Message-ID: <4580AAD7.3000900@campus.iztacala.unam.mx> Folks, I've updated a little bit of the BioPerl page in the Wikipedia. I think it would be nice if we expand the article a little bit more since it's tagged as a "stub". Here's the link: http://en.wikipedia.org/wiki/BioPerl Cheers, Mauricio. -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From lubapardo at gmail.com Thu Dec 14 05:54:07 2006 From: lubapardo at gmail.com (Luba Pardo) Date: Thu, 14 Dec 2006 11:54:07 +0100 Subject: [Bioperl-l] (no subject) Message-ID: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com> Hello, I am new to bioperl and I have been trying to run the examples available in bptutorial.pl and other basic literature. I have installed the latest release of bioperl 1.5.2 in a usr/local/src directory. Any time I try to retrieve sequences from the SwissProt and EMBL databases it gives me an error. With genbank it seems to be fine. I wonder if the installation was not successful, as I would expect that access to these databases was included in the modules of BioPerl Core.
In addition, I would like to ask whether, to run ClustalW from within BioPerl, I need to download and install it in the same directory in which I have installed bioperl, or whether it is included in the Bio::Align module. I am not sure whether this is the best place to ask these very basic questions. If not, could anyone please refer me to the proper e-mail address? Thank you very much in advance. Luba Pardo MD, PhD From bix at sendu.me.uk Thu Dec 14 09:10:43 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 14 Dec 2006 09:10:43 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> Message-ID: <45815B63.1020003@sendu.me.uk> Andrew Stewart wrote: > I am trying to StandAloneBlast->blastall an array or Bio::Seq > objects. The documentation claims that blastall can be passed a file > name, You're referring to 'In addition, sequence input may be in the form of either a Bio::Seq object or or an array of Bio::Seq objects'? I agree it's not clear, but supplying a reference to an array is still supplying an array. Anyway, I'll clarify it. In any case, the usage for the method is what you should pay attention to: > Usage: > $seq_array_ref = \@seq_array; # where @seq_array is an array of > Bio::Seq objects > $blast_report = $factory->blastall(\@seq_array); > > Should this be... > $report = $factory->blastall(@seq_array); > or > $report = $factory->blastall(\@seq_array); > ??? It should be exactly what it says. A reference to the array. > And if you are blastall'ing an array of Seq objects, then does > blastall just return one big blast report or should I be expecting an > array of blast reports? Returns : Reference to a Blast object or BPlite object containing the blast report. That means, just one big object, not an array.
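The difference between the two calls can be shown in plain Perl, without BioPerl installed. This is a minimal sketch that assumes blastall reads a single input argument, which is a guess about its internals and not something confirmed in this thread; count_inputs is a hypothetical stand-in and the strings are placeholders for Bio::Seq objects. If the assumption holds, it would also explain why only the first Seq object produced results when the array was passed directly.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that expects ONE input argument:
# either a single item or a reference to an array of items.
sub count_inputs {
    my ($input) = @_;    # only the first argument is examined
    return ref($input) eq 'ARRAY' ? scalar @$input : 1;
}

my @seq_array = ('seqA', 'seqB', 'seqC');    # placeholders for Bio::Seq objects

# Passing the array itself flattens it into @_, so only 'seqA' is seen.
print "flat list: ", count_inputs(@seq_array), "\n";

# Passing a reference delivers the whole array as one argument.
print "reference: ", count_inputs(\@seq_array), "\n";
```

The first call reports 1 input and the second reports 3.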
From bix at sendu.me.uk Thu Dec 14 09:42:18 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 14 Dec 2006 09:42:18 -0500 Subject: [Bioperl-l] (no subject) In-Reply-To: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com> References: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com> Message-ID: <458162CA.5030803@sendu.me.uk> Luba Pardo wrote: > Hello, I am new bioperl and I have been trying to run the examples > available in bptutorial.pl and other basic literature. I have > installed the latest release of bioperl 1.5.2 in a usr/local/src > directory. Any time I try to retrieve the SwissProt and EMBL > databases it gives me an error. What exactly are you trying? Paste some relevant code along with the exact error message you get when running that code. > I wonder if the installation was not successful, as I would expect > that these databases accesses were included in the modules of BioPerl > Core. They should work with just core installed. In addition, I would like to ask whether to run Clustaw within > the setting of BioPerl I need to download and install it in the same > directory in which I have installed bioperl, or is it included in the > module of Bio::Align. The ClustalW module is in the bioperl-run package, so install that in the same way you installed bioperl (core). The actual ClustalW program you need to download and install according to its own instructions. You let Bioperl know about where you installed ClustalW by eg. setting an environment variable. See http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Alignment/Clustalw.html#DESCRIPTION for details. > I am not sure whether this is the best place to ask these very basic > questions. If not, could anyone please refer me to the proper e mail > account? Its certainly the correct place, I hope we can resolve your problems. 
From neetisomaiya at gmail.com Thu Dec 14 03:02:37 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Thu, 14 Dec 2006 13:32:37 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com> How do I run needle specifying that I want the MSF format, on a linux box? The help doesn't show me any format option. Is there anything available to parse MSF format? Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment; how can I parse the result to get this? On 12/12/06, Chris Fields wrote: > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of a bioperl parser for needle output, basically I > > won't > > where the target sequence aligns on the template (i.e. coordinate > > on the > > template where the taget aligns). > > > > -- > > -Neeti > > Even my blood says, B positive > > I answered this a number of months back: > > http://tinyurl.com/yzlbx5 > > Basically, newer versions of EMBOSS have changed the output for the > AlignIO::emboss parser (which parses needle). I don't believe the > parser has been fixed to deal with that, but Jason has pointed out > you can use MSF output when running needle, then parse using AlignIO > with the format set to 'msf'. > > chris > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed...
Name: 3.out Type: application/octet-stream Size: 204960 bytes Desc: not available URL: From stewarta at nmrc.navy.mil Thu Dec 14 11:34:43 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 14 Dec 2006 11:34:43 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <45815B63.1020003@sendu.me.uk> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> Message-ID: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> Thanks for the reply, Sendu. So I've tried passing a reference to an array of Seq objects with the following code... push @blast_run, $factory->blastall(\@query); # where @query is an array of Bio::Seq objects (In case you're wondering, I'm pushing the report into an array of reports because I'm running several instances of blastall with different parameters each time.) ....and it throws me the following exception... ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/ common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E STACK: Error::throw STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/ perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/ Bio/Tools/Run/StandAloneBlast.pm:557 STACK: main::run_blastall ./new_blast_script.pl:215 STACK: ./new_blast_script.pl:115 ----------------------------------------------------------- And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns... 757 my $status = system($commandstring); 758 759 $self->throw("$executable call crashed: $? $commandstring \n") 760 unless ($status==0) ; So it looks like the system call isn't returning a happy $status. 
At this point I'm pretty much stuck, though. Blastall works just fine if I only send it a single Seq object. Looking at _setinput, it appears a reference to an array of Seq objects should end up creating a multi-fasta file. The only possibilities I can think of to explain this are...

- The -i file isn't being created for some reason when a (ref to an) array of Seqs is passed.
- There is something wrong with the -i file that is created and sent to blastall.
- Something else is wrong with the $commandstring being sent to the system call.

Does anyone see something here that I don't?

Thanks,
Andrew

On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote:

> Andrew Stewart wrote:
>> I am trying to StandAloneBlast->blastall an array of Bio::Seq
>> objects. The documentation claims that blastall can be passed a
>> file name,
>
> You're referring to 'In addition, sequence input may be in the form
> of either a Bio::Seq object or or an array of Bio::Seq objects'? I
> agree it's not clear, but supplying a reference to an array is still
> supplying an array. Anyway, I'll clarify it.
>
>
> In any case, the usage for the method is what you should pay
> attention to:
>
>> Usage:
>> $seq_array_ref = \@seq_array; # where @seq_array is an array of
>> Bio::Seq objects
>> $blast_report = $factory->blastall(\@seq_array);
>> Should this be...
>> $report = $factory->blastall(@seq_array);
>> or
>> $report = $factory->blastall(\@seq_array);
>> ???
>
> It should be exactly what it says. A reference to the array.
>
>
>> And if you are blastall'ing an array of Seq objects, then does
>> blastall just return one big blast report or should I be expecting
>> an array of blast reports?
>
> Returns : Reference to a Blast object or BPlite object
> containing the blast report.
>
> That means, just one big object, not an array.
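For what it's worth, the "11" in the exception message above is the raw value of $? after the system() call, and that value packs several things together. A minimal sketch of decoding it, using core Perl only; the helper names decode_wait_status and describe_wait_status are made up for illustration and are not part of BioPerl:

```perl
use strict;
use warnings;

# Split a wait status (the value Perl leaves in $? after system())
# into its parts, following the layout described in perldoc -f system:
#   high byte  = exit code of the child
#   low 7 bits = number of the signal that killed it (0 if none)
#   bit 0x80   = set if the child dumped core
# NOTE: hypothetical helper, not a BioPerl function.
sub decode_wait_status {
    my ($status) = @_;
    return {
        exit_code   => $status >> 8,
        signal      => $status & 127,
        core_dumped => ( $status & 128 ) ? 1 : 0,
    };
}

# Turn a wait status into a short human-readable description.
# NOTE: hypothetical helper, not a BioPerl function.
sub describe_wait_status {
    my ($status) = @_;
    return 'failed to execute' if $status == -1;

    my $d = decode_wait_status($status);
    if ( $d->{signal} ) {
        return sprintf( 'died with signal %d, %s core dump',
            $d->{signal}, $d->{core_dumped} ? 'with' : 'without' );
    }
    return sprintf( 'exited with value %d', $d->{exit_code} );
}

# The status reported in the exception above.
print describe_wait_status(11), "\n";
```

On most Unix-like systems signal 11 is SIGSEGV, so a status of 11 would suggest that blastall itself was killed by a segmentation fault rather than exiting with an error code of its own.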
-- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From cjfields at uiuc.edu Thu Dec 14 12:03:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 11:03:12 -0600 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> Message-ID: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote: > Thanks for the reply, Sendu. > > So I've tried passing a reference to an array of Seq objects with the > following code... > > push @blast_run, $factory->blastall(\@query); # where @query is an > array of Bio::Seq objects > > (In case you're wondering, I'm pushing the report into an array of > reports because I'm running several instances of blastall with > different parameters each time.) > > ....and it throws me the following exception... 
> > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/ > common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E > > STACK: Error::throw > STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 > STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 > STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 > STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/ > Bio/Tools/Run/StandAloneBlast.pm:557 > STACK: main::run_blastall ./new_blast_script.pl:215 > STACK: ./new_blast_script.pl:115 > ----------------------------------------------------------- > > And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm > returns... > 757 my $status = system($commandstring); > 758 > 759 $self->throw("$executable call crashed: $? $commandstring > \n") > 760 unless ($status==0) ; > > So it looks like the system call isn't returning a happy $status. At > this point I'm pretty much stuck, though. Blastall works just fine > if I only send it a single Seq object. Looking at _setinput, it > appears a reference to an array of Seq objects should end up creating > a multi-fasta file. The only possibilities I can think of to explain > this is... > > - The -i file isn't be created for some reason when an (ref to) array > of Seqs is passed > - There is something wrong with the -i file that is created and sent > to blastall. > - Something else is wrong with the $commandstring being sent to the > system call. > > Does anyone see something here that I don't? The error pops up when the executable returns a bad status, so maybe it's choking on too many input sequences (i.e. Bioperl is doing everything correctly, but you are attempting to BLAST too many sequences in one go). How many sequences are you attempting to use as input? 
What happens when you use fewer input sequences? chris From stewarta at nmrc.navy.mil Thu Dec 14 12:49:45 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 14 Dec 2006 12:49:45 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> Message-ID: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> > So can you look at the tempfile that is created and see if it is sane? > > Set -save_tempfiles => 1 whene you initialize the factory object or do > $factory->save_tempfiles(1) > before calling the blastall. > > -jason > Jason, I was actually wondering how to do that. Thanks. Odd though, it still doesn't seem to be saving the tempfiles. Might not matter though, because... > The error pops up when the executable returns a bad status, so > maybe it's choking on too many input sequences (i.e. Bioperl is > doing everything correctly, but you are attempting to BLAST too > many sequences in one go). How many sequences are you attempting > to use as input? What happens when you use fewer input sequences? > > chris > I was processing 738 sequences for input. I cut that down to 20 sequences and I'm getting some other exception thrown further downstream, so it appears you may be correct. You don't happen to know what the max number of sequences that blastall allows for input, would ya? ;) I suppose I'll have to break @query down into smaller doses or something. Thanks, Andrew On Dec 14, 2006, at 12:03 PM, Chris Fields wrote: > > On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote: > >> Thanks for the reply, Sendu. >> >> So I've tried passing a reference to an array of Seq objects with the >> following code... 
>> >> push @blast_run, $factory->blastall(\@query); # where @query is an >> array of Bio::Seq objects >> >> (In case you're wondering, I'm pushing the report into an array of >> reports because I'm running several instances of blastall with >> different parameters each time.) >> >> ....and it throws me the following exception... >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: blastall call crashed: 11 /common/bin/blastall -p blastp - >> d "/ >> common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm: >> 328 >> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ >> perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 >> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/ >> lib/ >> perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 >> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/ >> perl5/5.8.6/ >> Bio/Tools/Run/StandAloneBlast.pm:557 >> STACK: main::run_blastall ./new_blast_script.pl:215 >> STACK: ./new_blast_script.pl:115 >> ----------------------------------------------------------- >> >> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm >> returns... >> 757 my $status = system($commandstring); >> 758 >> 759 $self->throw("$executable call crashed: $? $commandstring >> \n") >> 760 unless ($status==0) ; >> >> So it looks like the system call isn't returning a happy $status. At >> this point I'm pretty much stuck, though. Blastall works just fine >> if I only send it a single Seq object. Looking at _setinput, it >> appears a reference to an array of Seq objects should end up creating >> a multi-fasta file. The only possibilities I can think of to explain >> this is... >> >> - The -i file isn't be created for some reason when an (ref to) array >> of Seqs is passed >> - There is something wrong with the -i file that is created and sent >> to blastall. 
>> - Something else is wrong with the $commandstring being sent to the >> system call. >> >> Does anyone see something here that I don't? > > The error pops up when the executable returns a bad status, so > maybe it's choking on too many input sequences (i.e. Bioperl is > doing everything correctly, but you are attempting to BLAST too > many sequences in one go). How many sequences are you attempting > to use as input? What happens when you use fewer input sequences? > > chris > -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From Derek.Fairley at bll.n-i.nhs.uk Thu Dec 14 12:58:10 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Thu, 14 Dec 2006 17:58:10 -0000 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com> Message-ID: Neeti, >From http://emboss.sourceforge.net/apps/cvs/needle.html: "The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences. The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score See: http://emboss.sf.net/docs/themes/AlignFormats.html for further information on alignment formats." Not sure based on this whether you can get pairwise alignment in .msf format; can't think of a good reason why not. The BioPerl Align::IO module will allow you to parse alignments in .msf format. HTH, Derek. 
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya Sent: 14 December 2006 08:03 To: Chris Fields; bioperl-l Subject: Re: [Bioperl-l] needle parser in bioperl? How do I run needle specifying that I want the MSF format, on a linux box? The help doesnt show me any format option. Is there anything available to pasre MSF format? Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment, how can I parse the result to get this? On 12/12/06, Chris Fields wrote: > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of a bioperl parser for needle output, basically I > > won't > > where the target sequence aligns on the template (i.e. coordinate > > on the > > template where the taget aligns). > > > > -- > > -Neeti > > Even my blood says, B positive > > I answered this a number of months back: > > http://tinyurl.com/yzlbx5 > > Basically, newer versions of EMBOSS have changed the output for the > AlignIO::emboss parser (which parses needle). I don't believe the > parser has been fixed to deal with that, but Jason has pointed out > you can use MSF output when running needle, then parse using AlignIO > with the format set to 'msf'. 
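A rough sketch of getting that coordinate once the two aligned rows are in hand, for example from needle run with -aformat msf (as in the EMBOSS text quoted above) and read through Bio::AlignIO with the format set to 'msf'. The sub first_aligned_ref_coord is a hypothetical helper written in core Perl only; it assumes the two rows are equal-length gapped strings and that '-', '.' and '~' are the gap characters:

```perl
use strict;
use warnings;

# Given the gapped reference row and the gapped query row of a pairwise
# alignment, return the 1-based ungapped coordinate on the reference at
# which the first residue of the query is aligned.
# $ref_start is the coordinate of the first reference residue in the row
# (defaults to 1). Returns undef if the query row contains only gaps.
# NOTE: hypothetical helper, not a BioPerl function.
sub first_aligned_ref_coord {
    my ( $ref_row, $query_row, $ref_start ) = @_;
    $ref_start = 1 unless defined $ref_start;
    die "alignment rows differ in length\n"
        unless length($ref_row) == length($query_row);

    my $gap     = qr/[-.~]/;         # assumed gap characters
    my $ref_pos = $ref_start - 1;    # last reference residue seen so far

    for my $col ( 0 .. length($ref_row) - 1 ) {
        my $r = substr( $ref_row,   $col, 1 );
        my $q = substr( $query_row, $col, 1 );

        $ref_pos++ unless $r =~ $gap;    # count reference residues only
        next if $q =~ $gap;              # query has not started yet

        # First query residue found. If the reference has a gap in this
        # column, report the next reference residue instead.
        return $r =~ $gap ? $ref_pos + 1 : $ref_pos;
    }
    return;
}

# Toy rows: the query starts at the 4th residue of the reference.
print first_aligned_ref_coord( 'ACGTACGT', '...TACG.' ), "\n";    # prints 4
```

With BioPerl the two rows could be taken from the alignment returned by next_aln, by calling each_seq on it and then seq on each sequence; which row is the reference would have to be decided by sequence name.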
> > chris > -- -Neeti Even my blood says, B positive From cjfields at uiuc.edu Thu Dec 14 13:36:09 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 12:36:09 -0600 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> Message-ID: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu> On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote: >> So can you look at the tempfile that is created and see if it is >> sane? >> >> Set -save_tempfiles => 1 whene you initialize the factory object >> or do >> $factory->save_tempfiles(1) >> before calling the blastall. >> >> -jason >> > > Jason, > I was actually wondering how to do that. Thanks. Odd though, it > still doesn't seem to be saving the tempfiles. Might not matter That needs to be checked out. Can anyone verify that? >> The error pops up when the executable returns a bad status, so >> maybe it's choking on too many input sequences (i.e. Bioperl is >> doing everything correctly, but you are attempting to BLAST too >> many sequences in one go). How many sequences are you attempting >> to use as input? What happens when you use fewer input sequences? >> >> chris >> > > I was processing 738 sequences for input. I cut that down to 20 > sequences and I'm getting some other exception thrown further > downstream, so it appears you may be correct. You don't happen to > know what the max number of sequences that blastall allows for input, > would ya? ;) I suppose I'll have to break @query down into smaller > doses or something. > > Thanks, > Andrew It was a shot in the dark, really. 
The fact that the return status was bad could be due to a number of problems (permissions issues, bad data, etc). The fact that a single sequence worked indicated that permissions and output format likely weren't to blame. The only other thing left was a problem with blastall itself. BTW, the blast docs do not indicate whether there is a maximum number of sequences. There may be a point where available memory becomes the limiting issue. chris From vaughn at cshl.edu Thu Dec 14 14:09:34 2006 From: vaughn at cshl.edu (Matthew Vaughn) Date: Thu, 14 Dec 2006 14:09:34 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking Message-ID: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> Dear all, I'm trying to bring some of my code into compliance with the BioPerl 1.5.2 and am running into some design decisions that I am unclear on. Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of the 'type' against SOFA? It seems to me that this should be optional behavior as is the case with the Bio::FeatureIO family. I'd be happy to write the patch if there is any agreement with me on this case. Thanks, Matt -- Matthew W. Vaughn, Ph.D. Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 phone: (516) 367-8469 From jason at bioperl.org Thu Dec 14 11:59:20 2006 From: jason at bioperl.org (Jason Stajich) Date: Thu, 14 Dec 2006 11:59:20 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> Message-ID: <640E2BB7-33F3-44C9-B903-9DDA54F02D12@bioperl.org> So can you look at the tempfile that is created and see if it is sane?
Set -save_tempfiles => 1 when you initialize the factory object or do
$factory->save_tempfiles(1)
before calling the blastall.

-jason

On Dec 14, 2006, at 11:34 AM, Andrew Stewart wrote:

> Thanks for the reply, Sendu.
>
> So I've tried passing a reference to an array of Seq objects with the
> following code...
>
> push @blast_run, $factory->blastall(\@query); # where @query is an
> array of Bio::Seq objects
>
> (In case you're wondering, I'm pushing the report into an array of
> reports because I'm running several instances of blastall with
> different parameters each time.)
>
> ....and it throws me the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/
> common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/
> perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/
> perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/
> Bio/Tools/Run/StandAloneBlast.pm:557
> STACK: main::run_blastall ./new_blast_script.pl:215
> STACK: ./new_blast_script.pl:115
> -----------------------------------------------------------
>
> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm
> returns...
> 757 my $status = system($commandstring);
> 758
> 759 $self->throw("$executable call crashed: $? $commandstring
> \n")
> 760 unless ($status==0) ;
>
> So it looks like the system call isn't returning a happy $status. At
> this point I'm pretty much stuck, though. Blastall works just fine
> if I only send it a single Seq object. Looking at _setinput, it
> appears a reference to an array of Seq objects should end up creating
> a multi-fasta file.
The only possibilities I can think of to explain > this is... > > - The -i file isn't be created for some reason when an (ref to) array > of Seqs is passed > - There is something wrong with the -i file that is created and sent > to blastall. > - Something else is wrong with the $commandstring being sent to the > system call. > > Does anyone see something here that I don't? > > > Thanks, > Andrew > > > > On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote: > >> Andrew Stewart wrote: >>> I am trying to StandAloneBlast->blastall an array or Bio::Seq >>> objects. The documentation claims that blastall can be passed a >>> file name, >> >> You're referring to 'In addition, sequence input may be in the form >> of either a Bio::Seq object or or an array of Bio::Seq objects'? I >> agree its not clear, but supplying a reference to an array is still >> supplying an array. Anyway, I'll clarify it. >> >> >> In any case, the usage for the method is what you should pay >> attention to: >> >>> Usage: >>> $seq_array_ref = \@seq_array; # where @seq_array is an array of >>> Bio::Seq objects >>> $blast_report = $factory->blastall(\@seq_array); >>> Should this be... >>> $report = $factory->blastall(@seq_array); >>> or >>> $report = $factory->blastall(\@seq_array); >>> ??? >> >> It should be exactly what it says. A reference to the array. >> >> >>> And if you are blastall'ing an array of Seq objects, then does >>> blastall just return one big blast report or should I be expecting >>> an array of blast reports? >> >> Returns : Reference to a Blast object or BPlite object >> containing the blast report. >> >> That means, just one big object, not an array. 
> > > > -- > Andrew Stewart > Research Assistant, Genomics Team > Navy Medical Research Center (NMRC) > Biological Defense Research Directorate (BDRD) > BDRD Annex > 12300 Washington Avenue, 2nd Floor > Rockville, MD 20852 > > email: stewarta at nmrc.navy.mil > phone: 301-231-6700 Ext 270 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From stewarta at nmrc.navy.mil Thu Dec 14 16:23:07 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 14 Dec 2006 16:23:07 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu> Message-ID: > It was a shot in the dark, really. The fact that the return status > was bad could be due to a number of problems (permissions issues, > bad data, etc). The fact that a single sequence worked indicated > that permissions and output format likely weren't to blame. The > only other thing left was a problem with blastall itself. > > BTW, the blast docs do not indicate whether there is a maximum > number of sequences. There may be a point where available memory > becomes the limiting issue. > > chris Interesting. I ran the 738-sequence dataset through blastall manually and the report only returned 198 of the 738 expected results. Not only that, it seems to have just cut off right in the middle of the 198th result and a Segmentation fault was reported. I removed the 198th sequence, wondering if it might be some issue with the input, and the segmentation fault occured again with the results ending on the 210th result. 
I stuck the 198th sequence back in, but at the start of the file and sure enough the Segmentation error occurred earlier. I think we can rule out the size of the input or number of sequences as the source of error here. I'm more inclined to think it has something to do with the blast databases being queried against. I found an old discussion on a problem that sounds fairly similar to this one, for anyone interested. http://bioinformatics.org/pipermail/bioclusters/2004-June/001742.html I think I'll try to work around the problem for now. andrew On Dec 14, 2006, at 1:36 PM, Chris Fields wrote: > > On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote: > >>> So can you look at the tempfile that is created and see if it is >>> sane? >>> >>> Set -save_tempfiles => 1 whene you initialize the factory object >>> or do >>> $factory->save_tempfiles(1) >>> before calling the blastall. >>> >>> -jason >>> >> >> Jason, >> I was actually wondering how to do that. Thanks. Odd though, it >> still doesn't seem to be saving the tempfiles. Might not matter > > That needs to be checked out. Can anyone verify that? > >>> The error pops up when the executable returns a bad status, so >>> maybe it's choking on too many input sequences (i.e. Bioperl is >>> doing everything correctly, but you are attempting to BLAST too >>> many sequences in one go). How many sequences are you attempting >>> to use as input? What happens when you use fewer input sequences? >>> >>> chris >>> >> >> I was processing 738 sequences for input. I cut that down to 20 >> sequences and I'm getting some other exception thrown further >> downstream, so it appears you may be correct. You don't happen to >> know what the max number of sequences that blastall allows for input, >> would ya? ;) I suppose I'll have to break @query down into smaller >> doses or something. >> >> Thanks, >> Andrew > > It was a shot in the dark, really. 
The fact that the return status > was bad could be due to a number of problems (permissions issues, > bad data, etc). The fact that a single sequence worked indicated > that permissions and output format likely weren't to blame. The > only other thing left was a problem with blastall itself. > > BTW, the blast docs do not indicate whether there is a maximum > number of sequences. There may be a point where available memory > becomes the limiting issue. > > chris -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From lincoln.stein at gmail.com Thu Dec 14 15:24:56 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 14 Dec 2006 15:24:56 -0500 Subject: [Bioperl-l] Bio::Graphics xyplot In-Reply-To: <4578951B.5050206@sfu.ca> References: <4578951B.5050206@sfu.ca> Message-ID: <6dce9a0b0612141224r1ef7cce2s6e6123461c3827d8@mail.gmail.com> Hi, The way it works is that you create a single feature that spans the entire range of the xyplot. It contains subfeatures, each of which has a score. The graph points correspond to each of the subfeatures. Lincoln On 12/7/06, Keith Anthony Boroevich wrote: > > Hi everyone, > > I'm attempting to add an xyplot of the phred quality scores to an > Bio::Graphics image, and cannot get it to work. > I have the panel with a track for both the scale and the DNA displaying > properly. When I attempt to add the xyplot i just get a garbled track > of, what looks like, timy xyplots for each datapoint. I have the cvs > (updated today) of bioperl-live running. I think what I am missing is > the creation of a "Sequence Feature Group" to hold the individual points > of the plot. However, I cannot seem to find such an object. 
This is > what I attempted: > > -------BEGIN---CODE----------- > # start panel > my $panel = Bio::Graphics::Panel->new(-length => $f_seqlen, > -width => $f_seqlen*10, > -pad_left => 10, > -pad_right => 10, > -grid => 1 > ); > # add scale > $panel->add_track(arrow => > Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen), > -double => 1, > -tick => 2, > -fgcolor => 'black'); > # add DNA ($feature is of type Bio::SeqFeature::Annotated) > $panel->add_track(dna => $feature); > # get list of quality scores from database > my ($pqs_value) = $dbh->selectrow_array($sql); > my @pqs_value = split(/\s/,$pqs_value); > # create track > my $track = $panel->add_track(-glyph => 'xyplot', > -graph_type => 'points', > -point_symbol => 'point', > -max_score => 100, > -min_score => 0, > -scale => 'none'); > # add "subfeatures" to > for (my $i=0;$i<$f_seqlen;$i++) { > > > $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i])); > > } > print $panel->png(); > $panel->finished; > ------END---CODE---------- > > I also attempted to create an array of the point features and passed > that by reference to the panel "add_track" as it describes in the xyplot > documentation, but that resulted in the exact same image. > > keith > > -- > ><)))?> -cGRASP- < > Keith Anthony Boroevich > Davidson Lab > Dept of Molecular Biology > Simon Fraser University > Tel: 604-268-7276 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bix at sendu.me.uk Thu Dec 14 17:15:07 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 14 Dec 2006 17:15:07 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> Message-ID: <4581CCEB.20206@sendu.me.uk> Matthew Vaughn wrote: > Dear all, > > I'm trying to bring some of my code into compliance with the BioPerl > 1.5.2 and am running into some design decisions that I am unclear on. > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of > the 'type' against SOFA? It seems to me that this should be optional > behavior as is the case with the Bio::FeatureIO family. I'd be happy to > write the patch if there is any agreement with me on this case. Lots of people seem to have worked on it over the years, but perhaps Scott Cain is the person to talk to? revision 1.4 date: 2004/09/25 11:41:29; author: scain; state: Exp; lines: +1 -1 two things: * adding SOFA as an available ontology to DocumentRegistry.pm * modifying FeatureIO::gff to use SOFA to validate, and to parse Ontology_term From lincoln.stein at gmail.com Thu Dec 14 16:56:41 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 14 Dec 2006 16:56:41 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: References: Message-ID: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> Hi All, I'm afraid that the xyplot glyph that is in the recent bioperl release has an error that causes the content to be printed to the right of the correct position. Unfortunately this wasn't caught before the release because the glyph was only tested on very large (whole genome) features. 
You will need to do a CVS update to get a fixed version from bioperl-live. A future bugfix release of gbrowse will patch this glyph for you automatically. Lincoln On 12/12/06, Kara Dolinski wrote: > > Hi, > I'm having a problem getting features and an xyplot properly aligned in > Gbrowse. For example, see this page: > > http://tinyurl.com/ylbq3q > > The feature in the "CENPK SNPs" track should actually be around the peak > of the graph in the "CENPK prediction signal" xyplot ie. the SNP feature > is at position 79, and the xyplot axes and data should span from 61 - 95. > However, as you can see, the data in the xyplot are oddly separated from > the axes (which seem to be in the correct place), with the data shifted over > to about position 120-155. > This occurs elsewhere, not just at the ends of the chromosomes. > > When I zoom to ~80 bp, all is well, see: > > http://tinyurl.com/yzav8k > > The relevant snippets from the GFF and the config files are below. > > Thanks! > Kara > > GFF: > > chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > is 41.9883 > chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > is 41.9883 > chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score > is 2.24506 > chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score > is 3.26837 > chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score > is 1.39938 > chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score > is 1.4039 > chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score > is 9.16134 > chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score > is 10.1413 > chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . 
ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score > is 12.9256 > chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score > is 13.195 > chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score > is 22.7127 > chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score > is 23.8289 > chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score > is 21.9123 > chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score > is 28.3344 > chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score > is 35.0436 > chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score > is 37.361 > chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score > is 39.5408 > chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score > is 28.2008 > chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score > is 32.6254 > chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score > is 36.0832 > chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score > is 41.9883 > chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score > is 32.1205 > chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score > is 41.3048 > chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score > is 30.7975 > chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score > is 29.4282 > chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . 
ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score > is 35.3586 > chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score > is 34.1426 > chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score > is 30.2966 > chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score > is 17.8402 > chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score > is 15.2637 > chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score > is 12.657 > chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score > is 10.2033 > chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score > is 9.40143 > chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score > is 6.56273 > chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score > is 3.66211 > chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score > is 0.394194 > > CONFIG: > > > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH} > > [CENPK_all_scores_graph] > feature = GRAPH_CENPK:SNPScanner > glyph = xyplot > graph_type = boxes > fgcolor = purple > bgcolor = purple > height = 100 > min_score = 0 > max_score = 110 > label = 0 > key = CENPK prediction signal > link = > category = SNPs: signal graphs > > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. 
Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share > your > opinions on IT & business topics through brief surveys - and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dmessina at wustl.edu Thu Dec 14 20:45:24 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 19:45:24 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: References: Message-ID: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> Hey Chris, My thoughts below. > [Chris] > This could be used to annotate any > PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, > maybe in a collection (similar to AnnotationCollection). I thought > something like this may be of general use for any PrimarySeq > (quality, structure), alignments like NEXUS and Stockholm, > SeqFeatures where structure could be stored (tRNA or riboswitches), > etc. > > However, this also seems to fall into the category of sequence > annotation. So, would it be better to have a set of Bio::Annotation > classes used for this purpose? To me, all meta data is equal. That is, your classic Genbank feature annotation and a user's arbitrary meta-tag like "Bob thinks this is a kinase domain" aren't different in kind even if they are different in content. As resequencing projects multiply, the ability to create arbitrary meta tags, attach them to different types of objects, and use those tags to link them together will become desirable, if not essential. 
Keeping a common interface to all of these meta data types would be advantageous, plus new users won't have to determine whether they need to use Bio::Meta objects or Bio::Annotation objects. So I would argue for all of the meta data types to live "under one roof". Which roof isn't as important. Bio::Annotation, since it already exists for today's meta data, seems like a reasonable choice. (assuming Annotation objects are flexible enough to be extended as you propose) There, and no flames or jibes even. :) Dave From cjfields at uiuc.edu Thu Dec 14 21:21:10 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 20:21:10 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> Message-ID: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> On Dec 14, 2006, at 7:45 PM, David Messina wrote: > Hey Chris, > > My thoughts below. > >> [Chris] >> This could be used to annotate any >> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, >> maybe in a collection (similar to AnnotationCollection). I thought >> something like this may be of general use for any PrimarySeq >> (quality, structure), alignments like NEXUS and Stockholm, >> SeqFeatures where structure could be stored (tRNA or riboswitches), >> etc. >> >> However, this also seems to fall into the category of sequence >> annotation. So, would it be better to have a set of Bio::Annotation >> classes used for this purpose? > > > To me, all meta data is equal. That is, your classic Genbank feature > annotation and a user's arbitrary meta-tag like "Bob thinks this is a > kinase domain" aren't different in kind even if they are different in > content. > > As resequencing projects multiply, the ability to create arbitrary > meta tags, attach them to different types of objects, and use those > tags to link them together will become desirable, if not essential. 
> > Keeping a common interface to all of these meta data types would be > advantageous, plus new users won't have to determine whether they > need to use Bio::Meta objects or Bio::Annotation objects. > > So I would argue for all of the meta data types to live "under one > roof". Which roof isn't as important. Bio::Annotation, since it > already exists for today's meta data, seems like a reasonable choice. > (assuming Annotation objects are flexible enough to be extended as > you propose) > > There, and no flames or jibes even. :) I guess what I want to know is whether there should to be a distinction between 'normal' sequence annotation (comments, references, and so on) and annotation that could be best described as position-specific (like RNA or protein structural annotation). The current meta implementation is for sequence data only; I felt it would be nice to have a generic implementation that would be applicable to any object data. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dmessina at wustl.edu Thu Dec 14 21:46:27 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 20:46:27 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> Message-ID: <9C72012A-EFD7-42DD-93F8-578251CFDE01@wustl.edu> And it all seemed so clear to me when I wrote it. :) > whether there should to be a distinction I would argue no because it would contravene a s > a generic implementation that would be applicable to any object data. I wholeheartedly agree that this is the way to go. A generic implementation would allow arbitrary object data while maintaining a standard interface. 
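A minimal sketch of the two kinds of meta data discussed in this thread (whole-object notes versus position-specific annotation), attached to one sequence using classes that already exist in BioPerl. It assumes a BioPerl install of roughly the 1.5.2 vintage; the sequence, the tag names and the note text are made up for illustration, and this is not the proposed generic meta implementation, only a picture of what is possible today.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Annotation::Collection;
use Bio::Annotation::SimpleValue;

# A toy sequence to hang meta data on (id and residues are invented).
my $seq = Bio::Seq->new(
    -id  => 'example1',
    -seq => 'ATGGCGTTTAAACCCGGGTTTAAA',
);

# Whole-object meta data: an arbitrary user tag stored as a SimpleValue
# in the sequence's annotation collection.
my $coll = Bio::Annotation::Collection->new;
$coll->add_Annotation(
    'user_note',
    Bio::Annotation::SimpleValue->new(
        -value => 'Bob thinks this is a kinase domain',
    ),
);
$seq->annotation($coll);

# Position-specific meta data: a feature covering bases 4..12.
# The primary tag and note are illustrative assumptions.
my $feat = Bio::SeqFeature::Generic->new(
    -start       => 4,
    -end         => 12,
    -primary_tag => 'misc_structure',
    -tag         => { note => 'hypothetical stem-loop' },
);
$seq->add_SeqFeature($feat);

# Read both kinds of meta data back.
my @notes = map { $_->value } $seq->annotation->get_Annotations('user_note');
my @spans = map { $_->start . '..' . $_->end } $seq->get_SeqFeatures;

print "notes: @notes\n";
print "spans: @spans\n";
```

The point of the sketch is only that the free-form note and the positional feature currently live behind two different interfaces (annotation collection versus feature holder), which is the split the thread is debating.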
From dmessina at wustl.edu Thu Dec 14 21:46:27 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 20:46:27 -0600 Subject: [Bioperl-l] Proposal for Meta data Message-ID: [oops, accidentally hit send midsentence] And it all seemed so clear to me when I wrote it. :) > whether there should to be a distinction I would argue no because it would contravene a standard interface. > a generic implementation that would be applicable to any object data. I wholeheartedly agree that this is the way to go. A generic implementation would allow arbitrary object data while maintaining a standard interface. Dave From neetisomaiya at gmail.com Fri Dec 15 00:21:42 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 15 Dec 2006 10:51:42 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com> Message-ID: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Hi, Thanks a lot for your response. I ran needle like this /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out It gave me the output in format msf. But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO. Please help. Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence. ~Neeti. On 12/14/06, Fairley, Derek wrote: > > Neeti, > > > > From http://emboss.sourceforge.net/apps/cvs/needle.html: > > > > "The results can be output in one of several styles by using the > command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of > the required format. Some of the alignment formats can cope with an > unlimited number of sequences, while others are only for pairs of sequences. 
> > > > > The available multiple alignment format names are: unknown, multiple, > simple, fasta, msf, trace, srs > > > > The available pairwise alignment format names are: pair, markx0, markx1, > markx2, markx3, markx10, srspair, score > > > > See: http://emboss.sf.net/docs/themes/AlignFormats.html for further > information on alignment formats." > > > > Not sure based on this whether you can get pairwise alignment in .msf > format; can't think of a good reason why not. The BioPerl Align::IO module > will allow you to parse alignments in .msf format. > > > > HTH, > > > > Derek. > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya > Sent: 14 December 2006 08:03 > To: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > > > How do I run needle specifying that I want the MSF format, on a linux box? > > The help doesnt show me any format option. Is there anything available to > > pasre MSF format? > > Please find an example alignment file attached. Here the seq_of_contig > > aligns with the reference sequence (i.e. SEQ_1.REF) starting at position > > (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the > > output alignment, how can I parse the result to get this? > > > > On 12/12/06, Chris Fields wrote: > > > > > > > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > > > > > Hi, > > > > > > > > Does anyone know of a bioperl parser for needle output, basically I > > > > won't > > > > where the target sequence aligns on the template (i.e. coordinate > > > > on the > > > > template where the taget aligns). > > > > > > > > -- > > > > -Neeti > > > > Even my blood says, B positive > > > > > > I answered this a number of months back: > > > > > > http://tinyurl.com/yzlbx5 > > > > > > Basically, newer versions of EMBOSS have changed the output for the > > > AlignIO::emboss parser (which parses needle). 
I don't believe the > > > parser has been fixed to deal with that, but Jason has pointed out > > > you can use MSF output when running needle, then parse using AlignIO > > > with the format set to 'msf'. > > > > > > chris > > > > > > > > > > > -- > > -Neeti > > Even my blood says, B positive > -- -Neeti Even my blood says, B positive From Derek.Fairley at bll.n-i.nhs.uk Fri Dec 15 04:57:35 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Fri, 15 Dec 2006 09:57:35 -0000 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Message-ID: Neeti, In lieu of a response from a BioPerl guru... why not use Needle to generate your pairwise alignment in fasta format, rather than msf format? The sequence you want should correspond to a single HSP which you can get directly from the fasta alignment with Bio::SearchIO: http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use Bio::AlignIO at all. Derek. -----Original Message----- From: neeti somaiya [mailto:neetisomaiya at gmail.com] Sent: 15 December 2006 05:22 To: Fairley, Derek; bioperl-l Subject: Re: [Bioperl-l] needle parser in bioperl? Hi, Thanks a lot for your response. I ran needle like this ?/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out It gave me the output in format msf. But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO. Please help. Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence. ~Neeti. On 12/14/06, Fairley, Derek wrote: Neeti, ? >From http://emboss.sourceforge.net/apps/cvs/needle.html : ? 
"The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences. ? The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs ? The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score ? See: http://emboss.sf.net/docs/themes/AlignFormats.html for further information on alignment formats." ? Not sure based on this whether you can get pairwise alignment in .msf format; can't think of a good reason why not. The BioPerl Align::IO module will allow you to parse alignments in .msf format. ? HTH, ? Derek. ? -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya Sent: 14 December 2006 08:03 To: Chris Fields; bioperl-l Subject: Re: [Bioperl-l] needle parser in bioperl? ? How do I run needle specifying that I want the MSF format, on a linux box? The help doesnt show me any format option. Is there anything available to pasre MSF format? Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment, how can I parse the result to get this? ? On 12/12/06, Chris Fields wrote: > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of a bioperl parser for needle output, basically I > > won't > > where the target sequence aligns on the template (i.e. coordinate > > on the > > template where the taget aligns). 
> > > > -- > > -Neeti > > Even my blood says, B positive > > I answered this a number of months back: > > http://tinyurl.com/yzlbx5 > > Basically, newer versions of EMBOSS have changed the output for the > AlignIO::emboss parser (which parses needle).? I don't believe the > parser has been fixed to deal with that, but Jason has pointed out > you can use MSF output when running needle, then parse using AlignIO > with the format set to 'msf'. > > chris > ? ? ? -- -Neeti Even my blood says, B positive -- -Neeti Even my blood says, B positive From cain at cshl.edu Fri Dec 15 00:01:36 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 15 Dec 2006 00:01:36 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <4581CCEB.20206@sendu.me.uk> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> Message-ID: <1166158897.2569.335.camel@localhost.localdomain> As much as I would like to take credit for this :-) Allen Day wrote the original code, and then Chris Fields tried to fix it so that it actually worked :-) I think it would be a good idea to have a validate_terms option like Bio::FeatureIO::gff. Scott On Thu, 2006-12-14 at 17:15 -0500, Sendu Bala wrote: > Matthew Vaughn wrote: > > Dear all, > > > > I'm trying to bring some of my code into compliance with the BioPerl > > 1.5.2 and am running into some design decisions that I am unclear on. > > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of > > the 'type' against SOFA? It seems to me that this should be optional > > behavior as is the case with the Bio::FeatureIO family. I'd be happy to > > write the patch if there is any agreement with me on this case. > > Lots of people seem to have worked on it over the years, but perhaps > Scott Cain is the person to talk to? 
> > revision 1.4 > date: 2004/09/25 11:41:29; author: scain; state: Exp; lines: +1 -1 > two things: > * adding SOFA as an available ontology to DocumentRegistry.pm > * modifying FeatureIO::gff to use SOFA to validate, and to parse > Ontology_term > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From neetisomaiya at gmail.com Fri Dec 15 07:46:08 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 15 Dec 2006 18:16:08 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Message-ID: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> I ran needle like this /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out Please find the output attached. When I run the following :- use Bio::SearchIO; my $io = Bio::SearchIO->new(-file => "1.out", -format => "fasta" ); while ( my $result = $io->next_result() ) { while( my $hit = $result->next_hit) { print "yes\n"; } } It says :- -------------------- WARNING --------------------- MSG: unrecognized FASTA Family report file! --------------------------------------------------- What should I do? ~Neeti. On 12/15/06, Fairley, Derek wrote: > > Neeti, > > In lieu of a response from a BioPerl guru... why not use Needle to > generate your pairwise alignment in fasta format, rather than msf format? 
> The sequence you want should correspond to a single HSP which you can get > directly from the fasta alignment with Bio::SearchIO: > http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use > Bio::AlignIO at all. > > Derek. > > > -----Original Message----- > From: neeti somaiya [mailto:neetisomaiya at gmail.com] > Sent: 15 December 2006 05:22 > To: Fairley, Derek; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > Hi, > > Thanks a lot for your response. > I ran needle like this > /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out > It gave me the output in format msf. > But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I > get the alignment start and stop coordinates on the sequence. I mean > something like hsp->query->start which gives us the alignment start position > on query sequence in a blast output when using Bio::SearchIO. > Please help. > Like I explained with an example in my previous mail, I want the > coordinate where the alignment starts on the sequence. > > ~Neeti. > On 12/14/06, Fairley, Derek wrote: > Neeti, > > From http://emboss.sourceforge.net/apps/cvs/needle.html : > > "The results can be output in one of several styles by using the > command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of > the required format. Some of the alignment formats can cope with an > unlimited number of sequences, while others are only for pairs of sequences. > > The available multiple alignment format names are: unknown, multiple, > simple, fasta, msf, trace, srs > > The available pairwise alignment format names are: pair, markx0, markx1, > markx2, markx3, markx10, srspair, score > > See: http://emboss.sf.net/docs/themes/AlignFormats.html for further > information on alignment formats." > > Not sure based on this whether you can get pairwise alignment in .msf > format; can't think of a good reason why not. 
The BioPerl Align::IO module > will allow you to parse alignments in .msf format. > > HTH, > > Derek. > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya > Sent: 14 December 2006 08:03 > To: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > How do I run needle specifying that I want the MSF format, on a linux box? > The help doesnt show me any format option. Is there anything available to > pasre MSF format? > Please find an example alignment file attached. Here the seq_of_contig > aligns with the reference sequence (i.e. SEQ_1.REF) starting at position > (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the > output alignment, how can I parse the result to get this? > > On 12/12/06, Chris Fields wrote: > > > > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > > > Hi, > > > > > > Does anyone know of a bioperl parser for needle output, basically I > > > won't > > > where the target sequence aligns on the template (i.e. coordinate > > > on the > > > template where the taget aligns). > > > > > > -- > > > -Neeti > > > Even my blood says, B positive > > > > I answered this a number of months back: > > > > http://tinyurl.com/yzlbx5 > > > > Basically, newer versions of EMBOSS have changed the output for the > > AlignIO::emboss parser (which parses needle). I don't believe the > > parser has been fixed to deal with that, but Jason has pointed out > > you can use MSF output when running needle, then parse using AlignIO > > with the format set to 'msf'. > > > > chris > > > > > > -- > -Neeti > Even my blood says, B positive > > > > -- > -Neeti > Even my blood says, B positive > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1.out Type: application/octet-stream Size: 90277 bytes Desc: not available URL: From jason at bioperl.org Fri Dec 15 09:28:13 2006 From: jason at bioperl.org (Jason Stajich) Date: Fri, 15 Dec 2006 09:28:13 -0500 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> Message-ID: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: > > On Dec 14, 2006, at 7:45 PM, David Messina wrote: > >> Hey Chris, >> >> My thoughts below. >> >>> [Chris] >>> This could be used to annotate any >>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, >>> maybe in a collection (similar to AnnotationCollection). I thought >>> something like this may be of general use for any PrimarySeq >>> (quality, structure), alignments like NEXUS and Stockholm, >>> SeqFeatures where structure could be stored (tRNA or riboswitches), >>> etc. >>> >>> However, this also seems to fall into the category of sequence >>> annotation. So, would it be better to have a set of Bio::Annotation >>> classes used for this purpose? >> >> >> To me, all meta data is equal. That is, your classic Genbank feature >> annotation and a user's arbitrary meta-tag like "Bob thinks this is a >> kinase domain" aren't different in kind even if they are different in >> content. >> >> As resequencing projects multiply, the ability to create arbitrary >> meta tags, attach them to different types of objects, and use those >> tags to link them together will become desirable, if not essential. >> >> Keeping a common interface to all of these meta data types would be >> advantageous, plus new users won't have to determine whether they >> need to use Bio::Meta objects or Bio::Annotation objects. >> >> So I would argue for all of the meta data types to live "under one >> roof". Which roof isn't as important. 
Bio::Annotation, since it
>> already exists for today's meta data, seems like a reasonable choice.
>> (assuming Annotation objects are flexible enough to be extended as
>> you propose)
>>
>> There, and no flames or jibes even. :)
>
> I guess what I want to know is whether there should to be a
> distinction between 'normal' sequence annotation (comments,
> references, and so on) and annotation that could be best described as
> position-specific (like RNA or protein structural annotation). The
> current meta implementation is for sequence data only; I felt it
> would be nice to have a generic implementation that would be
> applicable to any object data.

My stream of consciousness for right now:

I was thinking Bio::Annotation is where this should go; that system doesn't have anything about it that makes it explicitly sequence related.

What we're trying to hammer out here on the Alignment side, which fits with your RNA example, is to have features (basically SeqFeatures) associated with alignments, so columns can be annotated to cover things like character sets and partitions for phylogenetic analyses. As for data which annotates non-contiguous things like RNA stems, we may have to be more creative about that or model it with a splitLocation.

So currently we've added code so that an Alignment is-a Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this end, with the goal of being able to capture more of the data that can be represented in a NEXUS file.

It feels more like a hack than an elegant meta-data solution, but I am not totally sure about the data you are thinking of handling at this point; perhaps I need to spend more time thinking about it. Or are you worried that the semantic mapping of the data into features or annotations is confusing users?

From jason at bioperl.org Fri Dec 15 09:48:32 2006 From: jason at bioperl.org (Jason Stajich) Date: Fri, 15 Dec 2006 09:48:32 -0500 Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> Message-ID: <42CB9018-72CD-433E-A42F-152D63D2F584@bioperl.org>

I get the impression you are trying to use the wrong tool for the job. Can you explain a little more generally what you want to do?

Semantically, FASTA in Bio::SearchIO is much different from FASTA in Bio::AlignIO. We explain this on the wiki; please have a look at the FASTA page.

- Do not use Bio::SearchIO to parse multi-fasta alignment output.
- Bio::SearchIO is for pairwise alignment reports.
- Use Bio::AlignIO for a multi-fasta format or for msf; you just provide a different value to '-format'.

But none of that is going to help you get start/end for your alignment, because that is not part of the output format. Do the experiment of looking at the file and figuring out which actual fields you want output. If they don't exist, then you either have a format that won't work for your question, or you will have to calculate the additional values yourself.

If you are trying to align transcripts to a genome, please consider tools that are built for it (and referenced on the wiki, like Sim4, est2genome, exonerate, BLAT).

-jason

On Dec 15, 2006, at 7:46 AM, neeti somaiya wrote:
> I ran needle like this
>
> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out
>
> Please find the output attached.
>
> When I run the following :-
>
> use Bio::SearchIO;
>
> my $io = Bio::SearchIO->new(-file => "1.out",
> -format => "fasta" );
>
> while ( my $result = $io->next_result() )
> {
> while( my $hit = $result->next_hit)
> {
>
> print "yes\n";
> }
> }
>
>
> It says :-
>
> -------------------- WARNING ---------------------
> MSG: unrecognized FASTA Family report file!
> ---------------------------------------------------
>
> What should I do?
>
> ~Neeti.
> > On 12/15/06, Fairley, Derek wrote: >> >> Neeti, >> >> In lieu of a response from a BioPerl guru... why not use Needle to >> generate your pairwise alignment in fasta format, rather than msf >> format? >> The sequence you want should correspond to a single HSP which you >> can get >> directly from the fasta alignment with Bio::SearchIO: >> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need >> to use >> Bio::AlignIO at all. >> >> Derek. >> >> >> -----Original Message----- >> From: neeti somaiya [mailto:neetisomaiya at gmail.com] >> Sent: 15 December 2006 05:22 >> To: Fairley, Derek; bioperl-l >> Subject: Re: [Bioperl-l] needle parser in bioperl? >> >> Hi, >> >> Thanks a lot for your response. >> I ran needle like this >> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out >> It gave me the output in format msf. >> But now my problem is, if I use Bio::AlignIO module of Bioperl, >> how can I >> get the alignment start and stop coordinates on the sequence. I mean >> something like hsp->query->start which gives us the alignment >> start position >> on query sequence in a blast output when using Bio::SearchIO. >> Please help. >> Like I explained with an example in my previous mail, I want the >> coordinate where the alignment starts on the sequence. >> >> ~Neeti. >> On 12/14/06, Fairley, Derek wrote: >> Neeti, >> >> From http://emboss.sourceforge.net/apps/cvs/needle.html : >> >> "The results can be output in one of several styles by using the >> command-line qualifier -aformat xxx, where 'xxx' is replaced by >> the name of >> the required format. Some of the alignment formats can cope with an >> unlimited number of sequences, while others are only for pairs of >> sequences. 
>> >> The available multiple alignment format names are: unknown, multiple, >> simple, fasta, msf, trace, srs >> >> The available pairwise alignment format names are: pair, markx0, >> markx1, >> markx2, markx3, markx10, srspair, score >> >> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further >> information on alignment formats." >> >> Not sure based on this whether you can get pairwise alignment in .msf >> format; can't think of a good reason why not. The BioPerl >> Align::IO module >> will allow you to parse alignments in .msf format. >> >> HTH, >> >> Derek. >> >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto: >> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya >> Sent: 14 December 2006 08:03 >> To: Chris Fields; bioperl-l >> Subject: Re: [Bioperl-l] needle parser in bioperl? >> >> How do I run needle specifying that I want the MSF format, on a >> linux box? >> The help doesnt show me any format option. Is there anything >> available to >> pasre MSF format? >> Please find an example alignment file attached. Here the >> seq_of_contig >> aligns with the reference sequence (i.e. SEQ_1.REF) starting at >> position >> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate >> from the >> output alignment, how can I parse the result to get this? >> >> On 12/12/06, Chris Fields wrote: >> > >> > >> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: >> > >> > > Hi, >> > > >> > > Does anyone know of a bioperl parser for needle output, >> basically I >> > > won't >> > > where the target sequence aligns on the template (i.e. coordinate >> > > on the >> > > template where the taget aligns). >> > > >> > > -- >> > > -Neeti >> > > Even my blood says, B positive >> > >> > I answered this a number of months back: >> > >> > http://tinyurl.com/yzlbx5 >> > >> > Basically, newer versions of EMBOSS have changed the output for the >> > AlignIO::emboss parser (which parses needle). 
I don't believe the >> > parser has been fixed to deal with that, but Jason has pointed out >> > you can use MSF output when running needle, then parse using >> AlignIO >> > with the format set to 'msf'. >> > >> > chris >> > >> >> >> >> -- >> -Neeti >> Even my blood says, B positive >> >> >> >> -- >> -Neeti >> Even my blood says, B positive >> > > > > -- > -Neeti > Even my blood says, B positive > <1.out> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From lubapardo at gmail.com Fri Dec 15 11:39:11 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 15 Dec 2006 17:39:11 +0100
Subject: [Bioperl-l] NO BLAST
Message-ID: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>

Hello,

I am having trouble using the module Bio::Tools::Run::StandAloneBlast.

I got the following error message: cannot find path to blastall.

The code I used is (modified from the Beginners HOWTO):

#! /local/bin/perl -w
#use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Tools::Run::StandAloneBlast;

my $db_object = Bio::DB::GenBank->new;
#my $seq_ob = $db_object->get_Seq_by_id('NM_004043');
#$seq = Bio::SeqIO->new(-file => "> out.fasta", -format => 'fasta');
#$seq->write_seq($seq_ob);
#print $seq;

@params = (program => 'blastn', database => 'db.fa');
$blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
$seq_obj = Bio::Seq->new(-id => "testquery", -seq => "TTTAAATATATTTTGAAGTATAGATTATATGTT");
$report_obj = $blast_obj->blastall($seq_obj);
$result_obj = $report_obj->next_result;
print $result_obj->num_hits;

Whether I create a sequence de novo or retrieve one from the internet, I get the same message.

From cjfields at uiuc.edu Fri Dec 15 12:23:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:23:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID:

On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote: > > On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: > >> >> On Dec 14, 2006, at 7:45 PM, David Messina wrote: >> >>> Hey Chris, >>> >>> My thoughts below. >>> >>>> [Chris] >>>> This could be used to annotate any >>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have- >>>> you, >>>> maybe in a collection (similar to AnnotationCollection). I thought >>>> something like this may be of general use for any PrimarySeq >>>> (quality, structure), alignments like NEXUS and Stockholm, >>>> SeqFeatures where structure could be stored (tRNA or riboswitches), >>>> etc. >>>> >>>> However, this also seems to fall into the category of sequence >>>> annotation. So, would it be better to have a set of >>>> Bio::Annotation >>>> classes used for this purpose? 
>>> >>> >>> To me, all meta data is equal. That is, your classic Genbank feature >>> annotation and a user's arbitrary meta-tag like "Bob thinks this >>> is a >>> kinase domain" aren't different in kind even if they are >>> different in >>> content. >>> >>> As resequencing projects multiply, the ability to create arbitrary >>> meta tags, attach them to different types of objects, and use those >>> tags to link them together will become desirable, if not essential. >>> >>> Keeping a common interface to all of these meta data types would be >>> advantageous, plus new users won't have to determine whether they >>> need to use Bio::Meta objects or Bio::Annotation objects. >>> >>> So I would argue for all of the meta data types to live "under one >>> roof". Which roof isn't as important. Bio::Annotation, since it >>> already exists for today's meta data, seems like a reasonable >>> choice. >>> (assuming Annotation objects are flexible enough to be extended as >>> you propose) >>> >>> There, and no flames or jibes even. :) >> >> I guess what I want to know is whether there should to be a >> distinction between 'normal' sequence annotation (comments, >> references, and so on) and annotation that could be best described as >> position-specific (like RNA or protein structural annotation). The >> current meta implementation is for sequence data only; I felt it >> would be nice to have a generic implementation that would be >> applicable to any object data. > > my stream-of-consciousness for right now: > > I was thinking Bio::Annotation is where this should go - that > system doesn't have anything about it that makes it explicitly > sequence related. What we're trying to hammer out here on the > Alignment side - which fits with your RNA example - is have > features, basically SeqFeatures - associated with alignments so > columns can be annotated to cover things like character sets and > partitions for phylogenetic analyses. 
As for data which annotates > non-contiguous things like RNAstems we may have to be more > creative about that or model it with a splitLocation. > > So currently we've added code so that an Alignment is-a > Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this > end, with the goal of being able to capture more of the data that > can be represented in a NEXUS file. > > It feels more like a hack than an elegant Meta-data solution, but I > am totally sure whether the data you are thinking about doing at > this point, perhaps I need to spend more time thinking about it. > Or are you worried about the idea of whether the semantic mapping > of the data into features or annotations is confusing users?

Sorry in advance for the longish response here...

My original thought was to have a generic abstract class capable of positionally describing data in any other class, similar to Heikki's Bio::Seq::MetaI but not constrained to sequence data only. Implementing classes would be capable of having different data structures based on their use (simple string, array, AoA, AoH, AoO). There would be one MetaCollection class to contain them all in a tag-like system, so you could have mixed data types describe the same object. That Collection class is so similar to AnnotationCollection that I agree Bio::Annotation would be the best place for this.

The way I reconfigured Stockholm alignment parsing/writing is to use Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is capable of holding a sequence and several meta strings, stored as tags or 'names'. However, there is no Meta object for alignments (for RNA/protein structure consensus and other Rfam/Pfam markup); I hacked around this by using a Bio::Seq::Meta w/o a seq, but I would rather have a generic Meta object independent of the sequence cruft. So for this partial Pfam alignment,

Q92SV1_RHIME/122-299          LAMALNLARGI...VDADVDF..REG
#=GR Q92SV1_RHIME/122-299 pAS .........................
Q883D2_PSESM/110-290          LGLMLGLRRRL...FDGNGAV..KRS
Q8ZXP5_PYRAE/91-262           LALLLAPYKRI...IQYGEKM..KRG
#=GR Q8ZXP5_PYRAE/91-262 SS   HHHHHHHHTTH...HHHHHHX..HTT
#=GR Q8ZXP5_PYRAE/91-262 SA   00000000000...120030X..474
#=GC SS_cons                  HHHHHHHHTTH...HHHHHHH..HTT
#=GC SA_cons                  03002200312...1312414..676
#=GC seq_cons                 luhhLuhsRpl...hthppth..+pG
//

'#=GC' lines would be in generic meta string objects in the alignment, while '#=GR' tags would be in similar meta objects in the relevant sequences. As long as both aren't AnnotatableI this isn't an issue. Similarly, NEXUS files which contained any position-based values could hold a meta string/array object in a similar tag. The basic scheme is:

                    |--String
                    |
Annotation::Meta----|--Array
                    |
                    |--HorriblyComplexDataStruct

Then I started thinking about where this could be applied, and whether a true Meta object needs to be constrained only to describing position-based data. This somewhat relates to this bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1825

which seems to need a simple but unconstrained hash-of-arrays-based meta object. Then my head appropriately exploded...

Hope everything is going well at the hackathon! Looks like some interesting stuff is coming out of it.

chris

From cjfields at uiuc.edu Fri Dec 15 12:49:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:49:45 -0600
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
In-Reply-To: <1166158897.2569.335.camel@localhost.localdomain>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain>
Message-ID: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>

On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:

> As much as I would like to take credit for this :-) Allen Day wrote the
> original code, and then Chris Fields tried to fix it so that it actually
> worked :-) I think it would be a good idea to have a validate_terms
> option like Bio::FeatureIO::gff.
>
> Scott

I did ?!? I committed a bug fix a while back:

Revision 1.34, Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
Branch: MAIN
CVS Tags: branch-experimental
Branch point for: branch-1-5-2
Changes since 1.33: +155 -33 lines

Bug 2026; Robert's enhancements

To tell the truth I don't know if this is where the mandatory checks were added; I'm not too familiar with SeqFeature::Annotated yet.

I agree with Scott (and Matthew) that SOFA checks should be optional. Matthew, can you write up a patch and maybe some tests?

chris

From stewarta at nmrc.navy.mil Thu Dec 14 18:30:11 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 18:30:11 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
Message-ID: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>

I'm getting the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
STACK: main::process_reports ./new_blast_script.pl:254
STACK: ./new_blast_script.pl:132
-----------------------------------------------------------

next_result is a pretty dense chunk of code to decipher. I was wondering if anyone more familiar with that code might know what the "no data for midline $_" exception is referring to?

For context:

1161  if( /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
1162      my ($full,$type,$start,$str,$end) = ($1,$2,$3,$4,$5);
1163      if( $str eq '-' ) {
1164          $i = 3 if $type eq 'Sbjct';
1165      } else {
1166          $data{$type} = $str;
1167      }
1168      $len = length($full);
1169      $self->{"\_$type"}->{'begin'} = $start unless $self->{"_$type"}->{'begin'};
1170      $self->{"\_$type"}->{'end'} = $end;
1171  } else {
1172      $self->throw("no data for midline $_")
1173          unless (defined $_ && defined $len);
1174      $data{'Mid'} = substr($_,$len);
1175  }

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From jason at bioperl.org Fri Dec 15 13:56:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 13:56:13 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
Message-ID:

It means it is expecting an alignment block of data and there is none (or there is none in the context where it is expecting it), so something is wrong with the report and the parser gets tripped up.

I'm not sure reading the code is going to help you - what someone will have to do is figure out what is different between this report and the reports that do work for the parser. You'll do better if you just provide an example of a failing report as a bug report.

Providing the version of BLAST you are using and the version of bioperl will help. I seem to remember NCBI changing the BLAST text format, so that will break the parser if it is a significant change.

As has been mentioned in the past, this playing cat and mouse with format changes means things will periodically break. If you need something rock-solid that is always going to work, I guess XML is the better route to go.

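To make the explanation above concrete, here is a minimal, self-contained sketch of the quoted branch. It is not the BioPerl parser: classify_alignment_lines is a hypothetical helper, the sample alignment lines are made up, and unlike the real next_result it never resets $len between alignment blocks. It only shows that a line reaching the else branch before any Query/Sbjct line has set $len produces the "no data for midline" condition, which is presumably what happened with the "Posted date" line in the exception.

```perl
use strict;
use warnings;

# Hypothetical helper: classify lines the way the quoted branch does.
# Returns one short description per input line instead of throwing.
sub classify_alignment_lines {
    my @lines = @_;
    my $len;       # width of the "Query: 1   " prefix, set by a Query/Sbjct line
    my @events;
    for my $line (@lines) {
        if ( $line =~ /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
            my ( $full, $type, $start, $str, $end ) = ( $1, $2, $3, $4, $5 );
            $len = length $full;
            push @events, "$type $start-$end";
        }
        elsif ( defined $len ) {
            # Midline: everything after the prefix width is the match string.
            my $mid = length($line) >= $len ? substr( $line, $len ) : '';
            push @events, "Mid $mid";
        }
        else {
            # No Query/Sbjct line seen yet, so there is no prefix width to use.
            push @events, "ERROR no data for midline $line";
        }
    }
    return @events;
}

# A well-formed block, then a stray trailer line with no alignment before it.
print "$_\n" for classify_alignment_lines(
    'Query: 1   MKTAYIAKQR 10',
    '           MKTAYIA+QR',
    'Sbjct: 5   MKTAYIARQR 14',
);
print "$_\n" for classify_alignment_lines('Posted date:  Dec 14, 2006  2:52 PM');
```

If a report triggers the second case, the useful thing to attach to a bug report is the part of the report just before the offending line, since that is where the parser lost track of the alignment block.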
-jason On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote: > I'm getting the following exception... > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM > STACK: Error::throw > STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 > STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/ > SearchIO/blast.pm:1172 > STACK: main::process_reports ./new_blast_script.pl:254 > STACK: ./new_blast_script.pl:132 > ----------------------------------------------------------- > > > next_result is a pretty dense chunk of code to decipher. I was > wondering if anyone more familiar with that code might know what the > "no data for midline $_" exception is referring to? > > > -- > Andrew Stewart > Research Assistant, Genomics Team > Navy Medical Research Center (NMRC) > Biological Defense Research Directorate (BDRD) > BDRD Annex > 12300 Washington Avenue, 2nd Floor > Rockville, MD 20852 > > email: stewarta at nmrc.navy.mil > phone: 301-231-6700 Ext 270 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Dec 15 14:21:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 15 Dec 2006 13:21:32 -0600 Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown In-Reply-To: References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil> Message-ID: <6A0D17FA-CB98-4937-998E-11B87FB9CBBD@uiuc.edu> On Dec 15, 2006, at 12:56 PM, Jason Stajich wrote: > It means it is expecting alignment block of data and there is none > (or there is none in the context it is expecting it) - so something > is wrong with the report as it gets tripped up. > > I'm not sure reading the code is going to help you - what someone > will have to do is figure out what is different about this report > than reports that do work for the parser. 
> You'll do better if you just provide an example report that is > failing as a bug report. > > Providing the version of BLAST you are using and version of bioperl > will help. I seem to remember NCBI changing the BLAST text format so > that will break the parser if it is a significant change. > > As has been mentioned in the past, this playing cat and mouse with > format changes means things will periodically break. If you need rock- > solid always going to work, I guess the XML is better route to go. > > -jason I agree that XML is the only reliable way to go, though I have been reading on the BioPython group about some issues with newer (2.2.13 or greater) BLAST XML output when reports with multiple BLAST queries. Don't know if this affects Bioperl or not. As for the 'midline' error, there was a similar error a while back (fixed for the 1.5.2 release) that had to do with extra lines in the alignment section in some BLAST reports. Unless we have a demo BLAST report and sample code we can't do much about it (we need to reproduce the error in order to fix it), so the best thing to do it file a bug report. chris > On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote: > >> I'm getting the following exception... >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm: >> 328 >> STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/ >> SearchIO/blast.pm:1172 >> STACK: main::process_reports ./new_blast_script.pl:254 >> STACK: ./new_blast_script.pl:132 >> ----------------------------------------------------------- >> >> >> next_result is a pretty dense chunk of code to decipher. I was >> wondering if anyone more familiar with that code might know what the >> "no data for midline $_" exception is referring to? 
>> >> >> -- >> Andrew Stewart >> Research Assistant, Genomics Team >> Navy Medical Research Center (NMRC) >> Biological Defense Research Directorate (BDRD) >> BDRD Annex >> 12300 Washington Avenue, 2nd Floor >> Rockville, MD 20852 >> >> email: stewarta at nmrc.navy.mil >> phone: 301-231-6700 Ext 270 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From vaughn at cshl.edu Fri Dec 15 13:05:47 2006 From: vaughn at cshl.edu (Matthew Vaughn) Date: Fri, 15 Dec 2006 13:05:47 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain> <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> Message-ID: Yes, I will. I am working on it today. It's a little more complicated to fix this than I expected because SeqFeature::Annotation->type() returns a Bio::AnnotationI rather than a simple scalar like it used to. On 12/15/06, Chris Fields wrote: > On Dec 14, 2006, at 11:01 PM, Scott Cain wrote: > > > As much as I would like to take credit for this :-) Allen Day > > wrote the > > original code, and then Chris Fields tried to fix it so that it > > actually > > worked :-) I think it would be a good idea to have a validate_terms > > option like Bio::FeatureIO::gff. > > > > Scott > > I did ?!? 
I committed a bug fix a while back: > > Revision 1.34 / (view) - annotate - [select for diffs] , > Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields > Branch: MAIN > CVS Tags: branch-experimental > Branch point for: branch-1-5-2 > Changes since 1.33: +155 -33 lines > Diff to previous 1.33 > > Bug 2026; Robert's enhancements > > To tell the truth I don't know if this is where the mandatory checks > were added in; I'm not too familiar with SeqFeature::Annotation yet. > > I agree with Scott (and Matthew) that SOFA checks should be > optional. Matthew, can you write up a patch and maybe some tests? > > chris > > > > From valiente at lsi.upc.edu Fri Dec 15 19:45:27 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Sat, 16 Dec 2006 01:45:27 +0100 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4577EFD3.7090904@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> Message-ID: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> > I don't think that can be true. Your error message contains 'Must > supply > a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). > > If you uninstall the fink installation and install 1.5.2 using cpan > (with root privileges by going sudo cpan) that should at least get > rid of the error messages... > > >> The tree is not correct (I've parsed it from R to have a double >> check) but don't know yet what the problem is with it. > > ... But if the tree is wrong anyway... Let me know what you find out. I've uninstalled the fink installation and used the cvs instead, and the error message is gone. However, on a larger set of 190 species, which are all present in the NCBI taxonomy, the resulting tree has only 178 taxa. 
I suspect, something must be wrong with the merge_lineage method in the major rewrite of the taxonomy2tree script. Can someone please check this? I'm attaching the 190 species call to the script. Thanks, Gabriel -------------- next part -------------- A non-text attachment was scrubbed... Name: fetch-bork.sh Type: application/octet-stream Size: 7378 bytes Desc: not available URL: From lincoln.stein at gmail.com Fri Dec 15 11:02:27 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Fri, 15 Dec 2006 11:02:27 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> Message-ID: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> This is very embarassing for me, particularly since I spent a lot of time validating that Bio::Graphics was working properly before the 1.5.2 release went out. How long before there is a 1.5.3 release? How about a 1.5.2.1release? Lincoln On 12/14/06, Lincoln Stein wrote: > > Hi All, > > I'm afraid that the xyplot glyph that is in the recent bioperl release has > an error that causes the content to be printed to the right of the correct > position. Unfortunately this wasn't caught before the release because the > glyph was only tested on very large (whole genome) features. > > You will need to do a CVS update to get a fixed version from bioperl-live. > A future bugfix release of gbrowse will patch this glyph for you > automatically. > > Lincoln > > On 12/12/06, Kara Dolinski wrote: > > > > Hi, > > I'm having a problem getting features and an xyplot properly aligned in > > Gbrowse. For example, see this page: > > > > http://tinyurl.com/ylbq3q > > > > The feature in the "CENPK SNPs" track should actually be around the peak > > of the graph in the "CENPK prediction signal" xyplot ie. 
the SNP > > feature is at position 79, and the xyplot axes and data should span from > > 61 - 95. However, as you can see, the data in the xyplot are oddly > > separated from the axes (which seem to be in the correct place), with the > > data shifted over to about position 120-155. > > This occurs elsewhere, not just at the ends of the chromosomes. > > > > When I zoom to ~80 bp, all is well, see: > > > > http://tinyurl.com/yzav8k > > > > The relevant snippets from the GFF and the config files are below. > > > > Thanks! > > Kara > > > > GFF: > > > > chrI SNPScanner > > CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > > is 41.9883 > > chrI SNPScanner > > CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > > is 41.9883 > > chrI SNPScanner > > CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score > > is 2.24506 > > chrI SNPScanner > > CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score > > is 3.26837 > > chrI SNPScanner > > CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score > > is 1.39938 > > chrI SNPScanner > > CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score > > is 1.4039 > > chrI SNPScanner > > CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score > > is 9.16134 > > chrI SNPScanner > > CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score > > is 10.1413 > > chrI SNPScanner > > CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score > > is 12.9256 > > chrI SNPScanner > > CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score > > is 13.195 > > chrI SNPScanner > > CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score > > is 22.7127 > > chrI SNPScanner > > CENPK_SCORE 70 70 23.8289 . . 
ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score > > is 23.8289 > > chrI SNPScanner > > CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score > > is 21.9123 > > chrI SNPScanner > > CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score > > is 28.3344 > > chrI SNPScanner > > CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score > > is 35.0436 > > chrI SNPScanner > > CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score > > is 37.361 > > chrI SNPScanner > > CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score > > is 39.5408 > > chrI SNPScanner > > CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score > > is 28.2008 > > chrI SNPScanner > > CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score > > is 32.6254 > > chrI SNPScanner > > CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score > > is 36.0832 > > chrI SNPScanner > > CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score > > is 41.9883 > > chrI SNPScanner > > CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score > > is 32.1205 > > chrI SNPScanner > > CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score > > is 41.3048 > > chrI SNPScanner > > CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score > > is 30.7975 > > chrI SNPScanner > > CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score > > is 29.4282 > > chrI SNPScanner > > CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score > > is 35.3586 > > chrI SNPScanner > > CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score > > is 34.1426 > > chrI SNPScanner > > CENPK_SCORE 86 86 30.2966 . . 
ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score > > is 30.2966 > > chrI SNPScanner > > CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score > > is 17.8402 > > chrI SNPScanner > > CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score > > is 15.2637 > > chrI SNPScanner > > CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score > > is 12.657 > > chrI SNPScanner > > CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score > > is 10.2033 > > chrI SNPScanner > > CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score > > is 9.40143 > > chrI SNPScanner > > CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score > > is 6.56273 > > chrI SNPScanner > > CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score > > is 3.66211 > > chrI SNPScanner > > CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score > > is 0.394194 > > > > CONFIG: > > > > > > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH} > > > > [CENPK_all_scores_graph] > > feature = GRAPH_CENPK:SNPScanner > > glyph = xyplot > > graph_type = boxes > > fgcolor = purple > > bgcolor = purple > > height = 100 > > min_score = 0 > > max_score = 110 > > label = 0 > > key = CENPK prediction signal > > link = > > category = SNPs: signal graphs > > > > > > > > ------------------------------------------------------------------------- > > Take Surveys. Earn Cash. 
Influence the Future of IT > > Join SourceForge.net's Techsay panel and you'll get the chance to share > > your > > opinions on IT & business topics through brief surveys - and earn cash > > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > > > > > > _______________________________________________ > > Gmod-gbrowse mailing list > > Gmod-gbrowse at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > > > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Sat Dec 16 01:10:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 00:10:07 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> Message-ID: <70A5E333-8CF5-49D3-84AC-7A6A02791B5C@uiuc.edu> We could feasibly have regular point releases of the 1.5 dev. series for bug fixes; I guess it just depends on how often these should come out and what critical tests must pass for a release to go forward. Sendu's already done a ton of work towards getting BioPerl switched over to Module::Build and Test::More, and fixing bugs. As Hilmar has pointed out in the past, this is a developer's series, so not every test needs to pass before a release goes out. When would you like this to go out? 
chris On Dec 15, 2006, at 10:02 AM, Lincoln Stein wrote: > This is very embarassing for me, particularly since I spent a lot > of time > validating that Bio::Graphics was working properly before the 1.5.2 > release > went out. How long before there is a 1.5.3 release? How about a > 1.5.2.1release? > > Lincoln > > On 12/14/06, Lincoln Stein wrote: >> >> Hi All, >> >> I'm afraid that the xyplot glyph that is in the recent bioperl >> release has >> an error that causes the content to be printed to the right of the >> correct >> position. Unfortunately this wasn't caught before the release >> because the >> glyph was only tested on very large (whole genome) features. >> >> You will need to do a CVS update to get a fixed version from >> bioperl-live. >> A future bugfix release of gbrowse will patch this glyph for you >> automatically. >> >> Lincoln >> >> On 12/12/06, Kara Dolinski wrote: >>> >>> Hi, >>> I'm having a problem getting features and an xyplot properly >>> aligned in >>> Gbrowse. For example, see this page: >>> >>> http://tinyurl.com/ylbq3q >>> >>> The feature in the "CENPK SNPs" track should actually be around >>> the peak >>> of the graph in the "CENPK prediction signal" xyplot ie. the SNP >>> feature is at position 79, and the xyplot axes and data should >>> span from >>> 61 - 95. However, as you can see, the data in the xyplot are oddly >>> separated from the axes (which seem to be in the correct place), >>> with the >>> data shifted over to about position 120-155. >>> This occurs elsewhere, not just at the ends of the chromosomes. >>> >>> When I zoom to ~80 bp, all is well, see: >>> >>> http://tinyurl.com/yzav8k >>> >>> The relevant snippets from the GFF and the config files are below. >>> >>> Thanks! >>> Kara >>> >>> GFF: >>> >>> chrI SNPScanner >>> CENPK_GRAPH 61 95 41.9883 . . >>> ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score >>> is 41.9883 >>> chrI SNPScanner >>> CENPK_CALL 79 79 41.9883 . . 
>>> ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score >>> is 41.9883 >>> chrI SNPScanner >>> CENPK_SCORE 61 61 2.24506 . . >>> ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score >>> is 2.24506 >>> chrI SNPScanner >>> CENPK_SCORE 62 62 3.26837 . . >>> ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score >>> is 3.26837 >>> chrI SNPScanner >>> CENPK_SCORE 63 63 1.39938 . . >>> ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score >>> is 1.39938 >>> chrI SNPScanner >>> CENPK_SCORE 64 64 1.4039 . . >>> ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score >>> is 1.4039 >>> chrI SNPScanner >>> CENPK_SCORE 65 65 9.16134 . . >>> ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score >>> is 9.16134 >>> chrI SNPScanner >>> CENPK_SCORE 66 66 10.1413 . . >>> ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score >>> is 10.1413 >>> chrI SNPScanner >>> CENPK_SCORE 67 67 12.9256 . . >>> ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score >>> is 12.9256 >>> chrI SNPScanner >>> CENPK_SCORE 68 68 13.195 . . >>> ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score >>> is 13.195 >>> chrI SNPScanner >>> CENPK_SCORE 69 69 22.7127 . . >>> ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score >>> is 22.7127 >>> chrI SNPScanner >>> CENPK_SCORE 70 70 23.8289 . . >>> ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score >>> is 23.8289 >>> chrI SNPScanner >>> CENPK_SCORE 71 71 21.9123 . . >>> ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score >>> is 21.9123 >>> chrI SNPScanner >>> CENPK_SCORE 72 72 28.3344 . . >>> ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score >>> is 28.3344 >>> chrI SNPScanner >>> CENPK_SCORE 73 73 35.0436 . . >>> ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score >>> is 35.0436 >>> chrI SNPScanner >>> CENPK_SCORE 74 74 37.361 . . >>> ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score >>> is 37.361 >>> chrI SNPScanner >>> CENPK_SCORE 75 75 39.5408 . . 
>>> ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score >>> is 39.5408 >>> chrI SNPScanner >>> CENPK_SCORE 76 76 28.2008 . . >>> ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score >>> is 28.2008 >>> chrI SNPScanner >>> CENPK_SCORE 77 77 32.6254 . . >>> ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score >>> is 32.6254 >>> chrI SNPScanner >>> CENPK_SCORE 78 78 36.0832 . . >>> ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score >>> is 36.0832 >>> chrI SNPScanner >>> CENPK_SCORE 79 79 41.9883 . . >>> ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score >>> is 41.9883 >>> chrI SNPScanner >>> CENPK_SCORE 80 80 32.1205 . . >>> ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score >>> is 32.1205 >>> chrI SNPScanner >>> CENPK_SCORE 81 81 41.3048 . . >>> ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score >>> is 41.3048 >>> chrI SNPScanner >>> CENPK_SCORE 82 82 30.7975 . . >>> ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score >>> is 30.7975 >>> chrI SNPScanner >>> CENPK_SCORE 83 83 29.4282 . . >>> ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score >>> is 29.4282 >>> chrI SNPScanner >>> CENPK_SCORE 84 84 35.3586 . . >>> ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score >>> is 35.3586 >>> chrI SNPScanner >>> CENPK_SCORE 85 85 34.1426 . . >>> ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score >>> is 34.1426 >>> chrI SNPScanner >>> CENPK_SCORE 86 86 30.2966 . . >>> ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score >>> is 30.2966 >>> chrI SNPScanner >>> CENPK_SCORE 87 87 17.8402 . . >>> ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score >>> is 17.8402 >>> chrI SNPScanner >>> CENPK_SCORE 88 88 15.2637 . . >>> ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score >>> is 15.2637 >>> chrI SNPScanner >>> CENPK_SCORE 89 89 12.657 . . >>> ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score >>> is 12.657 >>> chrI SNPScanner >>> CENPK_SCORE 90 90 10.2033 . . 
>>> ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score >>> is 10.2033 >>> chrI SNPScanner >>> CENPK_SCORE 91 91 9.40143 . . >>> ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score >>> is 9.40143 >>> chrI SNPScanner >>> CENPK_SCORE 92 92 6.56273 . . >>> ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score >>> is 6.56273 >>> chrI SNPScanner >>> CENPK_SCORE 93 93 3.66211 . . >>> ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score >>> is 3.66211 >>> chrI SNPScanner >>> CENPK_SCORE 94 94 0.394194 . . >>> ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score >>> is 0.394194 >>> >>> CONFIG: >>> >>> >>> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH} >>> >>> [CENPK_all_scores_graph] >>> feature = GRAPH_CENPK:SNPScanner >>> glyph = xyplot >>> graph_type = boxes >>> fgcolor = purple >>> bgcolor = purple >>> height = 100 >>> min_score = 0 >>> max_score = 110 >>> label = 0 >>> key = CENPK prediction signal >>> link = >>> category = SNPs: signal graphs >>> >>> >>> >>> -------------------------------------------------------------------- >>> ----- >>> Take Surveys. Earn Cash. Influence the Future of IT >>> Join SourceForge.net's Techsay panel and you'll get the chance to >>> share >>> your >>> opinions on IT & business topics through brief surveys - and earn >>> cash >>> http://www.techsay.com/default.php? >>> page=join.php&p=sourceforge&CID=DEVDEV >>> >>> >>> _______________________________________________ >>> Gmod-gbrowse mailing list >>> Gmod-gbrowse at lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >>> >>> >>> >> >> >> -- >> Lincoln D. Stein >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> FOR URGENT MESSAGES & SCHEDULING, >> PLEASE CONTACT MY ASSISTANT, >> SANDRA MICHELSEN, AT michelse at cshl.edu >> > > > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Dec 16 01:28:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 00:28:47 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: >> I don't think that can be true. Your error message contains 'Must >> supply >> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >> >> If you uninstall the fink installation and install 1.5.2 using >> cpan (with root privileges by going sudo cpan) that should at >> least get rid of the error messages... >> >> >>> The tree is not correct (I've parsed it from R to have a double >>> check) but don't know yet what the problem is with it. >> >> ... But if the tree is wrong anyway... Let me know what you find out. > > I've uninstalled the fink installation and used the cvs instead, > and the error message is gone. However, on a larger set of 190 > species, which are all present in the NCBI taxonomy, the resulting > tree has only 178 taxa. 
> I suspect, something must be wrong with the
> merge_lineage method in the major rewrite of the taxonomy2tree
> script. Can someone please check this? I'm attaching the 190
> species call to the script. Thanks,
>
> Gabriel

I can confirm that. It is definitely dropping them in merge_lineage();
if you add a call to get_leaf_nodes to check how many are present after
each merge_lineage() call, you can see it dropping nodes along the trace.

In taxonomy2tree.pl:

    my $ct = 0;    # index of the species being processed
    my ($treect, $mergect) = (0, 0);
    for my $name (@species) {
        my $ncbi_id = $db->get_taxonid($name);
        if ($ncbi_id) {
            #print "Species: $name\n\tTaxID: $ncbi_id\n";
            #$ids{$ncbi_id}++;
            my $node = $db->get_taxon(-taxonid => $ncbi_id);

            if ($tree) {
                $tree->merge_lineage($node);
            }
            else {
                $tree = Bio::Tree::Tree->new(-node => $node);
            }
            # report the number of leaf nodes after each merge
            printf("%-3d: Nodes: %-4d\n", $ct, scalar($tree->get_leaf_nodes));
        }
        else {
            warn "no NCBI Taxonomy node for species ", $name, "\n";
        }
        $ct++;
    }

chris

From bix at sendu.me.uk Sat Dec 16 09:37:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:37:49 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
    <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <458404BD.8030908@sendu.me.uk>

Lincoln Stein wrote:
> This is very embarrassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a
> 1.5.2.1 release?

I'm happy to try a point release for critical bug fixes. Why don't you
commit the necessary fixes to branch-1-5-2 and let me know when you're
happy, and I'll do 1.5.2.1.


Cheers,
Sendu.
From bix at sendu.me.uk Sat Dec 16 09:47:57 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 16 Dec 2006 14:47:57 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: <4584071D.3070005@sendu.me.uk> Gabriel Valiente wrote: >> I don't think that can be true. Your error message contains 'Must supply >> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >> >> If you uninstall the fink installation and install 1.5.2 using cpan >> (with root privileges by going sudo cpan) that should at least get rid >> of the error messages... >> >> >>> The tree is not correct (I've parsed it from R to have a double >>> check) but don't know yet what the problem is with it. >> >> ... But if the tree is wrong anyway... Let me know what you find out. > > I've uninstalled the fink installation and used the cvs instead, and the > error message is gone. However, on a larger set of 190 species, which > are all present in the NCBI taxonomy, the resulting tree has only 178 > taxa. I suspect, something must be wrong with the merge_lineage method > in the major rewrite of the taxonomy2tree script. Can someone please > check this? I'm attaching the 190 species call to the script. Thanks, Ok, I'll look into it. You're also welcome to see if you can take your own code from your original taxonomy2tree script and see if you can merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with your algorithms to get it working correctly. Indeed, does your original version of the script work on this data set? Cheers, Sendu. 
From cjfields at uiuc.edu Sat Dec 16 10:18:50 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 09:18:50 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4584071D.3070005@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4584071D.3070005@sendu.me.uk> Message-ID: <6AE33842-B2E7-4E9B-B80D-68A058045818@uiuc.edu> On Dec 16, 2006, at 8:47 AM, Sendu Bala wrote: > Gabriel Valiente wrote: >>> I don't think that can be true. Your error message contains 'Must >>> supply >>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >>> >>> If you uninstall the fink installation and install 1.5.2 using cpan >>> (with root privileges by going sudo cpan) that should at least >>> get rid >>> of the error messages... >>> >>> >>>> The tree is not correct (I've parsed it from R to have a double >>>> check) but don't know yet what the problem is with it. >>> >>> ... But if the tree is wrong anyway... Let me know what you find >>> out. >> >> I've uninstalled the fink installation and used the cvs instead, >> and the >> error message is gone. However, on a larger set of 190 species, which >> are all present in the NCBI taxonomy, the resulting tree has only 178 >> taxa. I suspect, something must be wrong with the merge_lineage >> method >> in the major rewrite of the taxonomy2tree script. Can someone please >> check this? I'm attaching the 190 species call to the script. Thanks, > > Ok, I'll look into it. You're also welcome to see if you can take your > own code from your original taxonomy2tree script and see if you can > merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with > your algorithms to get it working correctly. 
Indeed, does your > original > version of the script work on this data set? > > > Cheers, > Sendu. Sendu, Don't know if it helps, but when I tried Gabriel's shell script last night I ran a modification of taxonomy2tree to see what would pop up. Everything is fine up to about 100 iterations, then merge_lineage () starts dropping leaf nodes. chris From bix at sendu.me.uk Sat Dec 16 10:33:35 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 16 Dec 2006 15:33:35 +0000 Subject: [Bioperl-l] NO BLAST In-Reply-To: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com> References: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com> Message-ID: <458411CF.8000707@sendu.me.uk> Luba Pardo wrote: > *Hello,* > *I am having trouble to use the module Bio::Tools::Run::StandAloneBlast;* > ** > *I got the following error message: cannot find path to blastall.* > *The code I used is (modified from HOWTObeginners): Bioperl doesn't know where you installed blast. If you've actually installed it, you can set the environment variable BLASTDIR to point to the directory that contains the blastall executable. From cain.cshl at gmail.com Fri Dec 15 13:09:48 2006 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 15 Dec 2006 13:09:48 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain> <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> Message-ID: <1166206188.2569.380.camel@localhost.localdomain> On Fri, 2006-12-15 at 11:49 -0600, Chris Fields wrote: > > To tell the truth I don't know if this is where the mandatory checks > were added in; I'm not too familiar with SeqFeature::Annotation yet. > > I agree with Scott (and Matthew) that SOFA checks should be > optional. Matthew, can you write up a patch and maybe some tests? 
> > chris > That's not where they were added in, it just that they hadn't been fully implemented before then, so they didn't work (which probably meant they weren't mandatory, though I don't remember (it could be that it just croaked)). Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From hlapp at gmx.net Sun Dec 17 01:02:04 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 17 Dec 2006 01:02:04 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <458404BD.8030908@sendu.me.uk> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> Message-ID: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote: > Lincoln Stein wrote: >> This is very embarassing for me, particularly since I spent a lot >> of time >> validating that Bio::Graphics was working properly before the >> 1.5.2 release >> went out. How long before there is a 1.5.3 release? How about a >> 1.5.2.1release? > > I'm happy to try a point release for critical bug fixes. Why don't you > commit the necessary fixes to branch-1-5-2 and let me know when you're > happy, and I'll do 1.5.2.1. Feel free to do that, but why not make a 1.5.3 off the main trunk? 1.5.2.1 may be adding more to the version confusion (developer/stable/ point-release/etc) than it is worth, and there is no shame in releasing new developer versions every few weeks. My $0.02 ... -hilmar > > Cheers, > Sendu. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From fgarret at ub.edu Mon Dec 18 07:07:02 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 13:07:02 +0100
Subject: [Bioperl-l] codeml
Message-ID: <45868466.508@ub.edu>

Hi all,

I've been using bioperl's PAML module (specifically the codeml part) but
with just one tree.

Since the program accepts several trees as input (and runs the analysis
for each tree, outputting the difference in likelihoods for each one) I
was wondering if there's some way to do it through bioperl?

thanks in adv,
FG

From heikki at sanbi.ac.za Mon Dec 18 08:51:50 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 18 Dec 2006 15:51:50 +0200
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To:
References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID: <200612181551.51277.heikki@sanbi.ac.za>

Reading the discussion, I think it is time to draw some guidelines.

1. Base the Meta implementation on real use cases.

MSA is a good example.

2. Allow generalisations

If you can see another implementation of the same idea that can be
merged with the first, do it, but do not hurt yourself if you cannot.

The most difficult question is how to separate case-specific attributes
that are best implemented by subclassing with additional methods from
truly widely variable meta data that is best done as a parallel-track
meta information holding class.

The problem I see with undefined, totally open meta annotation is that
if you can put anything in there, it is also totally confusing to a
user. If you can put anything in, how do you know what to get out, and
how do you know that it is there? That leads to the third guideline:

3.
Use separate meta classes only when there are several different ways of encoding data that is present in large numbers *and* when you are expecting to be assessing the data computationally rather than just checking if an attribute is there. -Heikki On Friday 15 December 2006 19:23, Chris Fields wrote: > On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote: > > On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: > >> On Dec 14, 2006, at 7:45 PM, David Messina wrote: > >>> Hey Chris, > >>> > >>> My thoughts below. > >>> > >>>> [Chris] > >>>> This could be used to annotate any > >>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have- > >>>> you, > >>>> maybe in a collection (similar to AnnotationCollection). I thought > >>>> something like this may be of general use for any PrimarySeq > >>>> (quality, structure), alignments like NEXUS and Stockholm, > >>>> SeqFeatures where structure could be stored (tRNA or riboswitches), > >>>> etc. > >>>> > >>>> However, this also seems to fall into the category of sequence > >>>> annotation. So, would it be better to have a set of > >>>> Bio::Annotation > >>>> classes used for this purpose? > >>> > >>> To me, all meta data is equal. That is, your classic Genbank feature > >>> annotation and a user's arbitrary meta-tag like "Bob thinks this > >>> is a > >>> kinase domain" aren't different in kind even if they are > >>> different in > >>> content. > >>> > >>> As resequencing projects multiply, the ability to create arbitrary > >>> meta tags, attach them to different types of objects, and use those > >>> tags to link them together will become desirable, if not essential. > >>> > >>> Keeping a common interface to all of these meta data types would be > >>> advantageous, plus new users won't have to determine whether they > >>> need to use Bio::Meta objects or Bio::Annotation objects. > >>> > >>> So I would argue for all of the meta data types to live "under one > >>> roof". Which roof isn't as important. 
Bio::Annotation, since it > >>> already exists for today's meta data, seems like a reasonable > >>> choice. > >>> (assuming Annotation objects are flexible enough to be extended as > >>> you propose) > >>> > >>> There, and no flames or jibes even. :) > >> > >> I guess what I want to know is whether there should to be a > >> distinction between 'normal' sequence annotation (comments, > >> references, and so on) and annotation that could be best described as > >> position-specific (like RNA or protein structural annotation). The > >> current meta implementation is for sequence data only; I felt it > >> would be nice to have a generic implementation that would be > >> applicable to any object data. > > > > my stream-of-consciousness for right now: > > > > I was thinking Bio::Annotation is where this should go - that > > system doesn't have anything about it that makes it explicitly > > sequence related. What we're trying to hammer out here on the > > Alignment side - which fits with your RNA example - is have > > features, basically SeqFeatures - associated with alignments so > > columns can be annotated to cover things like character sets and > > partitions for phylogenetic analyses. As for data which annotates > > non-contiguous things like RNAstems we may have to be more > > creative about that or model it with a splitLocation. > > > > So currently we've added code so that an Alignment is-a > > Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this > > end, with the goal of being able to capture more of the data that > > can be represented in a NEXUS file. > > > > It feels more like a hack than an elegant Meta-data solution, but I > > am totally sure whether the data you are thinking about doing at > > this point, perhaps I need to spend more time thinking about it. > > Or are you worried about the idea of whether the semantic mapping > > of the data into features or annotations is confusing users? > > Sorry in advance for the longish response here... 
> > My original thought was to have a generic abstract class capable of > positionally describing data in any another class, similar to > Heikki's Bio::Seq::MetaI but not constrained to sequence data only. > Implementing classes would be capable of having different data > structures based on their use (simple string, array, AoA, AoH, AoO). > One MetaCollection class to contain them all in a tag-like system, so > you could have mixed data types describe the same object. The latter > Collection class is so similar to AnnotationCollection that I agree > Bio::Annotation would be the best place for this. > > The way I reconfigured Stockholm alignment parsing/writing is to use > Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is > capable of holding a sequence and several meta strings, stored as > tags or 'names'. However, there is no Meta object for alignments > (for RNA/protein structure consensus and other Rfam/Pfam markup); I > hacked around this by using a Bio::Seq::Meta w/o a seq, but I would > rather have a generic Meta object independent of the sequence cruft. > > So for this partial Pfam alignment, > > Q92SV1_RHIME/122-299 LAMALNLARGI...VDADVDF..REG > #=GR Q92SV1_RHIME/122-299 pAS ......................... > Q883D2_PSESM/110-290 LGLMLGLRRRL...FDGNGAV..KRS > Q8ZXP5_PYRAE/91-262 LALLLAPYKRI...IQYGEKM..KRG > #=GR Q8ZXP5_PYRAE/91-262 SS HHHHHHHHTTH...HHHHHHX..HTT > #=GR Q8ZXP5_PYRAE/91-262 SA 00000000000...120030X..474 > #=GC SS_cons HHHHHHHHTTH...HHHHHHH..HTT > #=GC SA_cons 03002200312...1312414..676 > #=GC seq_cons luhhLuhsRpl...hthppth..+pG > // > > '#=GC' lines would be in generic meta string objects in the > alignment, while '#=GR' tags would be in similar meta objects in the > relevant sequences. As long as both aren't AnnotatableI this isn't > an issue. > > Similarly, NEXUS files which contained any position-based values > could hold a meta string/array object in a similar tag. 
>
> The basic scheme is:
>
>                     |--String
> Annotation::Meta----|--Array
>                     |--HorriblyComplexDataStruct
>
> Then I started thinking about where this could be applied, and
> whether a true Meta object needs to be constrained only to describing
> position-based data. This somewhat relates to this bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1825
>
> which seems to need a simple but unconstrained hash-of-arrays-based
> meta object.
>
> Then my head appropriately exploded...
>
> Hope everything is going well at the hackathon! Looks like some
> interesting stuff coming out of it.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From fgarret at ub.edu Mon Dec 18 11:18:31 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 17:18:31 +0100
Subject: [Bioperl-l] PAML files
Message-ID: <4586BF57.4090002@ub.edu>

Hi all,

does anyone know how to get the name of the .ctl file created by the
PAML module? Inside the tmp directory there are 2 files with random
names (tree and ctl). Why do they have random names? Wouldn't it be
easier to assign them fixed names, for instance "codeml.ctl" and
"tree.nwk"?

thanks in adv,
FG

From bix at sendu.me.uk Mon Dec 18 11:15:21 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 16:15:21 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
    <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
    <458404BD.8030908@sendu.me.uk>
    <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
Message-ID: <4586BE99.7020308@sendu.me.uk>

Hilmar Lapp wrote:
>
> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
>
>> Lincoln Stein wrote:
>>> This is very embarrassing for me, particularly since I spent a lot
>>> of time validating that Bio::Graphics was working properly before
>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>> release? How about a 1.5.2.1 release?
>>
>> I'm happy to try a point release for critical bug fixes. Why don't
>> you commit the necessary fixes to branch-1-5-2 and let me know when
>> you're happy, and I'll do 1.5.2.1.
>
> Feel free to do that, but why not make a 1.5.3 off the main trunk?
> 1.5.2.1 may be adding more to the version confusion
> (developer/stable/point-release/etc) than it is worth,

My feeling is that 1.5.3 should be reserved for some significant changes
and new features, and not just a few bug fixes. I'd say this causes less
confusion amongst users: they can associate '1.5.2' with a certain API
and feature-set, and the specific name of the file they download and
install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
matter at all to them. I also won't have to make some major announcement
about it; it will simply be the most recent developer version of bioperl
available, so new users trying to get 1.5.2 will end up getting 1.5.2.1,
whilst existing 1.5.2 users will only feel compelled to get it if they
suffer from the bugs fixed.

> and there is no shame in releasing new developer versions every few
> weeks.

I think frequent releases are inadvisable; such a quick release won't
have had much testing so we shouldn't encourage people to install it:
encouragement is implicit when a major new version comes out like 1.5.3.
People who want to live on the edge can and should be using a CVS
checkout.

From bix at sendu.me.uk Mon Dec 18 14:15:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 19:15:16 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To:
References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>
    <4577E4A2.5090303@sendu.me.uk>
    <4577EAAF.7030509@sendu.me.uk>
    <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>
    <4577EFD3.7090904@sendu.me.uk>
    <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
Message-ID: <4586E8C4.6030306@sendu.me.uk>

Chris Fields wrote:
> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
>
>> However, on a larger set of 190 species, which are all present in
>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect,
>> something must be wrong with the merge_lineage method in the major
>> rewrite of the taxonomy2tree script. Can someone please check this?
>> I'm attaching the 190 species call to the script. Thanks,
>>
>> Gabriel
>
> I can confirm that. It is definitely dropping them in merge_lineage
> (); if you add a call to get_leaf_nodes to check how many are
> present after each merge_lineage() call, you can see it dropping
> nodes along the trace.

I confirm the 'dropped' nodes, but also claim that this is no bug.

For example, the first 'drop' happens for the 101st species which is
'Leptospira interrogans serovar Copenhageni'. This is a variation
(descendant) of species 24: 'Leptospira interrogans'. So when the
variation is added it becomes a leaf and 'Leptospira interrogans' is no
longer a leaf, so the overall number of leaves does not increase.

The next drop is for species 103 'Prochlorococcus marinus subsp.
pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'.
Same deal. I didn't check any others, but suspect the same issue arises
in all cases.
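
The leaf-count effect described above can be reproduced with a toy tree
built from plain Perl hashes. This is only a sketch: merge_toy_lineage()
and count_leaves() are hypothetical helpers written for illustration,
not the Bio::Tree or Bio::Taxon API, and the lineages are shortened by
hand.

```perl
use strict;
use warnings;

# Toy stand-in for a taxonomy tree: child name => parent name.
# Plain hashes only; none of this is BioPerl API.
my %parent_of;

# Add a lineage (root first) to the toy tree.
sub merge_toy_lineage {
    my @lineage = @_;
    # register the root if it has not been seen before
    $parent_of{ $lineage[0] } = undef
        unless exists $parent_of{ $lineage[0] };
    for my $i (1 .. $#lineage) {
        $parent_of{ $lineage[$i] } = $lineage[ $i - 1 ];
    }
    return;
}

# A leaf is any node that is not the parent of another node.
sub count_leaves {
    my %has_child;
    for my $parent (values %parent_of) {
        $has_child{$parent} = 1 if defined $parent;
    }
    return scalar grep { !$has_child{$_} } keys %parent_of;
}

# Species 24 goes in first and is a leaf at that point.
merge_toy_lineage('Bacteria', 'Leptospira', 'Leptospira interrogans');
my $before = count_leaves();

# Species 101 is a descendant of species 24, so it takes over as the leaf.
merge_toy_lineage('Bacteria', 'Leptospira', 'Leptospira interrogans',
                  'Leptospira interrogans serovar Copenhageni');
my $after = count_leaves();

printf "leaves before: %d, after: %d\n", $before, $after;
```

Both taxa are still in the toy tree after the second call; only the
leaf count stays the same, because the earlier species has become an
internal node.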
Gabriel, please confirm this isn't a bug, or suggest how you propose to see your taxa when they are not all leaves of the tree. PS. I changed the merge_lineage() algorithm to be 18x faster (from the absurd 3mins for making the 190 species tree to a more reasonable 10s), without changing the tree produced. From fgarret at ub.edu Mon Dec 18 15:01:38 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 21:01:38 +0100 Subject: [Bioperl-l] PAML files In-Reply-To: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> References: <4586BF57.4090002@ub.edu> <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> Message-ID: <4586F3A2.4010607@ub.edu> Hi Jason, This question is related with the one I made previously today. I need to run codeml with 3 tree topologies. I looked on codeml module but it only accepts one tree as input so I thought of using the codeml module to prepare all the files and then I would just have to run the codeml with the new tree file in batch. But for that I need to know which one is the ctl file. any idea? FG Jason Stajich wrote: > They are temporary names so they are deliberately random and there is no > intention of you needing them after a run since it to be cleaned up on > the fly. We use an internal method for generating tempfiles that takes > care of cleanup afterwards. I suppose since we do all the work within a > temp directory that is cleaned up, one could have a fixed name for the > tree, alignment, and ctl files but honestly we never expect people to be > reading these filenames as they are intended to be transient. > > What problem are you having that you need the filename? > > -jason > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > >> Hi all, >> >> does anyone knows how to get the name of the .ctl file created by the >> PAML module? Inside the tmp directory there are 2 files with random >> names (tree and ctl). Why do they have random names?? Wouldn't it be >> easier to assign them a fixed name?? 
For instance "codeml.ctl" and >> "tree.nwk"?? >> >> thanks in adv, >> FG >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > From fgarret at ub.edu Mon Dec 18 15:07:46 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 21:07:46 +0100 Subject: [Bioperl-l] codeml In-Reply-To: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> References: <45868466.508@ub.edu> <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> Message-ID: <4586F512.1030209@ub.edu> Right now it's impossible for me to write it. By February or March I should have more time but I'll let you know. FG Jason Stajich wrote: > This is shortcoming in the Run::Phylo::PAML::Codeml implementation - I > guess we'll need to allow the -tree option to accept and arrayref of trees. > Are you willing to try write this patch? It should be added as a > bug/feature request to bugzilla so it can be corrected in short order. > > -jason > On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote: > >> Hi all, >> >> I've been using bioperl's PAML module (specifically the codeml part) but >> with just one tree. >> >> Since the program accepts several trees as input (and runs the analysis >> for each tree outputting the difference in likelihoods for each one) I >> was wondering if there's some way to do it through bioperl? 
>> >> thanks in adv, >> FG >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > > From cjfields at uiuc.edu Mon Dec 18 15:55:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 14:55:55 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4586E8C4.6030306@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> Message-ID: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: >> >>> However, on a larger set of 190 species, which are all present in >>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect, >>> something must be wrong with the merge_lineage method in the major >>> rewrite of the taxonomy2tree script. Can someone please check this? >>> I'm attaching the 190 species call to the script. Thanks, >>> >>> Gabriel >> >> I can confirm that. It is definitely dropping them in merge_lineage >> (); if you add a call to get_leaf_nodes to check how many are >> present after each merge_lineage() call, you can see it dropping >> nodes along the trace. > > I confirm the 'dropped' nodes, but also claim that this is no bug. > > For example, the first 'drop' happens for the 101st species which is > 'Leptospira interrogans serovar Copenhageni'. This is a variation > (descendant) of species 24: 'Leptospira interrogans'. 
So when the > variation is added it becomes a leaf and 'Leptospira interrogans' > is no > longer a leaf, so the overall number of leaves does not increase. > > The next drop is for species 103 'Prochlorococcus marinus subsp. > pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'. > Same deal. I didn't check any others, but suspect the same issue > arises > in all cases. Makes sense now. I personally would consider this a bug since the results are unexpected (so the docs need to be modified in order to clarify). Some say tomato... I suppose this is one of the issues one might run into when using NCBI taxonomy to build trees. > Gabriel, please confirm this isn't a bug, or suggest how you > propose to > see your taxa when they are not all leaves of the tree. Having the nodes appear internally seems semantically correct to me. Is there any other way? > PS. I changed the merge_lineage() algorithm to be 18x faster (from the > absurd 3mins for making the 190 species tree to a more reasonable > 10s), > without changing the tree produced. Definitely an improvement! chris From jason at bioperl.org Mon Dec 18 14:33:32 2006 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Dec 2006 14:33:32 -0500 Subject: [Bioperl-l] PAML files In-Reply-To: <4586BF57.4090002@ub.edu> References: <4586BF57.4090002@ub.edu> Message-ID: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> They are temporary names so they are deliberately random and there is no intention of you needing them after a run since they are cleaned up on the fly. We use an internal method for generating tempfiles that takes care of cleanup afterwards. I suppose since we do all the work within a temp directory that is cleaned up, one could have a fixed name for the tree, alignment, and ctl files but honestly we never expect people to be reading these filenames as they are intended to be transient. What problem are you having that you need the filename?
-jason On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > Hi all, > > does anyone knows how to get the name of the .ctl file created by the > PAML module? Inside the tmp directory there are 2 files with random > names (tree and ctl). Why do they have random names?? Wouldn't it be > easier to assign them a fixed name?? For instance "codeml.ctl" and > "tree.nwk"?? > > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjm at fruitfly.org Mon Dec 18 16:50:00 2006 From: cjm at fruitfly.org (Chris Mungall) Date: Mon, 18 Dec 2006 13:50:00 -0800 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> Message-ID: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> I agree with everything Heikki is saying, I just wanted to highlight one paragraph: > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? One solution is to give your annotation/metadata-model formal computational semantics and use ontologies to give additional semantics to your metadata tags. This provides both user information in the form of documentation, and a means of specifying to the computer exactly what should be done with the tags. This is probably overkill for bioperl; but if the use cases being proposed do lean in the direction of a new metadata system that is not necessarily backwards compatible with the existing one, then I'd recommend checking out what's already out there before re-inventing the wheel. Perl RDF libraries are getting a little better. 
If anyone is interested in pursuing this sort of thing (probably on a branch), let me know On Dec 18, 2006, at 5:51 AM, Heikki Lehvaslaiho wrote: > > Reading the discussion, I think it is time to draw some guidelines. > > 1. Base the Meta implementation to a real use cases. > > MSA is a good example. > > 2. Allow generalisations > > If you can see an other implementation of the same idea that can > be merged > with the first do it but do not hurt yourself if you can not. > > > The most difficult question is how to separate case-specific > attributes that > are best implemented by subclassing with additional methods from > truly widely > variable meta data that is best done as a parallel track meta > information > holding class. > > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? > > That leads to the the third guideline: > > 3. Use separate meta classes only when there are several different > ways of > encoding data that is present in large numbers *and* when you are > expecting > to be assessing the data computationally rather than just checking > if an > attribute is there. > > > -Heikki > > > > On Friday 15 December 2006 19:23, Chris Fields wrote: >> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote: >>> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: >>>> On Dec 14, 2006, at 7:45 PM, David Messina wrote: >>>>> Hey Chris, >>>>> >>>>> My thoughts below. >>>>> >>>>>> [Chris] >>>>>> This could be used to annotate any >>>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have- >>>>>> you, >>>>>> maybe in a collection (similar to AnnotationCollection). 
I >>>>>> thought >>>>>> something like this may be of general use for any PrimarySeq >>>>>> (quality, structure), alignments like NEXUS and Stockholm, >>>>>> SeqFeatures where structure could be stored (tRNA or >>>>>> riboswitches), >>>>>> etc. >>>>>> >>>>>> However, this also seems to fall into the category of sequence >>>>>> annotation. So, would it be better to have a set of >>>>>> Bio::Annotation >>>>>> classes used for this purpose? >>>>> >>>>> To me, all meta data is equal. That is, your classic Genbank >>>>> feature >>>>> annotation and a user's arbitrary meta-tag like "Bob thinks this >>>>> is a >>>>> kinase domain" aren't different in kind even if they are >>>>> different in >>>>> content. >>>>> >>>>> As resequencing projects multiply, the ability to create arbitrary >>>>> meta tags, attach them to different types of objects, and use >>>>> those >>>>> tags to link them together will become desirable, if not >>>>> essential. >>>>> >>>>> Keeping a common interface to all of these meta data types >>>>> would be >>>>> advantageous, plus new users won't have to determine whether they >>>>> need to use Bio::Meta objects or Bio::Annotation objects. >>>>> >>>>> So I would argue for all of the meta data types to live "under one >>>>> roof". Which roof isn't as important. Bio::Annotation, since it >>>>> already exists for today's meta data, seems like a reasonable >>>>> choice. >>>>> (assuming Annotation objects are flexible enough to be extended as >>>>> you propose) >>>>> >>>>> There, and no flames or jibes even. :) >>>> >>>> I guess what I want to know is whether there should to be a >>>> distinction between 'normal' sequence annotation (comments, >>>> references, and so on) and annotation that could be best >>>> described as >>>> position-specific (like RNA or protein structural annotation). 
The >>>> current meta implementation is for sequence data only; I felt it >>>> would be nice to have a generic implementation that would be >>>> applicable to any object data. >>> >>> my stream-of-consciousness for right now: >>> >>> I was thinking Bio::Annotation is where this should go - that >>> system doesn't have anything about it that makes it explicitly >>> sequence related. What we're trying to hammer out here on the >>> Alignment side - which fits with your RNA example - is have >>> features, basically SeqFeatures - associated with alignments so >>> columns can be annotated to cover things like character sets and >>> partitions for phylogenetic analyses. As for data which annotates >>> non-contiguous things like RNAstems we may have to be more >>> creative about that or model it with a splitLocation. >>> >>> So currently we've added code so that an Alignment is-a >>> Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this >>> end, with the goal of being able to capture more of the data that >>> can be represented in a NEXUS file. >>> >>> It feels more like a hack than an elegant Meta-data solution, but I >>> am totally sure whether the data you are thinking about doing at >>> this point, perhaps I need to spend more time thinking about it. >>> Or are you worried about the idea of whether the semantic mapping >>> of the data into features or annotations is confusing users? >> >> Sorry in advance for the longish response here... >> >> My original thought was to have a generic abstract class capable of >> positionally describing data in any another class, similar to >> Heikki's Bio::Seq::MetaI but not constrained to sequence data only. >> Implementing classes would be capable of having different data >> structures based on their use (simple string, array, AoA, AoH, AoO). >> One MetaCollection class to contain them all in a tag-like system, so >> you could have mixed data types describe the same object. 
The latter >> Collection class is so similar to AnnotationCollection that I agree >> Bio::Annotation would be the best place for this. >> >> The way I reconfigured Stockholm alignment parsing/writing is to use >> Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is >> capable of holding a sequence and several meta strings, stored as >> tags or 'names'. However, there is no Meta object for alignments >> (for RNA/protein structure consensus and other Rfam/Pfam markup); I >> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would >> rather have a generic Meta object independent of the sequence cruft. >> >> So for this partial Pfam alignment, >> >> Q92SV1_RHIME/122-299 LAMALNLARGI...VDADVDF..REG >> #=GR Q92SV1_RHIME/122-299 pAS ......................... >> Q883D2_PSESM/110-290 LGLMLGLRRRL...FDGNGAV..KRS >> Q8ZXP5_PYRAE/91-262 LALLLAPYKRI...IQYGEKM..KRG >> #=GR Q8ZXP5_PYRAE/91-262 SS HHHHHHHHTTH...HHHHHHX..HTT >> #=GR Q8ZXP5_PYRAE/91-262 SA 00000000000...120030X..474 >> #=GC SS_cons HHHHHHHHTTH...HHHHHHH..HTT >> #=GC SA_cons 03002200312...1312414..676 >> #=GC seq_cons luhhLuhsRpl...hthppth..+pG >> // >> >> '#=GC' lines would be in generic meta string objects in the >> alignment, while '#=GR' tags would be in similar meta objects in the >> relevant sequences. As long as both aren't AnnotatableI this isn't >> an issue. >> >> Similarly, NEXUS files which contained any position-based values >> could hold a meta string/array object in a similar tag. >> >> The basic scheme is: >> |--String >> >> Annotation::Meta----|--Array >> >> |--HorriblyComplexDataStruct >> >> Then I started thinking about where this could be applied, and >> whether a true Meta object needs to be constrained only to describing >> position-based data. This somewhat relates to this bug: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=1825 >> >> which seems to need a simple but unconstrained hash-of-arrays-based >> meta object. >> >> Then my head appropriately exploded... 
>> >> Hope everything is going well at the hackathon! Looks like some >> interesting stuff coming out of it. >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Dec 18 14:35:50 2006 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Dec 2006 14:35:50 -0500 Subject: [Bioperl-l] codeml In-Reply-To: <45868466.508@ub.edu> References: <45868466.508@ub.edu> Message-ID: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> This is a shortcoming in the Run::Phylo::PAML::Codeml implementation - I guess we'll need to allow the -tree option to accept an arrayref of trees. Are you willing to try writing this patch? It should be added as a bug/feature request to bugzilla so it can be corrected in short order. -jason On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote: > Hi all, > > I've been using bioperl's PAML module (specifically the codeml > part) but > with just one tree. > > Since the program accepts several trees as input (and runs the > analysis > for each tree outputting the difference in likelihoods for each one) I > was wondering if there's some way to do it through bioperl?
> > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From gowthaman.ramasamy at sbri.org Mon Dec 18 17:19:09 2006 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Mon, 18 Dec 2006 14:19:09 -0800 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence Message-ID: Hi List, Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) Many thanks in advance, gowtham From cjfields at uiuc.edu Mon Dec 18 17:33:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 16:33:34 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> Message-ID: On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote: > > Reading the discussion, I think it is time to draw some guidelines. > > 1. Base the Meta implementation to a real use cases. > > MSA is a good example. AlignIO::stockholm is where I'll initially test it out. > 2. Allow generalisations > > If you can see an other implementation of the same idea that can > be merged > with the first do it but do not hurt yourself if you can not. I agree. > The most difficult question is how to separate case-specific > attributes that > are best implemented by subclassing with additional methods from > truly widely > variable meta data that is best done as a parallel track meta > information > holding class. 
I would probably start with a general Bio::Annotation::MetaI abstract class, which supplements AnnotationI with general meta-specific methods (meta, meta_text, named_meta, etc)? Implement this in whatever way one wanted (RNA structure as strings, quality data as arrays, etc) under the constraints of the interface description. Multiple meta objects, potentially of mixed data types, could be added in an AnnotationCollection along with other Bio::Annotation data, or stored in a nested meta-specific AnnotationCollection object (I favor the former as it's simpler). So you could have an alignment, sequence, seqfeature (anything that is AnnotatableI) with a regular AnnotationCollection also containing possibly multiple meta objects, each meta object also containing possibly more than one set of meta data. The key issue I have is whether or not to constrain these to describing positional data, similar to Bio::Seq::Meta, by ensuring that the data is_flush(), etc. My current inclination is 'no', and to have a separate abstract class which describes these methods, implementing those separately. > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? > > That leads to the the third guideline: > > 3. Use separate meta classes only when there are several different > ways of > encoding data that is present in large numbers *and* when you are > expecting > to be assessing the data computationally rather than just checking > if an > attribute is there. > > > -Heikki The initial use case for this would be simple data strings for alignment data. I already have a partial implementation in place for stockholm using Bio::Seq::Meta (which led me to this proposal!). 
I like Chris M.'s idea of ensuring that meta implementations use some sort of formalized ontology, but I'll probably start out very simple and work up from there. chris From cjfields at uiuc.edu Mon Dec 18 17:38:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 16:38:14 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <4586BE99.7020308@sendu.me.uk> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> <4586BE99.7020308@sendu.me.uk> Message-ID: <6AD475AE-7F5E-4612-BC24-73B65AA47F30@uiuc.edu> On Dec 18, 2006, at 10:15 AM, Sendu Bala wrote: > Hilmar Lapp wrote: >> >> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote: >> >>> Lincoln Stein wrote: >>>> This is very embarassing for me, particularly since I spent a lot >>>> of time validating that Bio::Graphics was working properly before >>>> the 1.5.2 release went out. How long before there is a 1.5.3 >>>> release? How about a 1.5.2.1release? >>> >>> I'm happy to try a point release for critical bug fixes. Why don't >>> you commit the necessary fixes to branch-1-5-2 and let me know when >>> you're happy, and I'll do 1.5.2.1. >> >> Feel free to do that, but why not make a 1.5.3 off the main trunk? >> 1.5.2.1 may be adding more to the version confusion >> (developer/stable/point-release/etc) than it is worth, > > My feeling is that 1.5.3 should be reserved for some significant > changes > and new features, and not just a few bug fixes. I'd say this causes > less > confusion amongst users - they can associate '1.5.2' with a certain > API > and feature-set, and the specific name of the file they download and > install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't > matter at all to them. 
> > I also won't have to make some major announcement about it; it will > simply be the most recent developer version of bioperl available so > new > users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing > 1.5.2 users will only feel compelled to get it if they suffer from the > bugs fixed. > > >> and there is no shame in releasing new developer versions every few >> weeks. > > I think doing frequent releases are inadvisable; such a quick release > won't have had much testing so we shouldn't encourage people to > install > it: encouragement is implicit when a major new version comes out like > 1.5.3. People who want to live on the edge can and should be using a > CVS checkout. I thought that 1.5.2 was considered a point release for the 1.5 dev series, for bug fixes along with the potential for added/experimental features. Similarly, 1.6.x releases would be point releases for bug fixes only with all tests passing (no added features since it is a stable release series). I guess one could reason that 1.5.x releases have both bug fixes and new features, while 1.5.x.y releases are simply bug fixes for the 1.5.x branch (no new features). We probably should add something to the FAQ and maybe make a few changes to the 1.5.2 wiki page. I think having a 1.5.2.1 release is feasible as a quick one-off to get Lincoln's fixes in, since you would make them off the 1.5.2 branch anyway (so I guess it could be considered a bug release from that branch). It's probably not something we should make a habit of, but then again I'm not the Pumpkin! 
chris From bix at sendu.me.uk Mon Dec 18 17:50:11 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 22:50:11 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> Message-ID: <45871B23.8070103@sendu.me.uk> Chris Fields wrote: > > On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: > >> For example, the first 'drop' happens for the 101st species which is >> 'Leptospira interrogans serovar Copenhageni'. This is a variation >> (descendant) of species 24: 'Leptospira interrogans'. So when the >> variation is added it becomes a leaf and 'Leptospira interrogans' is no >> longer a leaf, so the overall number of leaves does not increase. > > Makes sense now. I personally would consider this a bug since the > results are unexpected (so the docs need to be modified in order to > clarify). Some say tomato... > > I suppose this is one of the issues one might run into when using NCBI > taxonomy to build trees. No, the tree produced is perfectly fine. The taxonomy2tree.pl script deliberately then does: # simple paths are contracted by removing degree one nodes $tree->contract_linear_paths; Because that is what Gabriel's script originally did. >> Gabriel, please confirm this isn't a bug, or suggest how you propose to >> see your taxa when they are not all leaves of the tree. > > Having the nodes appear internally seems semantically correct to me. Is > there any other way? I suppose if we want to see all the input species output again we have to make contract_linear_paths() aware of nodes we want to keep, even when they are degree one nodes. 
Gabriel, is that what you want to see? From cjfields at uiuc.edu Mon Dec 18 18:14:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 17:14:23 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <45871B23.8070103@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> Message-ID: On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: >>> For example, the first 'drop' happens for the 101st species which is >>> 'Leptospira interrogans serovar Copenhageni'. This is a variation >>> (descendant) of species 24: 'Leptospira interrogans'. So when the >>> variation is added it becomes a leaf and 'Leptospira interrogans' >>> is no >>> longer a leaf, so the overall number of leaves does not increase. >> >> Makes sense now. I personally would consider this a bug since the >> results are unexpected (so the docs need to be modified in order >> to clarify). Some say tomato... >> I suppose this is one of the issues one might run into when using >> NCBI taxonomy to build trees. > > No, the tree produced is perfectly fine. The taxonomy2tree.pl > script deliberately then does: > > # simple paths are contracted by removing degree one nodes > $tree->contract_linear_paths; > > Because that is what Gabriel's script originally did. I think you misunderstood me. The tree is fine; the data used to make the tree (NCBI taxonomy) is the issue. 
One of the clear caveats that NCBI attaches to their taxonomy data is that it should not be the 'primary source for taxonomic or phylogenetic information': http://tinyurl.com/y3k624 I think it works as a good guide as long as one takes the above into consideration. That and the fact that not all taxids attached to sequence data will represent leaf nodes. chris From cjfields at uiuc.edu Mon Dec 18 18:15:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 17:15:56 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> Message-ID: <16D6DB51-C2CB-4E89-A597-4672FAA6681B@uiuc.edu> On Dec 18, 2006, at 3:50 PM, Chris Mungall wrote: > > I agree with everything Heikki is saying, I just wanted to highlight > one paragraph: > >> The problem I see with undefined, totally open meta annotation, is >> that if you >> can put anything in there, it is also totally confusing to a user. >> If you can >> put anything in, how do you know what to get get out and know that >> it is >> there? > > One solution is to give your annotation/metadata-model formal > computational semantics and use ontologies to give additional > semantics to your metadata tags. This provides both user information > in the form of documentation, and a means of specifying to the > computer exactly what should be done with the tags. > > This is probably overkill for bioperl; but if the use cases being > proposed do lean in the direction of a new metadata system that is > not necessarily backwards compatible with the existing one, then I'd > recommend checking out what's already out there before re-inventing > the wheel. Perl RDF libraries are getting a little better. > > If anyone is interested in pursuing this sort of thing (probably on a > branch), let me know ...
I like the idea of using ontologies (although that's one of my many weak points!). I'll likely start off with simple examples using meta data initially, then progress from there. It is a developer series, after all! Thanks everybody! I think I have an idea on how to at least get started. chris From bix at sendu.me.uk Mon Dec 18 18:27:15 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 23:27:15 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> Message-ID: <458723D3.4010908@sendu.me.uk> Chris Fields wrote: > > On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: >>>> For example, the first 'drop' happens for the 101st species which is >>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation >>>> (descendant) of species 24: 'Leptospira interrogans'. So when the >>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no >>>> longer a leaf, so the overall number of leaves does not increase. >>> >>> Makes sense now. I personally would consider this a bug since the >>> results are unexpected (so the docs need to be modified in order to >>> clarify). Some say tomato... >>> I suppose this is one of the issues one might run into when using >>> NCBI taxonomy to build trees. >> >> No, the tree produced is perfectly fine. The taxonomy2tree.pl script >> deliberately then does: >> >> # simple paths are contracted by removing degree one nodes >> $tree->contract_linear_paths; >> >> Because that is what Gabriel's script originally did.
> > I think you misunderstood me. The tree is fine; the data used to make > the tree (NCBI taxonomy) is the issue. In what way is it the issue? The database is also fine as far as I can see, in so far as it is not causing any problems in this instance. Gabriel asked for a tree featuring a species and its subspecies. The NCBI taxonomy database provided Bioperl the correct data to build such a tree. Then Gabriel asked to remove the degree one nodes of his tree. His problem was that doing that happened to (correctly) remove the species node. If he wants to see both his species and his subspecies he must either not remove degree one nodes, or alter the method of doing so to keep desired nodes. There is no possible way for NCBI to improve matters here. From bix at sendu.me.uk Mon Dec 18 18:45:59 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 23:45:59 +0000 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <45872837.6050403@sendu.me.uk> Gowthaman Ramasamy wrote: > Hi List, Is there any module in bioperl which can find out the primer > binding sites in a genomic sequence. I am interested in finding > locations with few mismatches along the primer...not just the exact > match (which is a very trivial task) There's no module dedicated to that task, but Bioperl may help you to answer the question. Probably the easiest/reliable/clear thing to do is to do a Blast with appropriate settings for short sequence with few mismatches. You can write a script to only consider hits for your forward primer that are a 'primable' distance from a hit to your reverse primer (and check their orientations are correct as well). Or use some e-pcr tool. 
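For the primer question above, a minimal sketch of a mismatch-tolerant search, under stated assumptions. This is not the Blast approach suggested above and it is not a BioPerl module: it is a plain sliding-window comparison in core Perl that allows substitutions only (no indels), so it is probably only practical for short targets. The names find_primer_sites and revcomp are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return [start, mismatches]
# pairs, with 1-based starts, for every window of $target that differs
# from $primer by at most $max_mismatch substitutions.
sub find_primer_sites {
    my ($primer, $target, $max_mismatch) = @_;
    $primer = uc $primer;
    $target = uc $target;
    my $plen = length $primer;
    my @hits;
    for my $i (0 .. length($target) - $plen) {
        my $window = substr($target, $i, $plen);
        # String XOR yields "\0" wherever the two characters agree,
        # so counting the non-NUL bytes gives the Hamming distance.
        my $mismatches = ($window ^ $primer) =~ tr/\0//c;
        push @hits, [$i + 1, $mismatches] if $mismatches <= $max_mismatch;
    }
    return @hits;
}

# Hypothetical helper: reverse complement of an unambiguous DNA string.
sub revcomp {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Toy data, invented for this example.
my $genome = 'TTACGTTTACCTAA';
for my $hit (find_primer_sites('ACGT', $genome, 1)) {
    printf "start=%d mismatches=%d\n", @$hit;
}
# prints:
# start=3 mismatches=0
# start=9 mismatches=1
```

Sites on the opposite strand can be found by searching for revcomp($primer) in the same target; pairing forward and reverse hits that lie a primable distance apart is left out of this sketch.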

From Kevin.M.Brown at asu.edu Mon Dec 18 18:52:20 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 18 Dec 2006 16:52:20 -0700
Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence
Message-ID: <1A4207F8295607498283FE9E93B775B40270F3BB@EX02.asurite.ad.asu.edu>

A function I use to find the first landing site for a primer. Should be modifiable to handle multiple occurrences:

=head1 C<match>

Match searches for a near alignment between two strings and returns the
position at which the two strings align. Match is based on 80% conformation

    match($this, $in_that)

=cut

sub match {
    my ($primer, $gene) = @_;
    my $start   = 0;
    my $pattern = "";

    # Grow the pattern one primer base at a time.
    for (my $i = 0 ; $i < length($primer) ; $i++) {
        $pattern .= substr($primer, $i, 1);
        pos($gene) = 0;
        if ($gene =~ m/$pattern/gi) {
            # The base fits: record the 1-based start of the match.
            $start = pos($gene) - length($pattern) + 1;
        }
        else {
            # The base does not fit: swap it for a wildcard.
            $start = 0;
            chop($pattern);
            $pattern .= '.';
        }
    }

    # Re-check the position if the last base ended up as a wildcard.
    if ($pattern =~ /\.$/) {
        if ($gene =~ m/$pattern/gi) {
            $start = pos($gene) - length($pattern) + 1;
        }
    }

    # Accept only if more than 80% of the primer bases matched literally.
    $pattern =~ s/\.//g;
    if ((length($pattern) / length($primer)) > .8) {
        #print $start . "\n";
        return $start;
    }
    return 0;
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, December 18, 2006 4:46 PM
> To: Gowthaman Ramasamy
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] module to find out primer binding
> sites in a genome sequence
>
> Gowthaman Ramasamy wrote:
> > Hi List, Is there any module in bioperl which can find out the primer
> > binding sites in a genomic sequence. I am interested in finding
> > locations with few mismatches along the primer...not just the exact
> > match (which is a very trivial task)
>
> There's no module dedicated to that task, but Bioperl may help you to
> answer the question.
>
> Probably the easiest/reliable/clear thing to do is to do a Blast with
> appropriate settings for short sequence with few mismatches.
You can > write a script to only consider hits for your forward primer > that are a > 'primable' distance from a hit to your reverse primer (and check their > orientations are correct as well). > > Or use some e-pcr tool. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From torsten.seemann at infotech.monash.edu.au Mon Dec 18 18:52:58 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 19 Dec 2006 10:52:58 +1100 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <458729DA.9030909@infotech.monash.edu.au> Gowthaman Ramasamy wrote: > Hi List, > Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. > I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) This FAQ question may help: http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F This software may help: http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From sdavis2 at mail.nih.gov Mon Dec 18 21:16:19 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 18 Dec 2006 21:16:19 -0500 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <45874B73.7010600@mail.nih.gov> Gowthaman Ramasamy wrote: > Hi List, > Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. 
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) > See here: http://genome.ucsc.edu/cgi-bin/hgPcr?command=start It is designed for exactly this task, is very fast, is available as an executable or web-based (though watch the usage requirements), and the output can be parsed rather easily. Sean From cjfields at uiuc.edu Mon Dec 18 21:30:04 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 20:30:04 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <458723D3.4010908@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> Message-ID: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> >> I think you misunderstood me. The tree is fine; the data used to >> make >> the tree (NCBI taxonomy) is the issue. > > In what way is it the issue? The database is also fine as far as I can > see, in so far as it is not causing any problems in this instance. I should maybe have clarified a bit more: what I said has nothing to do with the structure of the database itself. I was just pointing out that NCBI Taxonomy isn't the best source of data for building a phylogenetic tree, something NCBI also points out (the link in my last post). Not a big deal, really. > Gabriel asked for a tree featuring a species and its subspecies. The > NCBI taxonomy database provided Bioperl the correct data to build > such a > tree. Then Gabriel asked to remove the degree one nodes of his > tree. His > problem was that doing that happened to (correctly) remove the species > node. 
If he wants to see both his species and his subspecies he must
> either not remove degree one nodes, or alter the method of doing so to
> keep desired nodes. There is no possible way for NCBI to improve
> matters here.

Actually, there isn't any way they could w/o digging through the literature in many cases. The problem is incomplete taxonomic information for nodes derived from older sequence data, where a genus and species was designated but nothing else (strain, etc) is known.

Again, I merely was pointing out what I had mentioned above. I wasn't criticizing you, Gabriel, or the methodology here. Honest!

chris

From avilella at gmail.com Mon Dec 18 16:43:27 2006
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 18 Dec 2006 21:43:27 +0000
Subject: [Bioperl-l] PAML files
In-Reply-To: <4586F3A2.4010607@ub.edu>
References: <4586BF57.4090002@ub.edu> <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> <4586F3A2.4010607@ub.edu>
Message-ID: <358f4d650612181343o5bd51169w7b46cceb34a5c92b@mail.gmail.com>

Filipe,

if you need to create the ctl file but not run the job, you can use the "prepare" method in Codeml run. Also, there is a tmpdir and save_tempfiles method that will keep the files where you want.

About the naming, you can add a ".tree" and ".aln" extension to the tempnames if you want, by altering the $temptreefile and $tempseqfile variables in bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm (cvs head version).

If you want, you can also add a couple of getters/setters there:

sub alnfilename{
    my $self = shift;
    return $self->{'alnfilename'} = shift if @_;
    return $self->{'alnfilename'};
}

and substitute those $tempseqfile io calls for your $self->{'alnfilename'} io calls.

$codeml->alnfilename("/path/name");
$codeml->prepare;
...
$codeml->run;

What I usually do is to have the aln and tree files in a different place. Codeml will create the tmp files for running somewhere, and then delete all the stuff when done.

Cheers,

Albert.
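
The combined getter/setter idiom suggested above can be tried on its own before touching Codeml.pm. The sketch below is a toy: the class Toy::Runner is hypothetical and stands in for the real Codeml wrapper, and only the alnfilename method is taken from the message.

```perl
use strict;
use warnings;

# Toy::Runner is a made-up stand-in class, NOT the bioperl-run Codeml module.
package Toy::Runner;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Combined getter/setter: with an argument it stores and returns the value,
# without an argument it returns whatever was stored (undef if nothing yet).
sub alnfilename {
    my $self = shift;
    return $self->{'alnfilename'} = shift if @_;
    return $self->{'alnfilename'};
}

package main;

my $runner = Toy::Runner->new;
$runner->alnfilename('/tmp/example.aln');    # set (path is illustrative)
print $runner->alnfilename, "\n";            # get
```

A treefilename method could be written the same way, so that both file names are chosen by the caller instead of being random temp names.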
On 12/18/06, Filipe Garrett wrote: > > Hi Jason, > > This question is related with the one I made previously today. > I need to run codeml with 3 tree topologies. I looked on codeml module > but it only accepts one tree as input so I thought of using the codeml > module to prepare all the files and then I would just have to run the > codeml with the new tree file in batch. But for that I need to know > which one is the ctl file. > > any idea? > FG > > Jason Stajich wrote: > > They are temporary names so they are deliberately random and there is no > > intention of you needing them after a run since it to be cleaned up on > > the fly. We use an internal method for generating tempfiles that takes > > care of cleanup afterwards. I suppose since we do all the work within a > > temp directory that is cleaned up, one could have a fixed name for the > > tree, alignment, and ctl files but honestly we never expect people to be > > reading these filenames as they are intended to be transient. > > > > What problem are you having that you need the filename? > > > > -jason > > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > > > >> Hi all, > >> > >> does anyone knows how to get the name of the .ctl file created by the > >> PAML module? Inside the tmp directory there are 2 files with random > >> names (tree and ctl). Why do they have random names?? Wouldn't it be > >> easier to assign them a fixed name?? For instance "codeml.ctl" and > >> "tree.nwk"?? 
> >> > >> thanks in adv, > >> FG > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > jason at bioperl.org > > http://jason.open-bio.org/ > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From valiente at lsi.upc.edu Mon Dec 18 23:18:20 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Tue, 19 Dec 2006 13:18:20 +0900 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> Message-ID: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> Thanks a lot for the prompt answer and follow-up discussion. I think this turned out not to be a bug in the merge_lineage() code but a direct consequence of building a phylogenetic tree instead of a taxonomic tree, aka with internal node labels. In order to reconstruct the NCBI taxonomy for the set of species present in a given phylogenetic tree, the only reasonable work-around seems to be a first step of merging lineages and contracting linear paths with the current implementation, followed by a second step of restricting the given phylogenetic tree to the set of species present in the obtained NCBI taxonomy. But this does not affect the code for merge_lineage(). Gabriel >>> I think you misunderstood me. 
The tree is fine; the data used to >>> make >>> the tree (NCBI taxonomy) is the issue. >> >> In what way is it the issue? The database is also fine as far as I >> can >> see, in so far as it is not causing any problems in this instance. > > I should maybe have clarified a bit more: what I said has nothing > to do with the structure of the database itself. I was just > pointing out that NCBI Taxonomy isn't the best source of data for > building a phylogenetic tree, something NCBI also points out (the > link in my last post). Not a big deal, really. > >> Gabriel asked for a tree featuring a species and its subspecies. The >> NCBI taxonomy database provided Bioperl the correct data to build >> such a >> tree. Then Gabriel asked to remove the degree one nodes of his >> tree. His >> problem was that doing that happened to (correctly) remove the >> species >> node. If he wants to see both his species and his subspecies he must >> either not remove degree one nodes, or alter the method of doing >> so to >> keep desired nodes. There is no possible way for NCBI to improve >> matters >> here. > > Actually, there isn't any way they could w/o digging through the > literature in many cases. The problem is incomplete taxonomic > information for nodes derived from older sequence data, where a > genus and species was designated but nothing else (strain, etc) is > known. > > Again, I merely was pointing out what I had mentioned above. I > wasn't criticizing you, Gabriel, or the methodology here. Honest! 
> > chris From cjfields at uiuc.edu Mon Dec 18 23:41:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 22:41:16 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> Message-ID: On Dec 18, 2006, at 10:18 PM, Gabriel Valiente wrote: > Thanks a lot for the prompt answer and follow-up discussion. I > think this turned out not to be a bug in the merge_lineage() code > but a direct consequence of building a phylogenetic tree instead of > a taxonomic tree, aka with internal node labels. > > In order to reconstruct the NCBI taxonomy for the set of species > present in a given phylogenetic tree, the only reasonable work- > around seems to be a first step of merging lineages and contracting > linear paths with the current implementation, followed by a second > step of restricting the given phylogenetic tree to the set of > species present in the obtained NCBI taxonomy. But this does not > affect the code for merge_lineage(). > > Gabriel I did notice one thing, though it's minor: if you use the option to retrieve the data from Entrez, a few species aren't found (even though they show up in a local taxonomy search). I think both were E. coli strains. chris From DGroskreutz at twt.com Tue Dec 19 02:00:40 2006 From: DGroskreutz at twt.com (DGroskreutz at twt.com) Date: Tue, 19 Dec 2006 01:00:40 -0600 Subject: [Bioperl-l] CN=Deb Groskreutz/OU=MSN/O=TWT is out of the office. 
Message-ID:

I will be out of the office starting 12/18/2006 and will not return until 01/02/2007.

From michael.watson at bbsrc.ac.uk Tue Dec 19 07:20:56 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:20:56 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I'm using bioperl-1.4. I did do a google search for this but couldn't find anything. If this is fixed in 1.5.2 then forgive me.

I'm getting a warning:

MSG: No whitespace allowed in FASTA ID [unknown id]

When trying to convert from EMBL format to fasta. The offending sequence is CK234114:

ID   CK234114; SV 1; linear; mRNA; EST; VRT; 244 BP.
XX
AC   CK234114;
XX
DT   03-MAR-2004 (Rel. 79, Created)
DT   03-MAR-2004 (Rel. 79, Last updated, Version 1)
XX
DE   SB010002000A01 JUWNL1 Normalized Zebra Finch Juvenile Telencephalon cDNA
DE   Library SB01 Taeniopygia guttata cDNA clone SB010002000A01 5', mRNA
DE   sequence.

Etc

Any advice?

Mick

From michael.watson at bbsrc.ac.uk Tue Dec 19 07:27:59 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:27:59 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E682@iahce2ksrv1.iah.bbsrc.ac.uk>

Sorry, problem solved.

Mick

From roest216 at student.otago.ac.nz Tue Dec 19 04:15:55 2006
From: roest216 at student.otago.ac.nz (Stephan Roessner)
Date: Tue, 19 Dec 2006 22:15:55 +1300
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>

Dear support team,

I installed bioperl 1.5.2_100 on a Fedora machine to be able to use gbrowse. The installation seems to work (except for the test failures) but the gbrowse installation tells me that BIO::pERL 1.0050021 is installed, but of course it requires 1.52.

Is there a chance to find out what went wrong?

thanks a lot,
Stephan

From bix at sendu.me.uk Tue Dec 19 10:12:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 15:12:39 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
Message-ID: <45880167.9010605@sendu.me.uk>

Stephan Roessner wrote:
> Dear support team,
>
> I installed bioperl 1.5.2_100 on a ferdora machine to be able to use
> gbrowse.
> The installation seems to work (except of the test failures) but the
> gbrowse installation tells me that BIO::pERL 1.0050021 is installed, but
> of course it requires 1.52.
>
> Is there a chance to find out what went wrong?

Nothing went wrong with the Bioperl installation (well, except there shouldn't have been any test failures - can you post those please?). gbrowse simply defined its Bioperl requirement incorrectly.
If you tell me exactly where you downloaded gbrowse from and how you went about installing it, and provide the exact, complete error message you got from it, I might be able to help the authors fix the problem.

Or I'm pretty sure they can figure it out for themselves :)

From cjfields at uiuc.edu Tue Dec 19 11:05:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 10:05:01 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] problems installing bioperl
In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain>
Message-ID: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>

On Dec 19, 2006, at 9:31 AM, Scott Cain wrote:

> I really don't think the BioPerl version detection is wrong. I actually
> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
> Bio::Graphics::Panel->api_version. When it doesn't find the correct
> api_version, it gives a warning the BioPerl 1.5.2 is not installed. I
> have seen this happen when more than one BioPerl instance is installed
> and `perl Makefile.PL` finds the wrong one first. My suggestion is to
> try reinstalling BioPerl and providing the --uninst 1 argument to remove
> older versions of BioPerl:
>
> sudo ./Build install --uninst 1
>
> Scott

Could having two Bioperl instances explain the test failures? I'm not sure (maybe Sendu can answer this), but I would assume Module::Build uses the current working directory for test runs.
chris From bix at sendu.me.uk Tue Dec 19 12:02:34 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 17:02:34 +0000 Subject: [Bioperl-l] [Gmod-gbrowse] problems installing bioperl In-Reply-To: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> Message-ID: <45881B2A.8060907@sendu.me.uk> Chris Fields wrote: > > On Dec 19, 2006, at 9:31 AM, Scott Cain wrote: > >> I really don't think the BioPerl version detection is wrong. I actually >> don't check Bio::Root::Version::VERSION in Makefile.PL, I check >> Bio::Graphics::Panel->api_version. When it doesn't find the correct >> api_version, it gives a warning the BioPerl 1.5.2 is not installed. I >> have seen this happen when more than one BioPerl instance is installed >> and `perl Makefile.PL` finds the wrong one first. My suggestion is to >> try reinstalling BioPerl and providing the --uninst 1 argument to remove >> older versions of BioPerl: >> >> sudo ./Build install --uninst 1 >> >> Scott > > Could having two Bioperl instances explain the test failures? I'm not > sure (maybe Sendu can answer this), but I would assume Module::Build > uses the current working directory for test runs. It does, so that shouldn't be an issue for the test failures. From ferraria at gmail.com Tue Dec 19 11:40:05 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Tue, 19 Dec 2006 17:40:05 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy Message-ID: Hi all, I've just installed BioPerl 1.5.2 (devel) on a linux mandrake machine with the cpan shell. I want to use the Bio::DB::EUtilities to retrieve data (id's) from NCBI 'gene' database (first step of my pipeline). But the installation of this package doesn't seem to be correct : The simple example given on the documentation doesn't work. 
(this one : http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) Here is the error message I got : "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." In the UserAgent package, line 779 is in the private "_need_proxy" subroutine and corresponds to the code : ...if (@{ $self->{'no_proxy'} }) ... If I comment this line in the UserAgent package and the corresponding "}", the example works. But obviously, I prefer to solve the problem in a regular way :) Indeed, my computer accesses the internet via a http proxy and I didn't tell this to BioPerl at any moment. As I read on the BioPerl Wiki site, I tried to configure an $http_proxy environment variable but it still doesn't work. One last maybe important information is that I saw during the installation that the tests 't/EUtilities' were skipped because of an unrevealed reason. So finally I got two questions : 1. Is there somebody who can figure out what is my problem ? 2. At the moment, is the Bio::DB::EUtilities package really efficient or using directly the NCBI eutilities with the LWP::Simple package could be an good alternative ? Many thanks in advance, Best Regards, Anthony Ferrari From bix at sendu.me.uk Tue Dec 19 12:06:03 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 17:06:03 +0000 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> Message-ID: <45881BFB.7020008@sendu.me.uk> Scott Cain wrote: > I really don't think the BioPerl version detection is wrong. I actually > don't check Bio::Root::Version::VERSION in Makefile.PL, I check > Bio::Graphics::Panel->api_version. When it doesn't find the correct > api_version, it gives a warning the BioPerl 1.5.2 is not installed. 
I > have seen this happen when more than one BioPerl instance is installed > and `perl Makefile.PL` finds the wrong one first. Yes, I saw that, which is why I thought I must be looking at something different to what the OP had installed. > My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove > older versions of BioPerl: > > sudo ./Build install --uninst 1 My confusion is that he has definitely installed 1.5.2 and this version is being polled for its version number (by something!) and returning the correct '1.0050021', whilst the something expects '1.52'. Anyway, this can only be resolved if Stephan provides the real error message and its context. From cjfields at uiuc.edu Tue Dec 19 12:27:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Dec 2006 11:27:24 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: Message-ID: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote: > Hi all, > > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake > machine with > the cpan shell. > I want to use the Bio::DB::EUtilities to retrieve data (id's) from > NCBI > 'gene' database (first step of my pipeline). > > But the installation of this package doesn't seem to be correct : > The simple example given on the documentation doesn't work. (this > one : > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) > > Here is the error message I got : > "Can't use an undefined value as an ARRAY reference at > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > In the UserAgent package, line 779 is in the private "_need_proxy" > subroutine and corresponds to the code : ...if (@{ $self-> > {'no_proxy'} }) > ... > > If I comment this line in the UserAgent package and the > corresponding "}", > the example works. 
But obviously, I prefer to solve the problem in > a regular > way :) > > Indeed, my computer accesses the internet via a http proxy and I > didn't tell > this to BioPerl at any moment. > As I read on the BioPerl Wiki site, I tried to configure an > $http_proxy > environment variable but it still doesn't work. > > One last maybe important information is that I saw during the > installation > that the tests 't/EUtilities' were skipped because of an unrevealed > reason. > > > So finally I got two questions : > 1. Is there somebody who can figure out what is my problem ? > 2. At the moment, is the Bio::DB::EUtilities package really > efficient or > using directly the NCBI eutilities with the LWP::Simple package > could be an > good alternative ? > > Many thanks in advance, > Best Regards, > Anthony Ferrari First things first: at the moment the BioPerl EUtilities interface is very experimental (as specifically outlined in the POD), so I can't really recommend it for production use until the API is cleaned up. However, I do appreciate any feedback or comments re:EUtilities (the reason it's out there in the 1.5.2 release). You might check out this bug report, which relates directly to your issue: http://bugzilla.open-bio.org/show_bug.cgi?id=2109 After I worked out the proxy issue Torsten got it working. Let me know if this doesn't help or fix the problem. chris From cain at cshl.edu Tue Dec 19 10:31:50 2006 From: cain at cshl.edu (Scott Cain) Date: Tue, 19 Dec 2006 10:31:50 -0500 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <45880167.9010605@sendu.me.uk> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> Message-ID: <1166542310.6981.119.camel@localhost.localdomain> I really don't think the BioPerl version detection is wrong. I actually don't check Bio::Root::Version::VERSION in Makefile.PL, I check Bio::Graphics::Panel->api_version. 
When it doesn't find the correct api_version, it gives a warning that BioPerl 1.5.2 is not installed. I have seen this happen when more than one BioPerl instance is installed and `perl Makefile.PL` finds the wrong one first. My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove older versions of BioPerl:

sudo ./Build install --uninst 1

Scott

On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote:
> Stephan Roessner wrote:
> > Dear support team,
> >
> > I installed bioperl 1.5.2_100 on a ferdora machine to be able to use
> > gbrowse.
> > The installation seems to work (except of the test failures) but the
> > gbrowse installation tells me that BIO::pERL 1.0050021 is installed, but
> > of course it requires 1.52.
> >
> > Is there a chance to find out what went wrong?
>
> Nothing went wrong with the Bioperl installation (well, expect there
> shouldn't have been any test failures - can you post those please?).
> gbrowse simply defined its Bioperl requirement incorrectly. If you tell
> me exactly where you downloaded gbrowse from and how you went about
> installing it, and provide the exact, complete error message you got
> from it, I might be able help the authors fix the problem.
>
> Or I'm pretty sure they can figure it our for themselves :)

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From stewarta at nmrc.navy.mil Tue Dec 19 13:49:57 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Tue, 19 Dec 2006 13:49:57 -0500
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
Message-ID: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>

I see that Bio::Tools::Glimmer documentation clearly states that this module is intended for use with GlimmerM (eukaryotic version) only. I am wondering if anyone can recall any talk about adopting Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)? I've searched the list history with little luck other than someone else asking a similar question.

If not, does anyone have any thoughts on how difficult it might be to implement support for glimmer2/3 result parsing? Perhaps just a matter of editing the _parse_predictions method?

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852
email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From rvosa at sfu.ca Tue Dec 19 13:53:47 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 19 Dec 2006 10:53:47 -0800
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available URL:

From cjfields at uiuc.edu Tue Dec 19 14:31:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 13:31:17 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
In-Reply-To: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
References: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
Message-ID: <71E04575-DFD2-4F5A-B268-493D3246CBFA@uiuc.edu>

On Dec 19, 2006, at 12:49 PM, Andrew Stewart wrote:

> I see that Bio::Tools::Glimmer documentation clearly states that this
> module is intended for use with GlimmerM (eukaryotic version) only.
> I am wondering if anyone can recall any talk about adopting
> Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?
> I've searched the list history with little luck other than someone
> else asking a similar question.

There is a thread here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12546/focus=12546

> If not, does anyone have any thoughts on how difficult it might be to
> implement support for glimmer2/3 result parsing? Perhaps just a
> matter of editing the _parse_predictions method?

It depends on how different the various Glimmer formats are; I'll have to look at the ones Torsten added in CVS. You could always try modifying Bio::Tools::Glimmer to parse Glimmer2/3 and GlimmerM reports, but based on the mail list thread above it may not be so straightforward.
chris From MEC at stowers-institute.org Tue Dec 19 14:57:48 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Tue, 19 Dec 2006 13:57:48 -0600 Subject: [Bioperl-l] bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation Message-ID: Lincoln and fellow Bio::DB::SeqFeature travelers, I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with ------------- EXCEPTION ------------- MSG: FBgn0017545 doesn't have a primary id STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo ad.PLS:76 Where FBgn0017545 is the ID of a gene previously loaded. I am unsure how to remedy my situation and welcome any advise on correct or improved approach to my problem. Here's some detail if it helps. I am developing a pipeline to design a microarray probes capable of distinguishing among splice variants in drosophila (using latest Flybase dmel_r5.1 annotation). So I 1) load a filtered selection of Flybase annotation using bp_seqfeature_load. (for testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached). 
This is done as follows: > bp_seqfeature_load.PLS --create FBgn0017545.gff 2) analyze all the genes in the database, and create GFF3 output each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets. Output of this analysis, FBgn0017545_matd.gff, is also attached) 3) load these analysis results into the same database, as follows: > bp_seqfeature_load.PLS FBgn0017545_matd.gff It is at this point that I get the above error. However, I don't get any error and the data loads fine if I load the two files together, as follows: > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff) So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS or else this use case has not yet arisen and must be provided for somehow. I am running against bioperl-live Thanks for your thoughts and assistance, Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri From Kevin.M.Brown at asu.edu Tue Dec 19 16:46:19 2006 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 19 Dec 2006 14:46:19 -0700 Subject: [Bioperl-l] Bio::SimpleAlign Message-ID: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu> I'm working on a script that plays around with alignments of sequences and one of the things I noticed is that the code for the match method does not seem to actually use the start/end information when creating the match between objects in the alignment. Maybe I'm misunderstanding what the alignment is supposed to hold in terms of sequence. The alignment objects I build up are based on the sequence of a gene and the sequences of the primers that amplify that gene. 
$alignments{$gene->id()}->add_seq( new Bio::LocatableSeq( -seq => $seq[0]->seq(), -id => $seq[0]->id(), -start => $start, -end => $start + $seq[0]->length() - 1, -strand => 1 ) ); $alignments{$gene->id()}->add_seq( new Bio::LocatableSeq( -seq => $seq[1]->seq(), -id => $seq[1]->id(), -start => $stop, -end => $stop + $seq[1]->length() - 1, -strand => -1 ) ); So, you can see I input a start and stop point for the primer, but when I use the match function all it does is match the first character of the gene sequence to the first char of the primer sequences, then the second gene char to the second in each primer, etc... This doesn't seem to fit with the documentation and seems odd that there would be holders for the start/stop points and not use them when doing things like matching of sequences in an alignment. From bix at sendu.me.uk Tue Dec 19 17:01:22 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 22:01:22 +0000 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca> References: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca> Message-ID: <45886132.7050505@sendu.me.uk> Rutger Vos wrote: > Aren't 1.5.2_100 and 1.0050021 supposed to be equivalent in in this weird > version-string-translation way that makes 5.5 and 5.005 equivalent also? Yes, 1.5.2_100 and 1.0050021 are equivalent. The equivalent of 5.5 is 5.500 however. From lstein at cshl.edu Tue Dec 19 16:58:24 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 19 Dec 2006 16:58:24 -0500 Subject: [Bioperl-l] bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation In-Reply-To: References: Message-ID: <6dce9a0b0612191358t4764bfe0g601cd22d09132e55@mail.gmail.com> Hi Malcom, Your second guess was right. The use case of augmenting an existing gene with additional splice forms isn't provided for. 
You can get the functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

    my @genes = $db->get_features_by_name('FBgn0017545');
    @genes == 1 or die "Didn't get exactly one gene";

    my $parent = $genes[0];
    my $chr    = $parent->seq_id;
    my $start  = $parent->start;
    my $end    = $parent->end;
    my $strand = $parent->strand;

    my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                           -source      => 'added',
                                           -seq_id      => '4r',
                                           -strand      => $strand,
                                           -start       => $start+10,
                                           -end         => $end,
                                          );
    $parent->add_SeqFeature($new_splice_form);

    for my $pos ([$start+10,$start+100],[$start+200,$end]) {
        my ($e_start,$e_end) = @$pos;
        my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                            -store       => $db,
                                            -seq_id      => '4r',
                                            -strand      => $strand,
                                            -start       => $e_start,
                                            -end         => $e_end);
        $new_splice_form->add_SeqFeature($exon);
    }

I found a bug in updating the seqfeature database when I wrote this script, so you'll have to get the latest bioperl-live. I think you can use this to write a splice form updating script.

In order to support the idea of adding new splice forms to an existing gene using the GFF3 format, I will have to either modify the loader, or write a separate script (probably better to do the latter). It shouldn't be hard if you'd like to give it a try.
Lincoln On 12/19/06, Cook, Malcolm wrote: > > Lincoln and fellow Bio::DB::SeqFeature travelers, > > I find that using bp_seqfeature_load.PLS to load subfeatures of genes > already loaded using bp_seqfeature_load.PLS fails with > > ------------- EXCEPTION ------------- > MSG: FBgn0017545 doesn't have a primary id > STACK > Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 > STACK toplevel > /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo > ad.PLS:76 > > Where FBgn0017545 is the ID of a gene previously loaded. > > I am unsure how to remedy my situation and welcome any advise on correct > or improved approach to my problem. > > Here's some detail if it helps. I am developing a pipeline to design a > microarray probes capable of distinguishing among splice variants in > drosophila (using latest Flybase dmel_r5.1 annotation). So I > > 1) load a filtered selection of Flybase annotation using > bp_seqfeature_load. (for testing purposes, I am using a single gene's > worth of annotation, FBgn0017545.gff, attached). This is done as > follows: > > > bp_seqfeature_load.PLS --create FBgn0017545.gff > > 2) analyze all the genes in the database, and create GFF3 output each > feature of which has a 'Parent' that is a previously loaded gene (i.e. > FBgn0017545). (These features represent the unique introns, splice > sites, and exonic design targets. 
Output of this analysis, > FBgn0017545_matd.gff, is also attached) > > 3) load these analysis results into the same database, as follows: > > > bp_seqfeature_load.PLS FBgn0017545_matd.gff > > It is at this point that I get the above error. > > However, I don't get any error and the data loads fine if I load the two > files together, as follows: > > > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff > FBgn0017545_matd.gff) > > So, I suspect that either I am misunderstanding when/how to use > bp_seqfeature_load.PLS or else this use case has not yet arisen and must > be provided for somehow. > > I am running against bioperl-live > > Thanks for your thoughts and assistance, > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From rvosa at sfu.ca Tue Dec 19 23:23:20 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 19 Dec 2006 20:23:20 -0800 Subject: [Bioperl-l] suggestions for suitable 'taxon' object Message-ID: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From cjfields at uiuc.edu Wed Dec 20 01:16:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 00:16:47 -0600 Subject: [Bioperl-l] suggestions for suitable 'taxon' object In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> Message-ID: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote: > Hi all, > > I am looking for a bioperl object that can be abused to function as a > suitable 'taxon' object, where I mean 'taxon' as understood by the > NEXUS > file format (i.e. 
not strictly an entity from a taxonomy, but more > loosely > an OTU). > > The object would primarily function as a way to relate nodes in > trees to > sequences in an alignment (a foreign key that both nodes and > sequences refer > to), and secondarily as the keeper of the canonical name of the > OTU, such > that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node > named 'Homo > sapiens (constrained monophyly)' can still be understood to refer > to the > same thing - the OTU 'Homo sapiens sapiens' (for example). Alignment (SimpleAlign) objects contain Bio::LocatableSeq sequence objects; at the moment LocatableSeqs don't store their own annotation but they could easily be made or subclassed to be AnnotatableI (i.e. they can store annotation collections). I recently made SimpleAlign Annotatable; Jason has also made SimpleAlign implement FeatureHolderI, so alignments can store SeqFeatures as well; he may have his own designs here. There may be a wide variety of ways to go about this. I would probably do the following (bear in mind I'm a microbiologist, not a computer scientist). If one could add trees as annotation to the alignment (i.e. if trees could be Annotation objects and kept in the SimpleAlign's annotation collection), and each sequence in the alignment contained reference to a node object of the tree (i.e. if Bio::Taxon/Bio::Species objects could also be Annotation objects, but kept in a LocatableSeq annotation collection), both could refer to the same node object. This may not be exactly what you want, but maybe it's close? SimpleAlign->AnnoColln->Tree->OTU(Nodes) \----->LocSeqs-->AnnoColln-----/ I suppose this could also be done with Seqfeatures... > I was thinking that a (possibly expanded) Bio::Species might work > if there > was some sensible way of appending references to node and sequence > objects > to it (or otherwise associate them with each other), but I am > writing *to > solicit any and all suggestions*. 
I am looking for something
> similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

Sendu would be the best one to speak about Bio::Taxon and Bio::Species and may have some ideas on the above. The current plan was to deprecate Bio::Species, but who knows?

chris

From heikki at sanbi.ac.za Wed Dec 20 05:25:08 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 20 Dec 2006 12:25:08 +0200
Subject: [Bioperl-l] Bio::SimpleAlign
In-Reply-To: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
Message-ID: <200612201225.08862.heikki@sanbi.ac.za>

Kevin,

Sequences that are added to the alignment are supposed to be *aligned*. SimpleAlign does not do it for you.

It seems to me that you are adding sequences like this:

    nnnnnnnnnnnnnnnnnnnn         1 - 20   "a short gene"
    nnnnnn                       21 - 26  "a short primer after the gene"

when you should be doing this:

    nnnnnnnnnnnnnnnnnnnn         1 - 20   "a short gene"
    --------------------nnnnnn   21 - 26  "a short primer after the gene"

Note that the default way of displaying names in SimpleAlign is "name/start-end". The names are usually expected to refer to the sequence from which the subsequence is derived. The display name does not change if you add gaps.

Yours,
-Heikki

On Tuesday 19 December 2006 23:46, Kevin Brown wrote:
> I'm working on a script that plays around with alignments of sequences
> and one of the things I noticed is that the code for the match method
> does not seem to actually use the start/end information when creating
> the match between objects in the alignment. Maybe I'm misunderstanding
> what the alignment is supposed to hold in terms of sequence. The
> alignment objects I build up are based on the sequence of a gene and the
> sequences of the primers that amplify that gene.
>
> $alignments{$gene->id()}->add_seq(
>     new Bio::LocatableSeq(
>         -seq => $seq[0]->seq(),
>         -id => $seq[0]->id(),
>         -start => $start,
>         -end => $start + $seq[0]->length() - 1,
>         -strand => 1
>     )
> );

If your sequence does not contain gaps and the numbering starts from one, you can let the object handle start/stop:

    my $a = new Bio::LocatableSeq(
        -seq    => 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
        -id     => 'A00001',
        -strand => 1
    );

> $alignments{$gene->id()}->add_seq(
>     new Bio::LocatableSeq(
>         -seq => $seq[1]->seq(),
>         -id => $seq[1]->id(),
>         -start => $stop,
>         -end => $stop + $seq[1]->length() - 1,
>         -strand => -1
>     )
> );
>
> So, you can see I input a start and stop point for the primer, but when
> I use the match function all it does is match the first character of the
> gene sequence to the first char of the primer sequences, then the second
> gene char to the second in each primer, etc... This doesn't seem to fit
> with the documentation and seems odd that there would be holders for the
> start/stop points and not use them when doing things like matching of
> sequences in an alignment.
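
To illustrate the gap-padding point made in this reply, here is a minimal sketch in plain Perl. It makes no BioPerl calls. The sequences, the column number and the variable names are invented for illustration, and padding the gene row with trailing gaps so that both rows have equal length is an assumption on top of the diagram in the reply.

```perl
use strict;
use warnings;

# Hypothetical data: a 20-residue "gene" and a 6-residue primer that
# should occupy alignment columns 21-26.
my $gene         = 'ACGTACGTACGTACGTACGT';
my $primer       = 'TTGACC';
my $primer_start = 21;    # 1-based alignment column (assumed)

# Leading gaps push the primer to its column; trailing gaps on the gene
# make both rows the same length.
my $primer_row = ( '-' x ( $primer_start - 1 ) ) . $primer;
my $gene_row   = $gene . ( '-' x ( length($primer_row) - length($gene) ) );

print "$gene_row\n$primer_row\n";
```

Strings padded in this way are what would then be handed to Bio::LocatableSeq as -seq, so that column-by-column comparison lines the primer up where it belongs.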
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From ferraria at gmail.com Wed Dec 20 06:04:16 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 12:04:16 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> Message-ID: On 19/12/06, Chris Fields wrote: > > > On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote: > > > Hi all, > > > > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake > > machine with > > the cpan shell. > > I want to use the Bio::DB::EUtilities to retrieve data (id's) from > > NCBI > > 'gene' database (first step of my pipeline). > > > > But the installation of this package doesn't seem to be correct : > > The simple example given on the documentation doesn't work. (this > > one : > > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) > > > > Here is the error message I got : > > "Can't use an undefined value as an ARRAY reference at > > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > In the UserAgent package, line 779 is in the private "_need_proxy" > > subroutine and corresponds to the code : ...if (@{ $self-> > > {'no_proxy'} }) > > ... > > > > If I comment this line in the UserAgent package and the > > corresponding "}", > > the example works. 
But obviously, I prefer to solve the problem in > > a regular > > way :) > > > > Indeed, my computer accesses the internet via a http proxy and I > > didn't tell > > this to BioPerl at any moment. > > As I read on the BioPerl Wiki site, I tried to configure an > > $http_proxy > > environment variable but it still doesn't work. > > > > One last maybe important information is that I saw during the > > installation > > that the tests 't/EUtilities' were skipped because of an unrevealed > > reason. > > > > > > So finally I got two questions : > > 1. Is there somebody who can figure out what is my problem ? > > 2. At the moment, is the Bio::DB::EUtilities package really > > efficient or > > using directly the NCBI eutilities with the LWP::Simple package > > could be an > > good alternative ? > > > > Many thanks in advance, > > Best Regards, > > Anthony Ferrari > > First things first: at the moment the BioPerl EUtilities interface is > very experimental (as specifically outlined in the POD), so I can't > really recommend it for production use until the API is cleaned up. > However, I do appreciate any feedback or comments re:EUtilities (the > reason it's out there in the 1.5.2 release). > > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. > > chris > I carefully read this bug but that doesn't help because this has already been modified in the now given GenericWebDBI.pm So my problem does not come from a deep recursion loop. As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/EUtilities.t " to see what's really happening. And actually, all tests are skipped because of the same message error -> "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." 
*** I tried the same command with the modified LWP::UserAgent package (which means I comment the line 779 and the corresponding '}') and all 453 tests passed. But not always. I made the tests several times and it often failed. And always on a test called "eXXX->cookie->cookie() query key" (ending with query key). In those cases, I got back a html message indicating that the error was thrown by the internal sever of NCBI. So I guess that sometimes it is just NCBI server fault (internal problem), and BioPerl is not implied.. But once more, I comment a line from a basic package so it is a bit hazardous. *** tony - a little bit lost. From smane at vbi.vt.edu Tue Dec 19 14:46:56 2006 From: smane at vbi.vt.edu (Shrinivasrao P. Mane) Date: Tue, 19 Dec 2006 14:46:56 -0500 Subject: [Bioperl-l] Using Muscle parameter within bioperl Message-ID: Hi, I need to run muscle using bioperl. This is how I do it in command line. muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet I used the following in perl script my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format => 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log'); The program runs and produces the result file but it doesn't create a log file nor does it stop sending output to STDOUT (-quiet). Could anybody help me with this? Thanks Mane From cjfields at uiuc.edu Wed Dec 20 09:09:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 08:09:56 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> Message-ID: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. 
> > chris > > > I carefully read this bug but that doesn't help because this has > already been modified in the now given GenericWebDBI.pm > So my problem does not come from a deep recursion loop. > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > EUtilities.t " to see what's really happening. > And actually, all tests are skipped because of the same message error > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > *** > I tried the same command with the modified LWP::UserAgent package > (which means I comment the line 779 and the corresponding '}') and > all 453 tests passed. > But not always. I made the tests several times and it often > failed. And always on a test called "eXXX->cookie->cookie() query > key" (ending with query key). In those cases, I got back a html > message indicating that the error was thrown by the internal sever > of NCBI. So I guess that sometimes it is just NCBI server fault > (internal problem), and BioPerl is not implied.. > But once more, I comment a line from a basic package so it is a bit > hazardous. > *** > > tony - a little bit lost. I'm cc'ing Torsten as he has a bit more experience with proxies. EUtilities is set up to check for an env. proxy and also take a set proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy to say this was a bug in LWP, but I think the problem is that something is undefined (i.e. an env. variable), or username/password. From the bug report, Torsten set his proxy variables using the following: -------------------------------------- "Note: I am behind an _authenticating_ proxy. My $http_proxy and $HTTP_PROXY are both set to http://USER:PASS at proxy.monash.edu.au:80/" -------------------------------------- Note the lowercase for $http_proxy, which can make a difference. After the recursion fix, I'm assuming he made no changes to the env. 
settings, and according to the bug everything was fine (is that correct Tortsen?). Also LWP::UserAgent has this: -------------------------------------- "Load proxy settings from *_proxy environment variables. You might specify proxies like this (sh-syntax): gopher_proxy=http://proxy.my.place/ wais_proxy=http://proxy.my.place/ no_proxy="localhost,my.domain" export gopher_proxy wais_proxy no_proxy csh or tcsh users should use the setenv command to define these environment variables. On systems with case insensitive environment variables there exists a name clash between the CGI environment variables and the HTTP_PROXY environment variable normally picked up by env_proxy(). Because of this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY environment variable can be used instead." -------------------------------------- chris From bix at sendu.me.uk Wed Dec 20 09:08:16 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 14:08:16 +0000 Subject: [Bioperl-l] Using Muscle parameter within bioperl In-Reply-To: References: Message-ID: <458943D0.10400@sendu.me.uk> Shrinivasrao P. Mane wrote: > Hi, > I need to run muscle using bioperl. This is how I do it in command line. > > muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet > > I used the following in perl script > > my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format => > 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log'); > > The program runs and produces the result file but it doesn't create a > log file nor does it stop sending output to STDOUT (-quiet). > Could anybody help me with this? The Muscle arguments don't take dashed args. To make switches active you need to set them to some true value. So (-verbose => 1, quiet => 1, log => 'inv.log'). Verbose may not do what you want since it is both a Bioperl option and a Muscle option; if you want the latter try using verbose => 1. 
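
As a sketch of the call described in this reply: the argument names below are taken directly from the reply (dashed -verbose for BioPerl, undashed quiet and log for Muscle) and have not been checked against the wrapper's source. The availability check with eval/require is an addition so that the snippet also runs on a machine without bioperl-run installed.

```perl
use strict;
use warnings;

# Arguments as suggested in the reply: switches are set to a true value,
# and the Muscle-level names carry no leading dash.
my @args = (
    -verbose => 1,            # BioPerl-level verbosity
    quiet    => 1,            # Muscle switch
    log      => 'inv.log',    # Muscle parameter taking a file name
);

# Only attempt the real call when the wrapper module can be loaded
# (assumption: bioperl-run provides Bio::Tools::Run::Alignment::Muscle).
my $muscle;
if ( eval { require Bio::Tools::Run::Alignment::Muscle; 1 } ) {
    $muscle = eval { Bio::Tools::Run::Alignment::Muscle->new(@args) };
}

print defined $muscle
    ? "Muscle factory created\n"
    : "Muscle wrapper not available, nothing created\n";
```

To get Muscle's own verbose output rather than BioPerl's, the reply suggests replacing -verbose => 1 with verbose => 1 in the list above.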
From bix at sendu.me.uk Wed Dec 20 09:51:33 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 14:51:33 +0000 Subject: [Bioperl-l] suggestions for suitable 'taxon' object In-Reply-To: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> Message-ID: <45894DF5.1060503@sendu.me.uk> Chris Fields wrote: > On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote: > >> Hi all, >> >> I am looking for a bioperl object that can be abused to function as >> a suitable 'taxon' object, where I mean 'taxon' as understood by >> the NEXUS file format (i.e. not strictly an entity from a taxonomy, >> but more loosely an OTU). >> >> The object would primarily function as a way to relate nodes in >> trees to sequences in an alignment (a foreign key that both nodes >> and sequences refer to), and secondarily as the keeper of the >> canonical name of the OTU, such that a sequence named >> 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo sapiens >> (constrained monophyly)' can still be understood to refer to the >> same thing - the OTU 'Homo sapiens sapiens' (for example). I haven't had time to give your suggestions consideration, but I can say that I'm having to do the same thing for a bioperl-run module and my work-around is simply to set a custom name on my Bio::Taxon objects. To explain, I have the benefit that my tree is made up of Bio::Taxon objects, so I call $taxon->name('seq_id', $seq->id). Then when I want to know which of my sequences corresponds to a particular taxon, I work out which of them has the id given by shift @{$taxon->name('seq_id')}. Hardly ideal, but it works for now. >> I was thinking that a (possibly expanded) Bio::Species might work >> if there was some sensible way of appending references to node and >> sequence objects to it (or otherwise associate them with each >> other), but I am writing *to solicit any and all suggestions*. 
I am >> looking for something similar to Bio::Phylo::Taxa::Taxon. > > Sendu would be the best one to speak about Bio::Taxon and > Bio::Species and may have some ideas on the above. The current plan > was to deprecate Bio::Species, but who knows? Given that we do plan to deprecate Bio::Species, I'd resist the temptation to expand on it. Use Bio::Taxon as a base if it has stuff you need, or base straight from Bio::Tree::Node if not. From ferraria at gmail.com Wed Dec 20 10:40:34 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 16:40:34 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> Message-ID: Defining a "no_proxy" environment variable in my '.bashrc' file solved my problem. I set it to "localhost". It indeed corresponds to the line... [ ...if (@{ $self->{'no_proxy'} }) ... ] (I guess!) I really don't know why we are compelled to do this, but let's say that's the way it is. It works now ! Thanks a lot. Tony On 20/12/06, Chris Fields wrote: > > > On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > > > You might check out this bug report, which relates directly to your > > issue: > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > > > After I worked out the proxy issue Torsten got it working. Let me > > know if this doesn't help or fix the problem. > > > > chris > > > > > > I carefully read this bug but that doesn't help because this has > > already been modified in the now given GenericWebDBI.pm > > So my problem does not come from a deep recursion loop. > > > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > > EUtilities.t " to see what's really happening. 
> > And actually, all tests are skipped because of the same message error > > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > *** > > I tried the same command with the modified LWP::UserAgent package > > (which means I comment the line 779 and the corresponding '}') and > > all 453 tests passed. > > But not always. I made the tests several times and it often > > failed. And always on a test called "eXXX->cookie->cookie() query > > key" (ending with query key). In those cases, I got back a html > > message indicating that the error was thrown by the internal sever > > of NCBI. So I guess that sometimes it is just NCBI server fault > > (internal problem), and BioPerl is not implied.. > > But once more, I comment a line from a basic package so it is a bit > > hazardous. > > *** > > > > tony - a little bit lost. > > I'm cc'ing Torsten as he has a bit more experience with proxies. > > EUtilities is set up to check for an env. proxy and also take a set > proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy > to say this was a bug in LWP, but I think the problem is that > something is undefined (i.e. an env. variable), or username/password. > > From the bug report, Torsten set his proxy variables using the > following: > > -------------------------------------- > "Note: I am behind an _authenticating_ proxy. > My $http_proxy and $HTTP_PROXY are both set to > http://USER:PASS at proxy.monash.edu.au:80/" > -------------------------------------- > > Note the lowercase for $http_proxy, which can make a difference. > After the recursion fix, I'm assuming he made no changes to the env. > settings, and according to the bug everything was fine (is that > correct Tortsen?). > > Also LWP::UserAgent has this: > > -------------------------------------- > "Load proxy settings from *_proxy environment variables. 
You might > specify proxies like this (sh-syntax): > > gopher_proxy=http://proxy.my.place/ > wais_proxy=http://proxy.my.place/ > no_proxy="localhost,my.domain" > export gopher_proxy wais_proxy no_proxy > > csh or tcsh users should use the setenv command to define these > environment variables. > > On systems with case insensitive environment variables there exists a > name clash between the CGI environment variables and the HTTP_PROXY > environment variable normally picked up by env_proxy(). Because of > this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY > environment variable can be used instead." > -------------------------------------- > > chris > From cjfields at uiuc.edu Wed Dec 20 11:10:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 10:10:48 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: Message-ID: <007901c72451$6ad540a0$15327e82@pyrimidine> Just to clarify: does it work it you don't have any proxy env. settings? chris _____ From: Anthony Ferrari [mailto:ferraria at gmail.com] Sent: Wednesday, December 20, 2006 9:41 AM To: Chris Fields Cc: bioperl-l List; Torsten Seemann Subject: Re: [Bioperl-l] Problem with : EUtilities - Proxy Defining a "no_proxy" environment variable in my '.bashrc' file solved my problem. I set it to "localhost". It indeed corresponds to the line... [ ...if (@{ $self->{'no_proxy'} }) ... ] (I guess!) I really don't know why we are compelled to do this, but let's say that's the way it is. It works now ! Thanks a lot. Tony On 20/12/06, Chris Fields wrote: On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. 
> > chris > > > I carefully read this bug but that doesn't help because this has > already been modified in the now given GenericWebDBI.pm > So my problem does not come from a deep recursion loop. > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > EUtilities.t " to see what's really happening. > And actually, all tests are skipped because of the same message error > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > *** > I tried the same command with the modified LWP::UserAgent package > (which means I comment the line 779 and the corresponding '}') and > all 453 tests passed. > But not always. I made the tests several times and it often > failed. And always on a test called "eXXX->cookie->cookie() query > key" (ending with query key). In those cases, I got back a html > message indicating that the error was thrown by the internal sever > of NCBI. So I guess that sometimes it is just NCBI server fault > (internal problem), and BioPerl is not implied.. > But once more, I comment a line from a basic package so it is a bit > hazardous. > *** > > tony - a little bit lost. I'm cc'ing Torsten as he has a bit more experience with proxies. EUtilities is set up to check for an env. proxy and also take a set proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy to say this was a bug in LWP, but I think the problem is that something is undefined ( i.e. an env. variable), or username/password. >From the bug report, Torsten set his proxy variables using the following: -------------------------------------- "Note: I am behind an _authenticating_ proxy. My $http_proxy and $HTTP_PROXY are both set to http://USER:PASS at proxy.monash.edu.au:80/" -------------------------------------- Note the lowercase for $http_proxy, which can make a difference. After the recursion fix, I'm assuming he made no changes to the env. 
settings, and according to the bug everything was fine (is that correct Tortsen?). Also LWP::UserAgent has this: -------------------------------------- "Load proxy settings from *_proxy environment variables. You might specify proxies like this (sh-syntax): gopher_proxy=http://proxy.my.place/ wais_proxy= http://proxy.my.place/ no_proxy="localhost,my.domain" export gopher_proxy wais_proxy no_proxy csh or tcsh users should use the setenv command to define these environment variables. On systems with case insensitive environment variables there exists a name clash between the CGI environment variables and the HTTP_PROXY environment variable normally picked up by env_proxy(). Because of this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY environment variable can be used instead." -------------------------------------- chris From ferraria at gmail.com Wed Dec 20 11:59:49 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 17:59:49 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <007901c72451$6ad540a0$15327e82@pyrimidine> References: <007901c72451$6ad540a0$15327e82@pyrimidine> Message-ID: First, I got a $http_proxy env. variable automatically defined by the BioPerl installation (I don't define and export it in my .bash_profile). So when I'm logging in, $http_proxy=http://ip_adress:port/ Next step : two solutions : 1) defining an $no_proxy env.variable in my .bash_profile. It can be set to 'whatever'. 2) If I do not define '$no_proxy'; to make it work, I must call the no_proxy() method on each Bio::DB::EUtilities object I create before I can call the get_response() method on it. (The bug is in the 'get_response' call) And finally without 1) or 2) it doesn't work. Tony On 20/12/06, Chris Fields wrote: > > Just to clarify: does it work it you don't have any proxy env. settings? > One thing I didn't point out previously is that Bio::DB::GenericWebDBI > inherits LWP::UserAgent. 
You should be able to use $eutil->no_proxy() > instead of setting it in your env. > chris > > ------------------------------ > *From:* Anthony Ferrari [mailto:ferraria at gmail.com] > *Sent:* Wednesday, December 20, 2006 9:41 AM > *To:* Chris Fields > *Cc:* bioperl-l List; Torsten Seemann > *Subject:* Re: [Bioperl-l] Problem with : EUtilities - Proxy > > Defining a "no_proxy" environment variable in my '.bashrc' file solved my > problem. I set it to "localhost". > > It indeed corresponds to the line... [ ...if (@{ > $self->{'no_proxy'} }) ... ] (I guess!) > > > I really don't know why we are compelled to do this, but let's say that's > the way it is. > > It works now ! > > Thanks a lot. > > Tony > > > > > On 20/12/06, Chris Fields wrote: > > > > > > On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > > > > > You might check out this bug report, which relates directly to your > > > issue: > > > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > > > > > After I worked out the proxy issue Torsten got it working. Let me > > > know if this doesn't help or fix the problem. > > > > > > chris > > > > > > > > > I carefully read this bug but that doesn't help because this has > > > already been modified in the now given GenericWebDBI.pm > > > So my problem does not come from a deep recursion loop. > > > > > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > > > EUtilities.t " to see what's really happening. > > > And actually, all tests are skipped because of the same message error > > > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > > > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > > > *** > > > I tried the same command with the modified LWP::UserAgent package > > > (which means I comment the line 779 and the corresponding '}') and > > > all 453 tests passed. > > > But not always. I made the tests several times and it often > > > failed. 
And always on a test called "eXXX->cookie->cookie() query > > > key" (ending with query key). In those cases, I got back a html > > > message indicating that the error was thrown by the internal sever > > > of NCBI. So I guess that sometimes it is just NCBI server fault > > > (internal problem), and BioPerl is not implied.. > > > But once more, I comment a line from a basic package so it is a bit > > > hazardous. > > > *** > > > > > > tony - a little bit lost. > > > > I'm cc'ing Torsten as he has a bit more experience with proxies. > > > > EUtilities is set up to check for an env. proxy and also take a set > > proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy > > to say this was a bug in LWP, but I think the problem is that > > something is undefined ( i.e. an env. variable), or username/password. > > > > From the bug report, Torsten set his proxy variables using the > > following: > > > > -------------------------------------- > > "Note: I am behind an _authenticating_ proxy. > > My $http_proxy and $HTTP_PROXY are both set to > > http://USER:PASS at proxy.monash.edu.au:80/" > > -------------------------------------- > > > > Note the lowercase for $http_proxy, which can make a difference. > > After the recursion fix, I'm assuming he made no changes to the env. > > settings, and according to the bug everything was fine (is that > > correct Tortsen?). > > > > Also LWP::UserAgent has this: > > > > -------------------------------------- > > "Load proxy settings from *_proxy environment variables. You might > > specify proxies like this (sh-syntax): > > > > gopher_proxy=http://proxy.my.place/ > > wais_proxy= http://proxy.my.place/ > > no_proxy="localhost,my.domain" > > export gopher_proxy wais_proxy no_proxy > > > > csh or tcsh users should use the setenv command to define these > > environment variables. 
> > > > On systems with case insensitive environment variables there exists a > > name clash between the CGI environment variables and the HTTP_PROXY > > environment variable normally picked up by env_proxy(). Because of > > this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY > > environment variable can be used instead." > > -------------------------------------- > > > > chris > > > > From cjfields at uiuc.edu Wed Dec 20 13:28:09 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 12:28:09 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: Message-ID: <000301c72464$9a12a070$15327e82@pyrimidine> > First, I got a $http_proxy env. variable automatically > defined by the BioPerl installation (I don't define and > export it in my .bash_profile). > So when I'm logging in, $http_proxy=http://ip_adress:port/ BioPerl can't permanently set any env. variables out of the box since that would mean modifying your local .bash_profile or the system profile. If you're a user on a system where you're not the sysadmin, then it's more likely the sysadmin has set up user accounts with an already-defined $http_proxy variable in the system .bash_profile (which is passed on to all users). The problem I can see (going by what you have above) is there is no username/password defined, only the address (IP:Port). I am assuming LWP is expecting some form of authentication when a proxy is env. defined w/o username/password included. If so, you'll have to supply those yourself, either by redefining $http_proxy to include it in your local .bash_profile, export $http_proxy='http://USER:PASS at proxy.monash.edu.au:80/' by using $agent->proxy() for including all proxy information, or by using $agent->authentication() so that a proxy can authorize any outgoing/incoming requests. The first may be preferrable if you are able to do so since you wouldn't have to authenticate every agent. 
Note that this would also explain why you had an LWP problem with an undefined array ref: the LWP agent is likely expecting some form of authentication, probably in the form [username, password], if a proxy env. variable is found. > Next step : two solutions : > 1) defining an $no_proxy env.variable in my .bash_profile. > It can be set to 'whatever'. > > 2) If I do not define '$no_proxy'; to make it work, I must call the > no_proxy() method on each Bio::DB::EUtilities object I create > before I can call the get_response() method on it. > > (The bug is in the 'get_response' call) If you mean when the request is calling proxy_authorization_basic(), that's not a bug. If we didn't authorize then it likely wouldn't work for properly set up proxies (Torsten's worked). Note that it's supposed to be passing a username/password from $self->authentication(). The fact that you can set $no_proxy to anything suggests there is no proxy in place. > And finally without 1) or 2) it doesn't work. > > Tony We can't guarantee that defining no_proxy will always work on your system, either. It's possible a proxy was added systemwide but a firewall hasn't been put in place yet; once it goes up and all requests need to be authorized, then you'll run into problems again. Conversely, maybe this was defined at some point systemwide in the .bash_profile but wasn't removed. The only one who would know is the sysadmin. If you aren't the sysadmin, you can contact them to find out about how to properly set up your proxy, or whether it is even necessary (maybe they neglected to remove the proxy definition from the system .bash_profile). Who knows? 
chris From bix at sendu.me.uk Wed Dec 20 16:03:03 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 21:03:03 +0000 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine> References: <000301c72464$9a12a070$15327e82@pyrimidine> Message-ID: <4589A507.60106@sendu.me.uk> Chris Fields wrote: >> First, I got a $http_proxy env. variable automatically >> defined by the BioPerl installation (I don't define and >> export it in my .bash_profile). >> So when I'm logging in, $http_proxy=http://ip_adress:port/ > > BioPerl can't permanently set any env. variables out of the box since True, and it doesn't try to set one temporarily either. To clarify some of the other points Chris made, the proxy variable certainly doesn't need username and password to be defined (from LWPs point of view), since not all proxies authenticate. Of course accesses won't work if authentication is actually required and these aren't set. There's no reason that no_proxy should have to be set. It is used to say what domains shouldn't be proxied. Either this is a real LWP bug, or somehow EUtilities or one of its bases is doing something wrong. It should be investigated... It would be very informative if Anthony could log in when he hasn't done anything to his environment variables (and so where the original problem manifests) and give us the results of: perl -e 'while (($key, $val) = each %ENV) { print "$key => $val\n" }' From avilella at gmail.com Wed Dec 20 09:07:17 2006 From: avilella at gmail.com (Albert Vilella) Date: Wed, 20 Dec 2006 14:07:17 +0000 Subject: [Bioperl-l] Using Muscle parameter within bioperl In-Reply-To: References: Message-ID: <358f4d650612200607m4324b8f1r91d2d917cd4951bd@mail.gmail.com> Try something like: my @params =('verbose'=>0, 'quiet'=>1, 'log'=>'/tmp/inv.log'); my $factory = Bio::Tools::Run::Alignment::Muscle->new(@params); it works for me with muscle 3.6. 
The log only gives me a start, commandstring and end. I dunno if that is what muscle is supposed to spit out. Albert. On 12/19/06, Shrinivasrao P. Mane wrote: > Hi, > I need to run muscle using bioperl. This is how I do it in command line. > > muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet > > I used the following in perl script > > my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format => > 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log'); > > The program runs and produces the result file but it doesn't create a > log file nor does it stop sending output to STDOUT (-quiet). > Could anybody help me with this? > Thanks > Mane > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Dec 20 17:46:35 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 16:46:35 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <4589A507.60106@sendu.me.uk> Message-ID: <000c01c72488$b6a690b0$15327e82@pyrimidine> > Chris Fields wrote: > >> First, I got a $http_proxy env. variable automatically > defined by the > >> BioPerl installation (I don't define and export it in my > >> .bash_profile). > >> So when I'm logging in, > $http_proxy=http://ip_adress:port/ > > > > BioPerl can't permanently set any env. variables out of the > box since > > True, and it doesn't try to set one temporarily either. > > To clarify some of the other points Chris made, the proxy > variable certainly doesn't need username and password to be > defined (from LWPs point of view), since not all proxies > authenticate. Of course accesses won't work if authentication > is actually required and these aren't set. > > There's no reason that no_proxy should have to be set. It is > used to say what domains shouldn't be proxied. 
Either this is > a real LWP bug, or somehow EUtilities or one of its bases is > doing something wrong. It should be investigated... Actually, after some investigation I repeated the error and committed a fix. If I set (on WinXP) HTTP_PROXY to a dummy variable I get the same error: Can't use an undefined value as an ARRAY reference at C:/Perl/lib/LWP/UserAgent.pm line 787. It's EUtilities-specific as other WebAgents that have proxy settings do not have the same problem, though I haven't checked any WebAgent-based classes. I think this may also partly be an LWP bug as setting env_proxy to TRUE/FALSE doesn't seem to have an effect, but instantiating with it (env_proxy => 1) in the constructor fixes the problem. Anthony, I have committed a fix to CVS to GenericWebDBI and EUtilities. Could you try it out? -chris From cjfields at uiuc.edu Wed Dec 20 18:19:59 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 17:19:59 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine> Message-ID: <000001c7248d$5e7df450$15327e82@pyrimidine> > > First, I got a $http_proxy env. variable automatically > defined by the > > BioPerl installation (I don't define and export it in my > > .bash_profile). > > So when I'm logging in, > $http_proxy=http://ip_adress:port/ Anthony, Sorry about the prior long-winded response. I managed to reproduce the error about five minutes after I responded and managed to trace the problem back to GenericWebDBI. The issue seems to be with the LWP::UserAgent env_proxy method not setting correctly post-instantiation; setting to 0 or 1 doesn't seem to do anything. If I add it to the list of args for chained instantiation in the constructor: my $self = $class->SUPER::new(@args, env_proxy => 1); it suddenly works like a charm. Hard to know why it's being fussy... I'm going to try reproducing this on a few platforms and check to see if it has been reported as an LWP bug. 
I have also committed a fix to CVS if you want to test it out. Chris
From jnewcomer at jhu.edu Wed Dec 20 20:56:10 2006 From: jnewcomer at jhu.edu (Joe Newcomer) Date: Wed, 20 Dec 2006 20:56:10 -0500 Subject: [Bioperl-l] a stupid question Message-ID: <002101c724a3$2ff80100$bd59dc80@aap.jhu.edu> Hello Paul Leo, I am with Johns Hopkins University Advanced Academic Programs. I am trying to contact a student named Paul Leo who has registered for Protein Bioinformatics. If this is you please email me. I would like to send you information about the spring course. Respectfully, Joe Newcomer (410) 516-5047 Online Education
From anhthu.tieu at gsf.de Thu Dec 21 05:10:47 2006 From: anhthu.tieu at gsf.de (Anh-Thu Tieu) Date: Thu, 21 Dec 2006 11:10:47 +0100 Subject: [Bioperl-l] imagemaps with heterogeneous_segments Message-ID: <458A5DA7.1010802@gsf.de> Hi, I use bioperl 1.5.2 and have been wondering whether it is possible to apply the image_and_map function with the glyph option "heterogeneous_segments". Up to now I can successfully create an underlying imagemap for the entire track. However, what I want is to create an imagemap for each single segment on my track/glyph. Does anyone know how to realise this? Any help is appreciated. Thanks a lot. Anh Thu
From somil.sharma1 at gmail.com Thu Dec 21 01:22:24 2006 From: somil.sharma1 at gmail.com (Somil Sharma) Date: Thu, 21 Dec 2006 14:22:24 +0800 Subject: [Bioperl-l] problem Message-ID: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com> hello *i run this program* *#!/use/bin/perl* *use Bio::DB::GenBank;* *$gb = new Bio::DB::GenBank; $seq1 = $gb->get_Seq_by_id('MUSIGHBA1'); print $seq1; * *and got this error on cmd line--* ---------- *EXCEPTION ------------- MSG: WebDBSeqI Request Error: 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error) Content-Type: text/plain Client-Date: Thu, 21 Dec 2006 06:28:33 GMT Client-Warning: Internal response* *500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)* *STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685 STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491 STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27 STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145 STACK toplevel C:\Perl\a2.pl:5* plz see if u can help me out. my ppm is also not able to install Bioperl so i did that also manually. waiting for ur reply
From granjeau at tagc.univ-mrs.fr Thu Dec 21 06:14:25 2006 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Thu, 21 Dec 2006 12:14:25 +0100 Subject: [Bioperl-l] BioFetch: Adding databases Message-ID: <458A6C91.7090000@tagc.univ-mrs.fr> Hello! I needed to query the Unisave database at EBI. To date, the only way to access it is the dbfetch web service (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined in the BioFetch package (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote these few lines to make it work, but I don't think it is good programming practice. Maybe it makes sense to define a method to add databases to FORMATMAP, in order to keep up with changes to the dbfetch service.
Cheers, --Samuel

use Bio::DB::BioFetch;

# Register the 'unisave' database in BioFetch's format map.
$Bio::DB::BioFetch::FORMATMAP{unisave} = {
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
    namespace => 'unisave',
};
my $bf = new Bio::DB::BioFetch(-db=>'unisave');
my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');

print $seq->display_id();
print $seq->seq();

From cain at cshl.edu Thu Dec 21 08:56:21 2006 From: cain at cshl.edu (Scott Cain) Date: Thu, 21 Dec 2006 08:56:21 -0500 Subject: [Bioperl-l] problem In-Reply-To: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com> References: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com> Message-ID: <1166709381.3739.47.camel@localhost.localdomain> Hello, It looks to me like you have a networking problem that doesn't have anything to do with BioPerl. When I run your script, I get: Bio::Seq::RichSeq=HASH(0x97013e0) Fairly quickly, too. Can you get to http://eutils.ncbi.nlm.nih.gov/ in a browser without proxy settings? As an aside, you probably don't really want the HASH stuff above, so I modified your script to look like this, with warnings and strict to make future debugging easier:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;

my $gb = new Bio::DB::GenBank;
my $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1->seq;   # print the sequence string, not the object reference

Scott On Thu, 2006-12-21 at 14:22 +0800, Somil Sharma wrote: > hello > > *i run this program* > > *#!/use/bin/perl* > > *use Bio::DB::GenBank;* > > *$gb = new Bio::DB::GenBank; > $seq1 = $gb->get_Seq_by_id('MUSIGHBA1'); > print $seq1; > * > > *and got this error on cmd line--* > > ---------- *EXCEPTION ------------- > MSG: WebDBSeqI Request Error: > 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error) > Content-Type: text/plain > Client-Date: Thu, 21 Dec 2006 06:28:33 GMT > Client-Warning: Internal response* > > *500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)* > > *STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685 > STACK Bio::DB::WebDBSeqI::get_seq_stream
C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491 > STACK Bio::DB::WebDBSeqI::get_Stream_by_id > C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27 > STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145 > STACK toplevel C:\Perl\a2.pl:5* > > plz see if u can help me out. > > my ppm is also not able to install Bioperl so i did that also manually. > > waiting for ur reply > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Thu Dec 21 09:28:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 21 Dec 2006 08:28:07 -0600 Subject: [Bioperl-l] BioFetch: Adding databases In-Reply-To: <458A6C91.7090000@tagc.univ-mrs.fr> References: <458A6C91.7090000@tagc.univ-mrs.fr> Message-ID: <193C6D1C-6374-4A86-9FBD-7FA994D5FDDF@uiuc.edu> I've added this to the BioFetch FORMATMAP as 'unisave' and committed to CVS. Thanks! chris On Dec 21, 2006, at 5:14 AM, Samuel GRANJEAUD - IR/IFR137 wrote: > Hello! > > I needed to query the Unisave database at EBI. Up to date, the only > way > to access it is the dbfetch web service > (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet > defined > in the BioFetch package > (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote > these few lines to make it work, but I don't think it fits a good > programming practice. May be it makes sense to defined a method to add > databases to FORMATMAP, in order to follow the dbfetch service > evolutions. 
> > Cheers, > --Samuel > > use Bio::DB::BioFetch; > $Bio::DB::BioFetch::FORMATMAP{unisave} = { > default => 'swiss', > swissprot => 'swiss', > fasta => 'fasta', > namespace => 'unisave', > }; > my $bf = new Bio::DB::BioFetch(-db=>'unisave'); > my $seq = $bf->get_Seq_by_id('LAM1_MOUSE'); > > print $seq->display_id(); > print $seq->seq(); > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From anhthu.tieu at gsf.de Thu Dec 21 09:31:45 2006 From: anhthu.tieu at gsf.de (Anh-Thu Tieu) Date: Thu, 21 Dec 2006 15:31:45 +0100 Subject: [Bioperl-l] multiple glyph elements in one track Message-ID: <458A9AD1.50907@gsf.de> Hello, I use bioperl 1.5.2. Does anyone know how I could create two seperate glyph elements on the same track with the Bio::Graphics::Panel module? My aim is to have two (e.g. two different) clickable imagemap elements on the same track. Until now I can merely create two glyph elements (transcript2 or generic options) per track with only one imagemap element (e.g. the same imagemap element is used for the entire track as the entire (=both elements) glyph's coordinates are returned to the image_and_map function as one set of coordinate). Thank you for your help. Best regards, Anh Thu From cain at cshl.edu Thu Dec 21 09:47:32 2006 From: cain at cshl.edu (Scott Cain) Date: Thu, 21 Dec 2006 09:47:32 -0500 Subject: [Bioperl-l] multiple glyph elements in one track In-Reply-To: <458A9AD1.50907@gsf.de> References: <458A9AD1.50907@gsf.de> Message-ID: <1166712453.3739.53.camel@localhost.localdomain> Hello Anh Thu, You can provide a callback for the glyph argument that returns different glyphs depending on what you want to do (ie, how you've coded your callback). 
This example is from the perldoc for Bio::Graphics::Panel:

$panel->add_track(\@exons,
    -glyph => sub {
        my $feature = shift;
        $feature->source_tag eq 'curated' ? 'ellipse' : 'generic';
    }
);

Scott On Thu, 2006-12-21 at 15:31 +0100, Anh-Thu Tieu wrote: > Hello, > > I use bioperl 1.5.2. Does anyone know how I could create two seperate > glyph elements on the same track with the Bio::Graphics::Panel module? > My aim is to have two (e.g. two different) clickable imagemap elements > on the same track. Until now I can merely create two glyph elements > (transcript2 or generic options) per track with only one imagemap > element (e.g. the same imagemap element is used for the entire track as > the entire (=both elements) glyph's coordinates are returned to the > image_and_map function as one set of coordinate). > > Thank you for your help. > > Best regards, > > Anh Thu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
From cain.cshl at gmail.com Thu Dec 21 15:03:48 2006 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 21 Dec 2006 15:03:48 -0500 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> <1166604008.4588f6e87cccc@www.studentmail.otago.ac.nz> <1166621113.3739.11.camel@localhost.localdomain> <1166642653.45898dddbd8cf@www.studentmail.otago.ac.nz> <1166643051.3739.28.camel@localhost.localdomain> <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz> Message-ID: <1166731428.3739.71.camel@localhost.localdomain> Hi Stephan, About your bioperl mail: did you cancel it, or did it just disappear? If the latter, I might have accidentally deleted it, sorry :-/ So 'GBrowse is running' means that you can see the sample yeast chr1 database, browse around, etc, right? I still don't know what is up with the warning but my guess is that everything is working there. As for your question about writing a callback, the reason it's not working is that the attributes method returns a list (typically but not always with only one element), so what you are really doing in your test is this "number of elements in the list > 1200", which is almost always going to be false. You should change it to this:

my $feature = shift;
my ($score) = $feature->attributes('score');
if ($score > 1200) {
...etc...
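The scalar-versus-list point above can be sketched with a stand-in class. `My::Feature` and its `attributes` method are hypothetical and exist only for this illustration; they are not the real feature class used by GBrowse, whose scalar-context behaviour may differ. The sketch assumes a method that simply returns an array.

```perl
use strict;
use warnings;

# Stand-in feature class for illustration; NOT the real GBrowse/BioPerl class.
package My::Feature;

sub new {
    my ($class, %attr) = @_;
    return bless { attr => \%attr }, $class;
}

# Returns all values stored under a tag. Because it returns an array,
# calling it in scalar context yields the number of values.
sub attributes {
    my ($self, $tag) = @_;
    my @values = @{ $self->{attr}{$tag} || [] };
    return @values;
}

package main;

my $feature = My::Feature->new(score => [1500]);

# Numeric comparison forces scalar context: this compares the count (1) to 1200.
my $wrong = ($feature->attributes('score') > 1200) ? 'blue' : 'pink';

# The parentheses force list assignment, so $score receives the first value.
my ($score) = $feature->attributes('score');
my $right = ($score > 1200) ? 'blue' : 'pink';

print "scalar context: $wrong\n";   # prints "scalar context: pink"
print "list context:   $right\n";   # prints "list context:   blue"
```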

Finally, if you really want to test that you are using the correct
bioperl, you can put this simple cgi in your cgi-bin directory as
test_biographics.pl, set it as world executable and go to
http://localhost/cgi-bin/test_biographics.pl (and, yes, I use strict and
warnings even when the script is only 10 lines long :-) :

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Bio::Graphics::Panel;
  use CGI qw/:standard/;

  print header(),
        start_html,
        p("Bio::Graphics::Panel api_version is ".Bio::Graphics::Panel->api_version),
        p("It should be 1.654 for BioPerl 1.5.2"),
        end_html;

Scott


On Fri, 2006-12-22 at 08:27 +1300, Stephan Roessner wrote:
> Hi Scott,
>
> responded to group but did get through.
> So I reply back to you.
>
> I installed Class-Base-0.03 using CPAN.
>
> Reinstalling GBrowse gives me still a warning like:
> Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> Writing Makefile for Bio::Graphocs::Browser::CAlign
> Writing Makefile for Generic-Genome-Browser.
>
> GBrowse is running but I cannot access attributes and/or the score column
> of .gff files. Is this related to the warning?
>
> To get an attribute I use
>
> my $feature = shift;
> if ($feature->attributes('score') > 1200) {
> return 'blue';
> } else {
> return 'pink';
> }
> But I retrieve not data using $feature->
> Can I somehaow verify what bioperl version GBrowse is using?
>
> Stephan,
>
>
>
> Quoting Scott Cain :
>
> > Stephan,
> >
> > Yes, it is in cpan:
> >
> > http://search.cpan.org/~abw/Class-Base-0.03/lib/Class/Base.pm
> >
> > The cpan shell should be able to install it.
> >
> > Whether or not that works, please respond to the mailing list so that
> > the rest of the conversation can be archived.
> >
> > Scott
> >
> >
> > On Thu, 2006-12-21 at 08:24 +1300, Stephan Roessner wrote:
> > > Hi Scott,
> > >
> > > No I didn't.
> > > I had a look and couldn't find it.
> > > It is not part of CPAN?
> > > > > > Stephan > > > > > > > > > Quoting Scott Cain : > > > > > > > Stephan, > > > > > > > > Did you install Class::Base? It was inadvertantly left out the > > > > install > > > > document, but is required. > > > > > > > > Scott > > > > > > > > > > > > On Wed, 2006-12-20 at 21:40 +1300, Stephan Roessner wrote: > > > > > Hi all, > > > > > > > > > > I did sudo ./Build install --uninst 1 and got the error > > > > > * ERROR: Confiduration was initially created with MOdule::Build > > > > version > > > > > '0.2805', but we are now using '0.2806'. ... > > > > > > > > > > So I ran perl Build.PL and got the message > > > > > Creating new 'Buid' script for 'bioperl' verion '1.0050021'. > > > > > > > > > > I did run sudo ./Build install --uninst 1 again. > > > > > Seems to be fine with no error messages. > > > > > > > > > > When I run perl Makefile.PL for GBrowse 1.66-RC2 it results in > > > > > > > > > > Warning: prerequisite Bio::Perl 1.52 not found. We have > > 1.0050021. > > > > > Warning: prerequisite Class::Base 0 not found. > > > > > Writing Makefile for Bio::Graphocs::Browser::CAlign > > > > > Writing Makefile for Generic-Genome-Browser > > > > > > > > > > GBrowse is running but I have really troubles with aggregators > > trying > > > > to > > > > > use xyplot. It does not plot anything. So I thought the bioperl > > could > > > > be > > > > > the problem. > > > > > > > > > > Stephan > > > > > > > > > > > > > > > > > > > > Quoting Scott Cain : > > > > > > > > > > > I really don't think the BioPerl version detection is wrong. > > I > > > > > > actually > > > > > > don't check Bio::Root::Version::VERSION in Makefile.PL, I > > check > > > > > > Bio::Graphics::Panel->api_version. When it doesn't find the > > > > correct > > > > > > api_version, it gives a warning the BioPerl 1.5.2 is not > > installed. > > > > I > > > > > > have seen this happen when more than one BioPerl instance is > > > > installed > > > > > > and `perl Makefile.PL` finds the wrong one first. 
My > > suggestion is > > > > to > > > > > > try reinstalling BioPerl and providing the --uninst 1 argument > > to > > > > > > remove > > > > > > older versions of BioPerl: > > > > > > > > > > > > sudo ./Build install --uninst 1 > > > > > > > > > > > > Scott > > > > > > > > > > > > > > > > > > On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote: > > > > > > > Stephan Roessner wrote: > > > > > > > > Dear support team, > > > > > > > > > > > > > > > > I installed bioperl 1.5.2_100 on a ferdora machine to be > > able > > > > to > > > > > > use > > > > > > > > gbrowse. > > > > > > > > The installation seems to work (except of the test > > failures) > > > > but > > > > > > the > > > > > > > > gbrowse installation tells me that BIO::pERL 1.0050021 is > > > > > > installed, but > > > > > > > > of course it requires 1.52. > > > > > > > > > > > > > > > > Is there a chance to find out what went wrong? > > > > > > > > > > > > > > Nothing went wrong with the Bioperl installation (well, > > expect > > > > there > > > > > > > shouldn't have been any test failures - can you post those > > > > please?). > > > > > > > gbrowse simply defined its Bioperl requirement incorrectly. > > If > > > > you > > > > > > tell > > > > > > > me exactly where you downloaded gbrowse from and how you > > went > > > > about > > > > > > > installing it, and provide the exact, complete error message > > you > > > > got > > > > > > > from it, I might be able help the authors fix the problem. > > > > > > > > > > > > > > Or I'm pretty sure they can figure it our for themselves :) > > > > > > > _______________________________________________ > > > > > > > Bioperl-l mailing list > > > > > > > Bioperl-l at lists.open-bio.org > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > Scott Cain, Ph. D. 
> > > > > > cain at cshl.edu > > > > > > GMOD Coordinator (http://www.gmod.org/) > > > > > > 216-392-3087 > > > > > > Cold Spring Harbor Laboratory > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > ------------------------------------------------------------------------ > > > > Scott Cain, Ph. D. > > > > cain.cshl at gmail.com > > > > GMOD Coordinator (http://www.gmod.org/) > > > > 216-392-3087 > > > > Cold Spring Harbor Laboratory > > > > > > > > > > > > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. > > cain.cshl at gmail.com > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rvosa at sfu.ca Sat Dec 23 17:17:37 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Sat, 23 Dec 2006 14:17:37 -0800 Subject: [Bioperl-l] [Summary] Re: suggestions for suitable 'taxon' object In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> Message-ID: <458DAB01.6080200@sfu.ca> The replies I've received so far indicate I should look into Bio::Taxon. I will probably come back with further questions/discussions as to how to link and cross reference taxa, sequences and nodes, but for now I should first look at the Bio::Taxon api (and unpack my other Christmas gifts). Thank you for all comments and suggestions. Happy holidays! 
Rutger Rutger Vos wrote: > Hi all, > > I am looking for a bioperl object that can be abused to function as a > suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS > file format (i.e. not strictly an entity from a taxonomy, but more loosely > an OTU). > > The object would primarily function as a way to relate nodes in trees to > sequences in an alignment (a foreign key that both nodes and sequences refer > to), and secondarily as the keeper of the canonical name of the OTU, such > that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo > sapiens (constrained monophyly)' can still be understood to refer to the > same thing - the OTU 'Homo sapiens sapiens' (for example). > > I was thinking that a (possibly expanded) Bio::Species might work if there > was some sensible way of appending references to node and sequence objects > to it (or otherwise associate them with each other), but I am writing *to > solicit any and all suggestions*. I am looking for something similar to > Bio::Phylo::Taxa::Taxon. > > Any and all comments and suggestions greatly appreciated! > > Best wishes, > > Rutger Vos > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger A. Vos Postdoctoral research fellow University of British Columbia Personal site: http://www.sfu.ca/~rvosa CIPRES: http://www.phylo.org Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From paul.boutros at utoronto.ca Sat Dec 23 22:36:59 2006 From: paul.boutros at utoronto.ca (Paul Boutros) Date: Sat, 23 Dec 2006 22:36:59 -0500 Subject: [Bioperl-l] Bio::Graphics::Glyph::dna Message-ID: <20061223223659.7sgfofa44mw4okks@webmail.utoronto.ca> Hi, I've been trying to get the dna glyph working and have had some problems. 
I'm using a fasta file, and am having some problems. This is ActiveState
perl 5.8.8 (build 819) and BioPerl 1.5.2 on WinXP.

I'm starting with a FASTA file, so I've tried:

  $panel->add_track(
      $seq,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

where $seq is a Bio::Seq object

and I've tried it using a GFF $segment:

  my $db = Bio::DB::GFF->new(
      -adaptor => 'berkeleydb',
      -create  => 1,
      -dsn     => 'temp'
  );

  $db->load_sequence_string(
      $seq->primary_id(),
      $seq->seq()
  );

  my $segment = Bio::DB::GFF::Segment->new(
      $db,
      $seq->primary_id(),
      $seq->primary_id(),
      1,
      $seq->length()
  );

  $panel->add_track(
      $segment,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

From paul.boutros at utoronto.ca Sat Dec 23 22:46:27 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:46:27 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
Message-ID: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>

Hello,

I'm trying to get the dna glyph of Bio::Graphics to work and am having
some problems. I'm starting with a fasta file, and I am running perl
5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2

If I try simply using a Bio::Seq object like this:

  $panel->add_track(
      $segment,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

I get the error:
Can't locate object method "start" via package "Bio::Seq" at
C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164.

And if I try creating a Bio::DB::GFF::Segment object like this:

  my $db = Bio::DB::GFF->new(
      -adaptor => 'berkeleydb',
      -create  => 1,
      -dsn     => '/usr/local/share/gff/dmel'
  );

  $db->initialize(1);

  $db->load_sequence_string(
      $seq->primary_id(),
      $seq->seq()
  );

  my $segment = Bio::DB::GFF::Segment->new(
      $db,
      $seq->primary_id(),
      $seq->primary_id(),
      1,
      $seq->length()
  );

  $panel->add_track(
      $segment,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

I get the error:

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented by package Bio::DB::GFF::Segment.
This is not your fault - author of Bio::DB::GFF::Segment should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522
STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101
STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186
STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388
STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360
STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288
STACK: create_figure.pl:147
----------------------------------------------------------------

I'm really unsure what to try next, any suggestions much appreciated!
Paul From lstein at cshl.edu Sun Dec 24 12:23:18 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Sun, 24 Dec 2006 12:23:18 -0500 Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna? In-Reply-To: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca> References: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca> Message-ID: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com> Hi, You need to use either a Bio::SeqFeature::Generic object (with an attached Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to create Bio::DB::GFF::Segment objects directly. e.g. my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000); my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800); $feature->attach_seq($dna); Best, Lincoln On 12/23/06, Paul Boutros wrote: > > Hello, > > I'm trying to get the dna glyph of Bio::Graphics to work and am having > some problems. I'm starting with a fasta file, and I am running perl > 5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2 > > If I try simply using a Bio::Seq object like this: > $panel->add_track( > $segment, > -glyph => 'dna', > -do_gc => 'true', > -gc_window => 'auto' > ); > > I get the error: > Can't locate object method "start" via package "Bio::Seq" at > C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164. 
> > > And if I try creating a Bio::DB::GFFSegment object like this: > my $db = Bio::DB::GFF->new( > -adaptor => 'berkeleydb', > -create => 1, > -dsn => '/usr/local/share/gff/dmel' > ); > > $db->initialize(1); > > $db->load_sequence_string( > $seq->primary_id(), > $seq->seq() > ); > > my $segment = Bio::DB::GFF::Segment->new( > $db, > $seq->primary_id(), > $seq->primary_id(), > 1, > $seq->length() > ); > > $panel->add_track( > $segment, > -glyph => 'dna', > -do_gc => 'true', > -gc_window => 'auto' > ); > > I get the error: > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not > implemented b > y package Bio::DB::GFF::Segment. > This is not your fault - author of Bio::DB::GFF::Segment should be blamed! > > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Root::RootI::throw_not_implemented > C:/Perl/site/lib/Bio/Root/RootI.pm:522 > STACK: Bio::FeatureHolderI::get_SeqFeatures > C:/Perl/site/lib/Bio/FeatureHolderI.pm:101 > STACK: Bio::Graphics::Glyph::_subfeat > C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186 > STACK: Bio::Graphics::Glyph::subfeat > C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167 > STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56 > STACK: Bio::Graphics::Glyph::Factory::make_glyph > C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 > STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81 > STACK: Bio::Graphics::Glyph::Factory::make_glyph > C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 > STACK: Bio::Graphics::Panel::_add_track > C:/Perl/site/lib/Bio/Graphics/Panel.pm:388 > STACK: Bio::Graphics::Panel::_do_add_track > C:/Perl/site/lib/Bio/Graphics/Panel.pm:360 > STACK: Bio::Graphics::Panel::add_track > C:/Perl/site/lib/Bio/Graphics/Panel.pm:288 > STACK: create_figure.pl:147 > ---------------------------------------------------------------- > > I'm really unsure what to try next, 
any suggestions much appreciated! > Paul > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From tgenahmet at gmail.com Wed Dec 27 16:38:43 2006 From: tgenahmet at gmail.com (Ahmet Kurdoglu) Date: Wed, 27 Dec 2006 14:38:43 -0700 Subject: [Bioperl-l] get mRNA details for a gene Message-ID: <9d8d0e2a0612271338t7cb15a63v5a08f624888b3f7b@mail.gmail.com> Hi, This is my first message to the list. I hope I get it right. Here is what I'm trying to accomplish: Get the mRNA details for a given gene (ex. DNASE2B) from its GenBank file. Using the web-interface I can search with this query: DNASE2B [sym] AND homo sapiens [ORGN] (returns only one result if you search 'gene' database) and get the GenBank file by clicking on NC_000001.9 and I can see the details for its two mRNAs. (I eventually need to get exon locations for both of its transcripts) However trying to do this in Perl has proved to be very difficult for me. I've tried various methods, including get_Seq_by_id, get_Seq_by_gi, and get_Stream_by_query. Before I explain in detail what I did I'd like to hear your ideas on how to accomplish this. Thank you. From sdavis2 at mail.nih.gov Thu Dec 28 16:57:03 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 28 Dec 2006 16:57:03 -0500 Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers In-Reply-To: References: Message-ID: <45943DAF.70100@mail.nih.gov> Michael Muratet US-Huntsville wrote: > Sean > > Thanks. I did consider the bioconductor package and downloaded your > write-up after it was recommended by the GEO folks. I've looked at R a > few times, but I never got proficient at it. 
I'm a lot better with perl. > > I've been looking at MINiML, too. It looked like it might be easier to > parse the SOFT file since the data is in-line with the attributes and > I'd have to use a SAX parser (not enough memory for DOM) for MINiML. > > NCBI must have parsers to get the data into their databases. Do you know > what they use? > Michael, You might want to look more specifically at the MINiML format specs. The data tables are stored as separate tab-delimited files with an external reference in the XML, so DOM parsing is possible with just a few kB of memory. Of course, to read in all of the data into memory at once will take a large amount of memory for some datasets. If you are going to load into a database, I would suggest reading the MINiML using DOM and then stepping through the data files one at a time, loading as you go. As for their parsers, I'm not sure what language they use, but writing a parser for either SOFT or MINiML isn't at all difficult. GEO uses a very simplified MAGE schema. As for R vs. perl, if you are planning on doing analyses of microarray data, I would highly suggest looking again at the R/bioconductor project. It will save you reinventing many wheels, such as getting annotation like gene ontology and pathways, doing stats, plotting, maintaining MIAME-compliant data structures, converting from multiple microarray formats, etc. Sean From allenday at ucla.edu Thu Dec 28 18:21:07 2006 From: allenday at ucla.edu (Allen Day) Date: Thu, 28 Dec 2006 15:21:07 -0800 Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers In-Reply-To: <45943DAF.70100@mail.nih.gov> References: <45943DAF.70100@mail.nih.gov> Message-ID: <5c24dcc30612281521o58b9f256sfa36c403f4c30bfa@mail.gmail.com> > As for R vs. perl, if you are planning on doing analyses of microarray > data, I would highly suggest looking again at the R/bioconductor > project. 
It will save you reinventing many wheels, such as getting
> annotation like gene ontology and pathways, doing stats, plotting,
> maintaining MIAME-compliant data structures, converting from multiple
> microarray formats, etc.

I'll second this statement WRT the data analysis. I'm doing all my
analysis in R, Perl is just not good at dealing with large matrices or
plotting.

OTOH, I have also found that R is particularly weak when it comes to
pipelining data and system interfacing. If your goal is to do ETL to a
local database you're better off using Perl.

I've found they're both about equally clunky for dealing with the
experimental metadata, with a slight preference for Perl. That's more a
reflection of the baroque MAGE model though than the programming
languages themselves.

-Allen

>
> Sean
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From Paul.Boutros at utoronto.ca Sat Dec 30 02:43:32 2006
From: Paul.Boutros at utoronto.ca (Paul Boutros)
Date: Sat, 30 Dec 2006 02:43:32 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>
Message-ID: <000c01c72be6$34d07e60$ec02a8c0@main>

Hi Lincoln,

Thanks, that worked like a charm! Can I suggest adding the
example/explanation you gave me to the pod for Bio::Graphics::Glyph::dna?
Here's a patch against the 1.5.2 version of dna.pm to do that.

Paul

266c266,274
< in response to the dna() method.
---
> in response to the dna() method. For example, you can use a
> Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq
> like this:
> my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 );
> my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 );
> $feature->attach_seq($dna);
> $panel->add_track( $feature, -glyph => 'dna' );
>
> A Bio::Graphics::Feature object may also be used.

_____ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Sunday, December 24, 2006 12:23 PM To: Paul.Boutros at utoronto.ca Cc: BioPerl Mailing List Subject: Re: [Bioperl-l] How to use Bio::Graphics::Glyph::dna? Hi, You need to use either a Bio::SeqFeature::Generic object (with an attached Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to create Bio::DB::GFF::Segment objects directly. e.g. my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000); my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800); $feature->attach_seq($dna); Best, Lincoln On 12/23/06, Paul Boutros wrote: Hello, I'm trying to get the dna glyph of Bio::Graphics to work and am having some problems. I'm starting with a fasta file, and I am running perl 5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2 If I try simply using a Bio::Seq object like this: $panel->add_track( $segment, -glyph => 'dna', -do_gc => 'true', -gc_window => 'auto' ); I get the error: Can't locate object method "start" via package "Bio::Seq" at C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164. And if I try creating a Bio::DB::GFFSegment object like this: my $db = Bio::DB::GFF->new( -adaptor => 'berkeleydb', -create => 1, -dsn => '/usr/local/share/gff/dmel' ); $db->initialize(1); $db->load_sequence_string( $seq->primary_id(), $seq->seq() ); my $segment = Bio::DB::GFF::Segment->new( $db, $seq->primary_id(), $seq->primary_id(), 1, $seq->length() ); $panel->add_track( $segment, -glyph => 'dna', -do_gc => 'true', -gc_window => 'auto' ); I get the error: ------------- EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented b y package Bio::DB::GFF::Segment. This is not your fault - author of Bio::DB::GFF::Segment should be blamed! 
STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522 STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101 STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186 STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167 STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56 STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81 STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388 STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360 STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288 STACK: create_figure.pl:147 ---------------------------------------------------------------- I'm really unsure what to try next, any suggestions much appreciated! Paul _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From er at xs4all.nl Sat Dec 30 19:05:16 2006 From: er at xs4all.nl (Erik) Date: Sun, 31 Dec 2006 01:05:16 +0100 (CET) Subject: [Bioperl-l] acquiring a local refseq + index Message-ID: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> Hi all, I downloaded the refseq files (.gbff) and want to index the lot with Bio::DB::Flat. 

It turns out that there are many cases where the SOURCE and ORGANISM
lines are messed up, sometimes to a degree where the indexing fails on a
Bio::SeqIO::genbank error.

I'd like to change Bio::SeqIO::genbank to let this parsing go at least
so far as to make the indexing of the refseq files possible, and
hopefully improving the taxonomic output ($seq->species->binomial is
often mutilated at the moment).

Is it still worthwhile to change parsing modules like
Bio::SeqIO::genbank? Is anyone already working on a rewrite? Because if
this is the case I may be better off writing my own indexing scheme?

Below is (outline of) my indexing program, which uses
Bio::DB::Flat::BDB. If anyone knows of a better way to get a locally
searchable refseq flat file index, I would be very interested.

Thanks for your help,

Erikjan


-------------
use Bio::DB::Flat;

my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
my $db = Bio::DB::Flat->new(
    -directory  => $refseq_dir,
    -dbname     => 'refseq',
    -format     => 'genbank',
    -index      => 'bdb',
    -write_flag => 1,
);
# getfiles() is a helper that is not shown in this outline
my @files = getfiles($refseq_dir);
for my $f (@files) {
    $db->build_index($f);
}

From hlapp at gmx.net Sat Dec 30 20:48:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 30 Dec 2006 20:48:33 -0500
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
Message-ID:

Can you send examples and the resulting error messages? Also, I'm
assuming you're running the 1.5.2 release of Bioperl; if not, that's
what I would try first.

-hilmar

On Dec 30, 2006, at 7:05 PM, Erik wrote:

> Hi all,
>
> I downloaded the refseq files (.gbff) and want to index the lot with
> Bio::DB::Flat.
>
> It turns out that there are many cases where the SOURCE and
> ORGANISM lines
> are messed up, sometimes to a degree where the indexing fails on a
> Bio::SeqIO::genbank error.
> > I'd like to change Bio::SeqIO::genbank to let this parsing go at > least so > far as to make the indexing of the refseq files possible, and > hopefully > improving the taxonomic output ($seq->species->binomial is often > mutilated > at the moment). > > Is it still worthwhile to change parsing modules like > Bio::SeqIO::genbank? > Is anyone already working on a rewrite? Because if this is the > case I may > be better off writing my own indexing scheme? > > Below is (outline of) my indexing program, which uses > Bio::DB::Flat::DBD. > If anyone knows of a better way to get a locally searchable refseq > flat > file index, I would be very interested. > > Thanks for your help, > > Erikjan > > > ------------- > use Bio::DB::Flat; > > my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; > my $db=Bio::DB::Flat->new( > -directory => $refseq_dir, > -dbname => 'refseq', > -format => 'genbank', > -index => 'bdb', > -write_flag => 1, > ); > my @files = getfiles($refseq_dir); > for my $f (@files) { > db->build_index($f); > } > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat Dec 30 21:33:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 30 Dec 2006 20:33:23 -0600 Subject: [Bioperl-l] acquiring a local refseq + index In-Reply-To: References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> Message-ID: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> Agree with Hilmar, in that we need examples. If you are referring to your submitted bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2167 we could add this in as long as it passes (I'll try giving it a workout with my local bacterial seqs tonight or tomorrow). 
However, in the not-too-distant future your patch would likely be rendered obsolete, as any parsing in Bio::SeqIO modules pertaining to Bio::Species-related matters will be deprecated in favor of simple parsing (more foolproof, less uncertainty) and Bio::Taxon (which has optional db lookups using NCBI Taxonomy). Bio::Species and anything related to it are considered marked for deprecation. Fair warning... chris On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote: > Can you send examples and the resulting error messages? Also, I'm > assuming you running the 1.5.2 release of Bioperl; if not that's what > I would try first. > > -hilmar > > On Dec 30, 2006, at 7:05 PM, Erik wrote: > >> Hi all, >> >> I downloaded the refseq files (.gbff) and want to index the lot with >> Bio::DB::Flat. >> >> It turns out that there are many cases where the SOURCE and >> ORGANISM lines >> are messed up, sometimes to a degree where the indexing fails on a >> Bio::SeqIO::genbank error. >> >> I'd like to change Bio::SeqIO::genbank to let this parsing go at >> least so >> far as to make the indexing of the refseq files possible, and >> hopefully >> improving the taxonomic output ($seq->species->binomial is often >> mutilated >> at the moment). >> >> Is it still worthwhile to change parsing modules like >> Bio::SeqIO::genbank? >> Is anyone already working on a rewrite? Because if this is the >> case I may >> be better off writing my own indexing scheme? >> >> Below is (outline of) my indexing program, which uses >> Bio::DB::Flat::DBD. >> If anyone knows of a better way to get a locally searchable refseq >> flat >> file index, I would be very interested. 
>> >> Thanks for your help, >> >> Erikjan >> >> >> ------------- >> use Bio::DB::Flat; >> >> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; >> my $db=Bio::DB::Flat->new( >> -directory => $refseq_dir, >> -dbname => 'refseq', >> -format => 'genbank', >> -index => 'bdb', >> -write_flag => 1, >> ); >> my @files = getfiles($refseq_dir); >> for my $f (@files) { >> db->build_index($f); >> } >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Dec 31 14:36:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 31 Dec 2006 13:36:47 -0600 Subject: [Bioperl-l] acquiring a local refseq + index In-Reply-To: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> Message-ID: <37FB5BDF-25A9-44F0-9E82-964684A73A58@uiuc.edu> As a followup, I have committed the fix Erik had in Bugzilla. I don't know if this helps with the below issue Erik describes (they sound unrelated). chris On Dec 30, 2006, at 8:33 PM, Chris Fields wrote: > Agree with Hilmar, in that we need examples. If you are referring to > your submitted bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2167 > > we could add this in as long as it passes (I'll try giving it a > workout with my local bacterial seqs tonight or tomorrow). 
However, > in the not-too-distant future your patch would likely be rendered > obsolete, as any parsing in Bio::SeqIO modules pertaining to > Bio::Species-related matters will be deprecated in favor of simple > parsing (more foolproof, less uncertainty) and Bio::Taxon (which has > optional db lookups using NCBI Taxonomy). Bio::Species and anything > related to it are considered marked for deprecation. Fair warning... > > chris > > On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote: > >> Can you send examples and the resulting error messages? Also, I'm >> assuming you running the 1.5.2 release of Bioperl; if not that's what >> I would try first. >> >> -hilmar >> >> On Dec 30, 2006, at 7:05 PM, Erik wrote: >> >>> Hi all, >>> >>> I downloaded the refseq files (.gbff) and want to index the lot with >>> Bio::DB::Flat. >>> >>> It turns out that there are many cases where the SOURCE and >>> ORGANISM lines >>> are messed up, sometimes to a degree where the indexing fails on a >>> Bio::SeqIO::genbank error. >>> >>> I'd like to change Bio::SeqIO::genbank to let this parsing go at >>> least so >>> far as to make the indexing of the refseq files possible, and >>> hopefully >>> improving the taxonomic output ($seq->species->binomial is often >>> mutilated >>> at the moment). >>> >>> Is it still worthwhile to change parsing modules like >>> Bio::SeqIO::genbank? >>> Is anyone already working on a rewrite? Because if this is the >>> case I may >>> be better off writing my own indexing scheme? >>> >>> Below is (outline of) my indexing program, which uses >>> Bio::DB::Flat::DBD. >>> If anyone knows of a better way to get a locally searchable refseq >>> flat >>> file index, I would be very interested. 
>>> >>> Thanks for your help, >>> >>> Erikjan >>> >>> >>> ------------- >>> use Bio::DB::Flat; >>> >>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; >>> my $db=Bio::DB::Flat->new( >>> -directory => $refseq_dir, >>> -dbname => 'refseq', >>> -format => 'genbank', >>> -index => 'bdb', >>> -write_flag => 1, >>> ); >>> my @files = getfiles($refseq_dir); >>> for my $f (@files) { >>> $db->build_index($f); >>> } >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From arareko at campus.iztacala.unam.mx Fri Dec 1 00:56:02 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 30 Nov 2006 18:56:02 -0600 Subject: [Bioperl-l] [Root-l] Intermittent MySQL problems on BioPerl wiki In-Reply-To: References: <000201c714b3$6198e4e0$15327e82@pyrimidine> Message-ID: <456F7DA2.7000408@campus.iztacala.unam.mx> Chris & Chris, I've run the maintenance scripts for MediaWiki (just in case they weren't run in the upgrade to 1.8.2), restarted Apache (with no significant changes on website response), then rebooted the machine (seems like MySQL restart didn't do the trick) and apparently it's behaving much better. Please check if the reported error still happens. Regards, Mauricio. Chris Dagdigian wrote: > Reports like this need to go to support at helpdesk.open-bio.org so that > they enter our RT helpdesk queue -- the main reason is that > sometimes emails to the root-l at open-bio.org administrators mailing > list can get lost in the shuffle. > > I am going to bounce this message into RT and will restart mysql on > the portal box. This is probably something we should be doing anyway > to free up memory -- the wikis in particular seem to be pretty hard > on mysql and free memory. > > -Chris > > On Nov 30, 2006, at 2:11 PM, Chris Fields wrote: > >> I'm seeing some MySQL errors on the Bioperl wiki (using Firefox 2 and >> WinXP): >> >> Database error >>> From BioPerl >> Jump to: navigation, search >> A database query syntax error has occurred. This may indicate a bug >> in the >> software. The last attempted database query was: >> >> (SQL query hidden) >> >> from within function "MediaWikiBagOStuff::_doquery". MySQL returned >> error >> "1205: Lock wait timeout exceeded; try restarting transaction >> (localhost)". >> >> >> This occurs intermittently when editing pages, logging in, etc. >> Also, >> pages loading to the browser seem much slower.
>> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> _______________________________________________ >> Root-l mailing list >> Root-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/root-l > > _______________________________________________ > Root-l mailing list > Root-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/root-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From n.haigh at sheffield.ac.uk Fri Dec 1 07:47:03 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 07:47:03 +0000 Subject: [Bioperl-l] Upgrading my BioPerl RC via ppm? In-Reply-To: <519167.29410.qm@web50804.mail.yahoo.com> References: <519167.29410.qm@web50804.mail.yahoo.com> Message-ID: <456FDDF7.1080403@sheffield.ac.uk> Caitlin wrote: > Hi all. > > I'm currently using BioPerl 1.5.2 RC2 but I've seen multiple references > to 1.5.2 RC5. Can anyone tell me how to upgrade to the latest version? > The ppm GUI (ActivePerl Build 819) doesn't include any BioPerl packages > among those deemed upgradable. > > Thanks, > > ~Katie > > > Hi Katie, Currently there is not an RC5 PPM package available - we are hoping to have the official 1.5.2 release out pretty soon and there will definitely be a PPM package for that! Are you experiencing any problems with your current version of bioperl? If not, there is no need to worry, once we've released an updated PPM package your PPM GUI should then be able to see it as an upgrade - hopefully! :o) Sendu, I know you were working on automatically generating PPM packages - what is the current situation with regards to this? Nath --- avast! Antivirus: Inbound message clean. Virus Database (VPS): 0652-4, 30/11/2006 Tested on: 01/12/2006 07:46:58 avast! - copyright (c) 1988-2006 ALWIL Software.
http://www.avast.com --- avast! Antivirus: Outbound message clean. Virus Database (VPS): 0652-4, 30/11/2006 Tested on: 01/12/2006 07:47:04 avast! - copyright (c) 1988-2006 ALWIL Software. http://www.avast.com From bix at sendu.me.uk Fri Dec 1 09:00:18 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:00:18 +0000 Subject: [Bioperl-l] BLASTing with a seqio/seq object... In-Reply-To: <456F27E9.70205@york.ac.uk> References: <01ba01c714a2$b9659c10$15327e82@pyrimidine> <456F27E9.70205@york.ac.uk> Message-ID: <456FEF22.4090004@sendu.me.uk> Samantha Thompson wrote: You missed a step... > use strict; > use Bio::Perl; > use Bio::Seq; > use Bio::SeqIO; > > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > > #seq bit > > #$seq_obj = Bio::Seq->new(-format => 'fasta'); > > my $seqio_obj = Bio::SeqIO->new(-file => > "/biol/people/mres/st537/MalEfasta.txt", -format => 'fasta'); > > my $seq_obj = $seqio_obj->next_seq; > > > > #blast bit > > my $remote_blast = Bio::Tools::Run::RemoteBlast->new ( > -prog => 'blastp', -db => 'nr', -expect => '1e-15' ); > > my $blast_report = $remote_blast->submit_blast($seq_obj); Go back to the Bptutorial: http://www.bioperl.org/wiki/Bptutorial.pl#Running_BLAST_.28using_RemoteBlast.pm.29 And you'll see that submit_blast doesn't return a SearchIO object. For a complete working example see the synopsis for RemoteBlast: http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/RemoteBlast.html > #new part for SearchIO... 
> > while( my $result = $blast_report->next_result ) { > while( my $hit = $result->next_hit ) { > while( my $hsp = $hit->next_hsp ) { > if( $hsp->length('total') > 100 ) { > if ( $hsp->percent_identity >= 75 ) { > print "Hit= ", $hit->name, > ",Length=", $hsp->length('total'), > ",Percent_id=", $hsp->percent_identity, "\n"; > } > } > } > } > } From bix at sendu.me.uk Fri Dec 1 09:03:13 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:03:13 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> Message-ID: <456FEFD1.4070704@sendu.me.uk> pelikan at cs.pitt.edu wrote: > Hello all, > > I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, > without Cygwin. The "make test"s have all completed without error. This > is my first time dealing with bioperl, so bear with me. > > I've successfully loaded the most recent taxonomy information using the > biosql-schema scripts. After this, I attempted to load the most recent > release of the uniprot flat file dataset with the following command: > > load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass > ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat > > I am subsequently greeted by many of the following errors: > > Could not store Q7N3Q6: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: The supplied lineage does not start near 'Photorhabdus luminescens > subsp. laumondii' In your uniprot_sprot.dat file there'll be some kind of entry with that Photorhabdus species. Can you post that entry (sans sequence if it has one) so I can take a look at it? Maybe post a few that cause problems, and a few that don't. 
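A minimal sketch of one way to pull such an entry out of a large Swiss-Prot flat file without its sequence, so it can be posted to the list. It assumes the usual flat-file layout (two-letter line codes, accessions on AC lines, the sequence after the SQ line, entries terminated by a line holding only //). The helper name entry_without_sequence and the two-entry sample are made up for illustration; nothing here is part of BioPerl or load_seqdatabase.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the flat-file entry whose
# AC line lists $accession, with the sequence block (SQ line onwards) removed.
sub entry_without_sequence {
    my ($flatfile_text, $accession) = @_;

    # Entries are terminated by a line containing only "//".
    for my $entry (split m{^//[ \t]*\n}m, $flatfile_text) {
        # AC lines hold one or more accessions, each followed by ";".
        next unless $entry =~ /^AC\s.*\b\Q$accession\E;/m;

        # Drop everything from the SQ header line to the end of the entry.
        $entry =~ s/^SQ\s.*\z//ms;
        return $entry . "//\n";
    }
    return;    # no entry with that accession
}

# Tiny made-up sample (not real UniProt data) so the sketch runs on its own.
my $sample = <<'END';
ID   TEST1_PHOLL   Reviewed;   10 AA.
AC   Q7N3Q6;
OS   Photorhabdus luminescens subsp. laumondii.
OC   Bacteria; Proteobacteria.
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
ID   TEST2_ECOLI   Reviewed;   5 AA.
AC   P00000;
OS   Escherichia coli.
SQ   SEQUENCE   5 AA;
     MKTAY
//
END

print entry_without_sequence($sample, 'Q7N3Q6');
```

For the real uniprot_sprot.dat it is probably kinder on memory to read one entry at a time by setting $/ to "//\n" rather than holding the whole file in a string.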
From bix at sendu.me.uk Fri Dec 1 09:19:09 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:19:09 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819 In-Reply-To: <000301c714b4$7846e790$15327e82@pyrimidine> References: <000301c714b4$7846e790$15327e82@pyrimidine> Message-ID: <456FF38D.3070508@sendu.me.uk> Chris Fields wrote: >> Nathan S. Haigh wrote: >>> More updates: >>> >>> After the failed install I updating Module::Build, and re-ran the >>> install, I get: >>> >>> -- snip -- >>> Creating new 'Build' script for 'bioperl' version '1.005002005' >>> Warning: while trying to determine prerequisites for >>> S/SE/SENDU/bioperl-1.5.2_005-RCb.tar.gz wi th the help of >>> Module::Build the following error occurred: 'Failed to re-load >>> 'ModuleBuildBiope >>> rl': Can't locate ModuleBuildBioperl.pm in @INC (@INC contains: >>> _build\lib C:\Perl\site\lib C:\ >>> Perl\lib C:\Documents and Settings\test) at (eval 105) line 1. >>> ' >>> >>> Falling back to META.yml for prerequisites 'YAML' not installed, >>> cannot parse 'C:\Perl\cpan\build\bioperl-1.5.2_005-RC\META.yml' >>> -- snip -- >> I had that problem fleetingly and it drove me crazy because >> later I couldn't reproduce it. Is it reproducible on your end? > > During Module::Build installation I see this: > > ... > t\metadata........ok > 8/43 skipped: YAML_support feature is not enabled You were pointing out the YAML issue? I think I'm less concerned with that (solution: install YAML) and much more concerned with why it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The module in question is in the same dir as the Build script, so it should be found automatically. The only thing I can think of is that CPAN doesn't manage to chdir to the directory. Hopefully I'll be able to reproduce this and then I can investigate further. From n.haigh at sheffield.ac.uk Fri Dec 1 09:26:22 2006 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Fri, 01 Dec 2006 09:26:22 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <456FF233.6040704@sendu.me.uk> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> Message-ID: <456FF53E.90907@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: >> >> I know that setting up the PPM is a pain, but I have to say it is >> much faster, and all required PPMs are available. Which makes me >> curious: why bother with trying out a CPAN installation process at >> this point, especially when you have to use PPM to install some of >> the prereqs properly anyway? > > Firstly, problems discovered and resulting fixes will help all > platforms, not just Windows. So thanks for trying it out and reporting > back. Secondly, the PPM method, like Bundle::BioPerl, is > all-or-nothing. The CPAN installation method allows an interactive > choice of which optional things to install. > > If what you say about DB_File is true, then that's a great shame! > > > So I can do further trouble-shooting of my own, what is the sure-fire > way to completely clean-out an ActivePerl install, including any > modules you might have installed with PPMs or CPAN? > > In addition, using CPAN allows you to run the test suite easily without the need to download it separately and run it after a PPM install. I don't know of a way to clean out ActivePerl - I use VMWare Workstation and have a virtual machine with a fresh install of WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas? Nath
From bix at sendu.me.uk Fri Dec 1 09:13:23 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 09:13:23 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> Message-ID: <456FF233.6040704@sendu.me.uk> Chris Fields wrote: > > I know that setting up the PPM is a pain, but I have to say it is much > faster, and all required PPMs are available. Which makes me curious: > why bother with trying out a CPAN installation process at this point, > especially when you have to use PPM to install some of the prereqs > properly anyway? Firstly, problems discovered and resulting fixes will help all platforms, not just Windows. So thanks for trying it out and reporting back. Secondly, the PPM method, like Bundle::BioPerl, is all-or-nothing. The CPAN installation method allows an interactive choice of which optional things to install. If what you say about DB_File is true, then that's a great shame! So I can do further trouble-shooting of my own, what is the sure-fire way to completely clean-out an ActivePerl install, including any modules you might have installed with PPMs or CPAN?
From cjfields at uiuc.edu Fri Dec 1 14:08:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 08:08:55 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <456FF233.6040704@sendu.me.uk> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> Message-ID: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu> On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote: > Chris Fields wrote: >> I know that setting up the PPM is a pain, but I have to say it is >> much faster, and all required PPMs are available. Which makes me >> curious: why bother with trying out a CPAN installation process at >> this point, especially when you have to use PPM to install some of >> the prereqs properly anyway? > > Firstly, problems discovered and resulting fixes will help all > platforms, not just Windows. So thanks for trying it out and > reporting back. Secondly, the PPM method, like Bundle::BioPerl, is > all-or-nothing. The CPAN installation method allows an interactive > choice of which optional things to install. Yes, I understand that. My point is, you are generally forced to use PPM anyway due to several modules not installing properly (all the 'trouble' distributions, like DB_File, are available via PPM). I can see using CPAN as an alternative way of installing Bioperl for a distribution, or as the primary method via CVS or manually, but not for distributions. At least not until the kinks are worked out for Windows users. What are the significant issues for a bioperl PPM installation, based on the last PPM Nathan set up? If there is a redirection problem, could we just modify the installation docs to address that ('due to problem X, you must install the following modules prior to installing BioPerl 1.5.2...'). > If what you say about DB_File is true, then that's a great shame! 
We need to go through the various prereqs to see which ones need PPM vs CPAN. In general, anything that requires C code compilation (and thus needs a recent VC++) will likely be an issue. > So I can do further trouble-shooting of my own, what is the sure- > fire way to completely clean-out an ActivePerl install, including > any modules you might have installed with PPMs or CPAN? Not sure, beyond uninstalling and cleaning out the Perl directory (I think you might be able to delete the site/ directory, but I haven't tried it). ActivePerl comes preloaded with a number of non-core modules which makes it tricky to uninstall them one-by-one. chris From cjfields at uiuc.edu Fri Dec 1 14:10:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 08:10:34 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819 In-Reply-To: <456FF38D.3070508@sendu.me.uk> References: <000301c714b4$7846e790$15327e82@pyrimidine> <456FF38D.3070508@sendu.me.uk> Message-ID: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu> On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote: > You were pointing out the YAML issue? I think I'm less concerned > with that (solution: install YAML) and much more concerned with why > it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The > module in question is in the same dir as the Build script, so it > should be found automatically. > > The only thing I can think of is that CPAN doesn't manage to chdir > to the directory. Hopefully I'll be able to reproduce this and then > I can investigate further. My thought was the two were related in some way. I'm not sure to tell the truth. 
-chris From bix at sendu.me.uk Fri Dec 1 14:17:41 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 14:17:41 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu> Message-ID: <45703985.5050203@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> I know that setting up the PPM is a pain, but I have to say it is >>> much faster, and all required PPMs are available. Which makes me >>> curious: why bother with trying out a CPAN installation process at >>> this point, especially when you have to use PPM to install some of >>> the prereqs properly anyway? >> >> Firstly, problems discovered and resulting fixes will help all >> platforms, not just Windows. So thanks for trying it out and reporting >> back. Secondly, the PPM method, like Bundle::BioPerl, is >> all-or-nothing. The CPAN installation method allows an interactive >> choice of which optional things to install. > > Yes, I understand that. My point is, you are generally forced to use > PPM anyway due to several modules not installing properly (all the > 'trouble' distributions, like DB_File, are available via PPM). I can > see using CPAN as an alternative way of installing Bioperl for a > distribution, or as the primary method via CVS or manually, but not for > distributions. At least not until the kinks are worked out for Windows > users. CPAN isn't being suggested as the primary or preferred installation method for Windows. That will still be PPM. I'm mentioning CPAN / manual installation in the Windows INSTALL docs for the benefit of anyone who wants a simple install and test environment when checking out from CVS. 
> What are the significant issues for a bioperl PPM installation None that I'm aware of - I just need to find the time to start looking into generating an appropriate PPD. Hopefully Nathan's wiki page on the subject will be all I need. From bix at sendu.me.uk Fri Dec 1 14:18:43 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 14:18:43 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl5.8.8.819 In-Reply-To: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu> References: <000301c714b4$7846e790$15327e82@pyrimidine> <456FF38D.3070508@sendu.me.uk> <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu> Message-ID: <457039C3.30907@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote: > >> You were pointing out the YAML issue? I think I'm less concerned with >> that (solution: install YAML) and much more concerned with why it >> can't reload ModuleBuildBioperl (claiming it isn't in @INC). The >> module in question is in the same dir as the Build script, so it >> should be found automatically. >> >> The only thing I can think of is that CPAN doesn't manage to chdir to >> the directory. Hopefully I'll be able to reproduce this and then I can >> investigate further. > > My thought was the two were related in some way. I'm not sure to tell > the truth. They weren't; using YAML is the fall-back position in case of an earlier failure. I've fixed it now in any case.
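The @INC printed in the error quoted earlier in this thread includes the relative entry _build\lib, and relative @INC entries are resolved against whatever directory the process is in when the require happens. A small diagnostic sketch along those lines follows. find_module_file is a hypothetical helper (not part of Module::Build or CPAN.pm) that checks a list of directories for a module's .pm file roughly the way require walks @INC, and the scratch directory with an empty ModuleBuildBioperl.pm is only a stand-in for the unpacked distribution.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical diagnostic helper: look for a module's .pm file in a list of
# directories, in order, and return the first existing path (or undef).
sub find_module_file {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir (@dirs) {
        next if ref $dir;    # skip code-ref hooks that may live in @INC
        my $candidate = File::Spec->catfile($dir, @parts);
        return $candidate if -f $candidate;
    }
    return undef;
}

# Self-contained demo: a scratch directory stands in for the unpacked
# distribution and holds an empty stand-in for ModuleBuildBioperl.pm.
my $build_dir = tempdir(CLEANUP => 1);
my $pm_file   = File::Spec->catfile($build_dir, 'ModuleBuildBioperl.pm');
open my $fh, '>', $pm_file or die "Cannot create $pm_file: $!";
print {$fh} "1;\n";
close $fh;

# Found when the directory is named explicitly ...
my $found = find_module_file('ModuleBuildBioperl', $build_dir);
print "explicit dir:  ", (defined $found ? $found : 'not found'), "\n";

# ... but not through an unrelated relative entry, which is resolved against
# the current working directory rather than the distribution directory.
my $missing = find_module_file('ModuleBuildBioperl', 'no_such_dir');
print "unrelated dir: ", (defined $missing ? $missing : 'not found'), "\n";
```

Calling find_module_file('ModuleBuildBioperl', @INC) from inside the Build script, together with a print of Cwd::getcwd(), would be one way to check the "CPAN didn't chdir" theory; that is a suggestion for debugging, not a description of the fix that was committed.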
From gwu at molbio.mgh.harvard.edu Fri Dec 1 15:19:42 2006 From: gwu at molbio.mgh.harvard.edu (gang wu) Date: Fri, 01 Dec 2006 10:19:42 -0500 Subject: [Bioperl-l] One more load_seqdatabase.pl question In-Reply-To: <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net> References: <4a9ad8800611270907x64a4a4c0jad92bff6641e300@mail.gmail.com> <53C6D534-6E36-4061-B955-E74537839265@gmx.net> <456CA667.6010609@molbio.mgh.harvard.edu> <456F5648.6070207@molbio.mgh.harvard.edu> <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net> Message-ID: <4570480E.1020701@molbio.mgh.harvard.edu> Thanks Hilmar. I did include the -lookup switch on the command line. The warning messages say that the code failed to "INSERT" instead of "UPDATE", which sounds like a match was not found. But I was just loading the same Genbank file for the second time. To test if it actually updated the records, I made a minor modification to one of the COMMENT features. Unfortunately, it was not updated. By the way, the test genbank file has four "COMMENT" features but they are different. Any idea what's happening there? I wonder if it's a bad idea to "UPDATE" a sequence. Say I got a new sequence version with 5 features removed, 5 features modified and 5 features new. If only --lookup is included, according to the POD, the 5 new features will be inserted, the 5 modified features will be updated and the 5 removed features will be left in the database untouched. This would leave the new sequence record as a mixture of old and new versions. I do not see a reason anyone would want a sequence like this. Either include -remove to replace the old version if only one version is needed, or put the new version under a different namespace if multiple versions are needed. Do I have the correct understanding of these issues? I deeply appreciate your help. Gang Hilmar Lapp wrote: > Right.
You need to tell it to look up sequences first if you know that > you are loading sequences which may be in the database already (see > the POD of load_seqdatabase.pl, switch --lookup; there are several > other command line options that control what will happen if a sequence > entry is already present in the database.). > > The messages in your report are warnings, not errors. It looks like > some of the comments are duplicated for a sequence, it doesn't look > like a reason for concern. It is not so good if you get errors thrown. > > -hilmar > > On Nov 30, 2006, at 5:08 PM, gang wu wrote: > >> Thanks Hilmar. Do you mean the NVL() clause will make >> load_seqdatabase.pl not work when updating? >> >> I have a problem with updating. Seems load_seqdatabase.pl only tries to >> insert instead of update. I used one of the test genbank files coming >> with bioperl-db. Please take a look at the attached output. >> >> Thanks. >> >> Gang >> >> ========================================= >> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle >> -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank >> -namespace test >> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb >> Loading >> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb >> ... >> >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, >> values were ("This sequence was reannotated via the Ensembl system. >> Please visit the Ensembl web site, http://www.ensembl.org/ for more >> information.
","1") FKs (389109) >> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated >> (DBD ERROR: OCIStmtExecute) >> --------------------------------------------------- >> >> -------------------- WARNING --------------------- >> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, >> values were ("The /gene indicates a unique id for a gene, /cds a >> unique id for a translation and a /exon a unique id for an exon. >> These ids are maintained wherever possible between versions. For more >> information on how to interpret the feature table, please visit >> http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109) >> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated >> (DBD ERROR: OCIStmtExecute) >> --------------------------------------------------- >> ... >> ... >> ========================================================== >> Hilmar Lapp wrote: >>> These are the protein translations stored in the feature table as >>> tags of features, right? You can change the type of the column >>> (although there may be some issues when you update the column >>> because the NVL() clause won't work if I recall that correctly), but >>> doing so will deprive you of any 'normal' searches against that >>> column. (You can still use functions >from the DBMS_LOB package, but >>> they will be much slower and are completely non-standard.) It is up >>> to you whether that is too big of a price to pay for having some >>> redundant protein translations (translating the feature's DNA >>> sequence should give you the same) in the database. I always trimmed >>> those feature tags off (using a custom SeqProcessor). An alternative >>> is to convert these feature tags into actual bioentries (i.e., >>> Bio::Seq objects; again, a custom SeqProcessor will allow you to do >>> that). -hilmar On Nov 28, 2006, at 4:13 PM, gang wu wrote: >>>> Hi everyone, I'm using load_seqdatabase.pl to upload some Genbank >>>> genome sequences to my Oracle BioSQL database. 
I saw some >>>> errors(See attached warning message) related to >>>> seqfeature_qualifier_value (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE >>>> column), which has Varchar2 data type of maximum 4000 bytes. Did >>>> anybody mention this issue before? Should I just modify the column >>>> to a type being able store more data such as LONG or CLOB? Thanks. >>>> Gang Log information: ============================================ >>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc >>>> -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace >>>> genbank /genomeseq/arabidopsis//NC_003070.gbk Loading >>>> /genomeseq/arabidopsis//NC_003070.gbk ... -------------------- >>>> WARNING --------------------- MSG: SimpleValueAdaptor::add_assoc: >>>> unexpected failure of statement execution: ORA-01461: can bind a >>>> LONG value only for insert into a LONG column (DBD ERROR: error >>>> possibly near <*> indicator at char 12 in 'INSERT INTO >>>> <*>seqfeature_qualifier_value (fea_oid, trm_oid, value, rank) >>>> VALUES (:p1, :p2, :p3, :p4)') name: INSERT ASSOC [2] >>>> Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue values: >>>> FK[Bio::SeqFeature::Generic]:14898, >>>> FK[Bio::Annotation::SimpleValue]:800, >>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSGFV >>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASADR >>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSSEI >>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNSFP >>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFERA >>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYTRY >>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRTEA >>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIVDI >>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENIDW >>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRFAL >>>> 
DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGSDN >>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPTRY >>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYLYT >>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLRGL >>>> VQATYQASA! >>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGGKV >>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDEDAY >>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSDFV >>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSDSE >>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGTLG >>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQEV >>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLGQL >>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVPTL >>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRTVT >>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEMRL >>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRILAV >>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGIDEY >>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDGVD >>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKLLR >>>> VKLDFNFM! 
>>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNLQS >>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAVSN >>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVYCL >>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSGWD >>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADDSE >>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEASV >>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLDGL >>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQSS >>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDEDDF >>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAKGL >>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKGVA >>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNGCL >>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSMRN >>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPKVE >>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSKPL >>>> WLSVGADAS! 
>>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRPVY >>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWITND >>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQES >>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNMTS >>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPFSV >>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSESW >>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKNIV >>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILYMS >>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDLRV >>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSMAM >>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGVDI >>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEGAK >>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSLLR >>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNIMG >>>> QRKFIPAK! 
>>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAIDQ >>>> AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS", >>>> rank:"1" -------------------------------------------------- >>>> ============================================= >>>> _______________________________________________ Bioperl-l mailing >>>> list Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > --=========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From bosborne11 at verizon.net Fri Dec 1 14:55:18 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 01 Dec 2006 09:55:18 -0500 Subject: [Bioperl-l] An announcement Message-ID: bioperl-l, I would like to call your attention to a job posting and in doing so I realize that I'm probably breaking a rule of this list. I apologize and acknowledge that I've transgressed. The reason I do this is because this is an interesting job that is relevant to a lot of what we do in this mailing list, and some of you might want to consider it. The posting is here: http://www.nescent.org/main/employment.html#gmodhelpdesk I encourage you to pass this on to anyone who you think might be interested. Thanks again, Brian O. From cjfields at uiuc.edu Fri Dec 1 16:49:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 10:49:32 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install onWinXPActivePerl 5.8.8.819 In-Reply-To: <456FF53E.90907@sheffield.ac.uk> References: <002401c714c6$53f65080$15327e82@pyrimidine> <456F500A.7010707@sheffield.ac.uk> <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu> <456FF233.6040704@sendu.me.uk> <456FF53E.90907@sheffield.ac.uk> Message-ID: On Dec 1, 2006, at 3:26 AM, Nathan S. Haigh wrote: ...
> In addition, using CPAN allows you to run the test suite easily > without the need to download it separately and run it after a PPM > install. A PPM, by design, is supposed to imply that the distribution passes tests for the specified platform, at that point in time, after all prereqs are installed and any additional postinstall operations (install C libraries, modify config files, etc) are complete. The ActiveState automated PPM building process dictates that; if it fails any test, it will not be made into a PPM. It's sort of a 'stamp of approval' that all tests pass, so you don't need to run them. However, a test may fail (and a PPM may not get generated) for pretty superficial reasons, such as the makefile not specifying that a module is needed, server issues, etc, so the automated process isn't foolproof. That's why Kobes and the other repositories are available, where the PPM/PPD is manually generated and made to work specifically for Windows (or whatever other platform). Saying that, it is completely up to the person packaging the distribution to follow those rules if one were to make a PPM manually. You don't even have to run tests prior to using 'nmake ppd'. We can currently state, though, that all tests pass when all prereqs are installed for this distribution. At least at this point in time! > I don't know of a way to clean out ActivePerl - I use VMWare > Workstation and have a virtual machine with a fresh install of > WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas? I haven't tried it that way. I have Parallels on Mac OS X (I run a SigmaPlot/Excel combo off it). My tests were using a native WinXP installation (i.e. not virtually) on my old Dell. It shouldn't make a difference; VMWare, Parallels, and the like should all run ActivePerl for WinXP since it's a virtual machine. Windows Vista, on the other hand... I think with PPM4 you can install to a custom directory.
It may be possible to install all new modules to that directory, then you would at least have an idea of what was there (though I don't think you can delete it directly w/o screwing up the PPM database). chris From bix at sendu.me.uk Fri Dec 1 17:12:49 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 17:12:49 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> Message-ID: <45706291.80201@sendu.me.uk> pelikan at cs.pitt.edu wrote: > Hello all, > > I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, > without Cygwin. The "make test"s have all completed without error. This > is my first time dealing with bioperl, so bear with me. > > I've successfully loaded the most recent taxonomy information using the > biosql-schema scripts. After this, I attempted to load the most recent > release of the uniprot flat file dataset with the following command: > > load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass > ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat > > I am subsequently greeted by many of the following errors: > > Could not store Q7N3Q6: I extracted just Q7N3Q6 from ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz and was able to load it in using load_seqdatabase.pl under linux with no errors. If you make a file with just that sequence do you still get the error? Is anyone else able to reproduce the problem? From cjfields at uiuc.edu Fri Dec 1 17:57:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 11:57:18 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <45703985.5050203@sendu.me.uk> Message-ID: <006301c71572$24be8830$15327e82@pyrimidine> > Chris Fields wrote: > PPM). 
I can > > see using CPAN as an alternative way of installing Bioperl for a > > distribution, or as the primary method via CVS or manually, but not > > for distributions. At least not until the kinks are worked out for > > Windows users. > > CPAN isn't being suggested as the primary or preferred > installation method for Windows. That will still be PPM. I'm > mentioning CPAN / manual installation in the Windows INSTALL > docs for the benefit of anyone who wants a simple install and > test environment when checking out from CVS. That's fine by me. I think the focus is making sure the PPM works, but that shouldn't hold up the final 1.5.2 release. The PPM for previous releases was never released concurrently with the distribution (if at all); it generally followed by a few weeks to a few months past a final release. > > What are the significant issues for a bioperl PPM installation > > None that I'm aware of - I just need to find the time to > start looking into generating an appropriate PPD. Hopefully > Nathan's wiki page on the subject will be all I need. I'll try testing it out today and next week (the more people we have looking into the issue the better). I'm sure that Module::Build hasn't updated to using PPM4 XML formatting, but the tags are similar enough. I can always create a local PPM database using a similar directory structure to bioperl.org/DIST and test an installation from it. chris From n.haigh at sheffield.ac.uk Fri Dec 1 18:52:55 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 18:52:55 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine> References: <006301c71572$24be8830$15327e82@pyrimidine> Message-ID: <45707A07.7000106@sheffield.ac.uk> Chris Fields wrote: >> Chris Fields wrote: >> PPM). 
I can >> >>> see using CPAN as an alternative way of installing Bioperl for a >>> distribution, or as the primary method via CVS or manually, but not >>> for distributions. At least not until the kinks are worked out for >>> Windows users. >>> >> CPAN isn't being suggested as the primary or preferred >> installation method for Windows. That will still be PPM. I'm >> mentioning CPAN / manual installation in the Windows INSTALL >> docs for the benefit of anyone who wants a simple install and >> test environment when checking out from CVS. >> > > That's fine by me. I think the focus is making sure the PPM works, but that > shouldn't hold up the final 1.5.2 release. The PPM for previous releases > was never released concurrently with the distribution (if at all); it > generally followed by a few weeks to a few months past a final release. > > >>> What are the significant issues for a bioperl PPM installation >>> >> None that I'm aware of - I just need to find the time to >> start looking into generating an appropriate PPD. Hopefully >> Nathan's wiki page on the subject will be all I need. >> > > I'll try testing it out today and next week (the more people we have looking > into the issue the better). I'm sure that Module::Build hasn't updated to > using PPM4 XML formatting, but the tags are similar enough. I can always > create a local PPM database using a similar directory structure to > bioperl.org/DIST and test an installation from it. > > chris > To clarify a few things about PPM4 XML and to highlight the main differences: 1) The use of PROVIDE and REQUIRE tags 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules. 
3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma separated tuples like PPM3 XML 4) the VERSION in PROVIDE and REQUIRE are used internally to do version comparisons for upgrades and solving prereqs etc 5) Module names should all contain '::', either natively according to their namespace or, if a name doesn't have one natively, appended to the end, e.g. "GD::" 6) the VERSION in the SOFTPKG key is for human readability only 7) the NAME in SOFTPKG is used to identify which packages are actually the same. Nath From bix at sendu.me.uk Fri Dec 1 18:52:44 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 18:52:44 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <45706291.80201@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> Message-ID: <457079FC.7010209@sendu.me.uk> Sendu Bala wrote: > pelikan at cs.pitt.edu wrote: [snip] >> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass >> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >> >> I am subsequently greeted by many of the following errors: >> >> Could not store Q7N3Q6: > > I extracted just Q7N3Q6 from > ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz > and was able to load it in using load_seqdatabase.pl under linux with no > errors. If you make a file with just that sequence do you still get the > error? > > Is anyone else able to reproduce the problem? In fact, if I just try and load it again I reproduce the problem. The situation is similar to http://bugzilla.bioperl.org/show_bug.cgi?id=2092 And I have a tentative fix that extends Brian's fix there. Committed to HEAD only atm.
I don't know anything about bioperl-db and don't have the faintest clue why this is happening, nor the time to figure it out. Can someone please have a proper look at this and decide if my fix is sane? All I can say is that the test suites for bioperl-live and bioperl-db continue to pass, but that isn't really saying much. PS. having used load_seqdatabase.pl to load a sequence, how do I remove it afterwards? From cjfields at uiuc.edu Fri Dec 1 19:00:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:00:13 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <45706291.80201@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> Message-ID: On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote: > pelikan at cs.pitt.edu wrote: >> Hello all, >> >> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, >> without Cygwin. The "make test"s have all completed without error. >> This >> is my first time dealing with bioperl, so bear with me. >> >> I've successfully loaded the most recent taxonomy information >> using the >> biosql-schema scripts. After this, I attempted to load the most >> recent >> release of the uniprot flat file dataset with the following command: >> >> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root - >> dbpass >> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >> >> I am subsequently greeted by many of the following errors: >> >> Could not store Q7N3Q6: > > I extracted just Q7N3Q6 from > ftp://ftp.expasy.org/databases/uniprot/current_release/ > knowledgebase/complete/uniprot_sprot.dat.gz > and was able to load it in using load_seqdatabase.pl under linux > with no > errors. If you make a file with just that sequence do you still get > the > error? > > Is anyone else able to reproduce the problem?
I can reproduce on both WinXP and Mac OS X using the latest bioperl- db/bioperl-live and a BioSQL database preloaded with taxonomy. Notably the bug doesn't show up with a database lacking taxonomy, where no lookup is used (I guess). Here's some overly verbose debugging (apologies): Loading saved.flat ... attempting to load adaptor class for Bio::Seq::RichSeq attempting to load module Bio::DB::BioSQL::RichSeqAdaptor attempting to load adaptor class for Bio::Seq attempting to load module Bio::DB::BioSQL::SeqAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor attempting to load adaptor class for Bio::Species attempting to load module Bio::DB::BioSQL::SpeciesAdaptor instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor attempting to load adaptor class for Bio::Tree::Tree attempting to load module Bio::DB::BioSQL::TreeAdaptor attempting to load adaptor class for Bio::Root::Root attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::Root::RootI attempting to load module Bio::DB::BioSQL::RootIAdaptor attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::Tree::TreeI attempting to load module Bio::DB::BioSQL::TreeIAdaptor attempting to load module Bio::DB::BioSQL::TreeAdaptor attempting to load adaptor class for Bio::Tree::NodeI attempting to load module Bio::DB::BioSQL::NodeIAdaptor attempting to load module Bio::DB::BioSQL::NodeAdaptor attempting to load adaptor class for Bio::Tree::TreeFunctionsI attempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor attempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor no adaptor found for class Bio::Tree::Tree attempting to load adaptor class for Bio::DB::Taxonomy::list attempting to load module Bio::DB::BioSQL::listAdaptor attempting to load adaptor class for Bio::DB::Taxonomy attempting to load module Bio::DB::BioSQL::TaxonomyAdaptor no adaptor found for class Bio::DB::Taxonomy::list attempting to load adaptor 
class for Bio::Annotation::Collection attempting to load module Bio::DB::BioSQL::CollectionAdaptor attempting to load adaptor class for Bio::AnnotationCollectionI attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor attempting to load adaptor class for Bio::Annotation::TypeManager attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for Bio::Annotation::SimpleValue attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor attempting to load adaptor class for Bio::Annotation::Reference attempting to load module Bio::DB::BioSQL::ReferenceAdaptor instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor attempting to load adaptor class for Bio::Annotation::Comment attempting to load module Bio::DB::BioSQL::CommentAdaptor instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor attempting to load adaptor class for Bio::Annotation::DBLink attempting to load module Bio::DB::BioSQL::DBLinkAdaptor instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor attempting to load adaptor class for Bio::PrimarySeq attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor attempting to load adaptor class for Bio::SeqFeature::Generic attempting to load module Bio::DB::BioSQL::GenericAdaptor attempting to load adaptor class for Bio::SeqFeatureI attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor attempting to load adaptor class for Bio::Location::Simple attempting to load module Bio::DB::BioSQL::SimpleAdaptor attempting to load adaptor class for 
Bio::Location::Atomic attempting to load module Bio::DB::BioSQL::AtomicAdaptor attempting to load adaptor class for Bio::LocationI attempting to load module Bio::DB::BioSQL::LocationIAdaptor attempting to load module Bio::DB::BioSQL::LocationAdaptor instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Tree::Tree no adaptor found for class Bio::DB::Taxonomy::list no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for BioNamespace attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor no adaptor found for class Bio::Tree::Tree no adaptor found for class Bio::DB::Taxonomy::list no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ? BioNamespaceAdaptor: binding UK column 1 to "Swiss-Prot" (namespace) preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?) 
BioNamespaceAdaptor::insert: binding column 1 to "Swiss- Prot" (namespace) BioNamespaceAdaptor::insert: binding column 2 to "" (authority) no adaptor found for class Bio::Tree::Tree no adaptor found for class Bio::DB::Taxonomy::list attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ? SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) SpeciesAdaptor: binding UK column 2 to "141679" (ncbi_taxid) prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value BETWEEN node.left_value AND node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor Could not store Q7N3Q6: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: The supplied lineage does not start near 'Photorhabdus luminescens subsp. 
laumondii' STACK: Error::throw STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/ Root/Root.pm:359 STACK: Bio::Species::classification /Users/cjfields/src/bioperl-live/ Bio/Species.pm:166 STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/ 5.8.6/Bio/DB/Persistent/PersistentObject.pm:552 STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /Library/ Perl/5.8.6/Bio/DB/BioSQL/SpeciesAdaptor.pm:281 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object / Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182 STACK: Bio::DB::Persistent::PersistentObject::create /Library/Perl/ 5.8.6/Bio/DB/Persistent/PersistentObject.pm:244 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/ 5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/ 5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251 STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/ 5.8.6/Bio/DB/Persistent/PersistentObject.pm:271 STACK: load_seqdatabase.pl:620 ----------------------------------------------------------- at load_seqdatabase.pl line 633 chris From cjfields at uiuc.edu Fri Dec 1 19:01:59 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:01:59 -0600 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <45707A07.7000106@sheffield.ac.uk> References: <006301c71572$24be8830$15327e82@pyrimidine> <45707A07.7000106@sheffield.ac.uk> Message-ID: On Dec 1, 2006, at 12:52 PM, Nathan S. 
Haigh wrote: > Chris Fields wrote: >>> Chris Fields wrote: >>> PPM). I can >>>> see using CPAN as an alternative way of installing Bioperl for a >>>> distribution, or as the primary method via CVS or manually, but >>>> not for distributions. At least not until the kinks are worked >>>> out for Windows users. >>>> >>> CPAN isn't being suggested as the primary or preferred >>> installation method for Windows. That will still be PPM. I'm >>> mentioning CPAN / manual installation in the Windows INSTALL docs >>> for the benefit of anyone who wants a simple install and test >>> environment when checking out from CVS. >>> >> >> That's fine by me. I think the focus is making sure the PPM >> works, but that >> shouldn't hold up the final 1.5.2 release. The PPM for previous >> releases >> was never released concurrently with the distribution (if at all); it >> generally followed by a few weeks to a few months past a final >> release. >> >> >>>> What are the significant issues for a bioperl PPM installation >>>> >>> None that I'm aware of - I just need to find the time to start >>> looking into generating an appropriate PPD. Hopefully Nathan's >>> wiki page on the subject will be all I need. >>> >> >> I'll try testing it out today and next week (the more people we >> have looking >> into the issue the better). I'm sure that Module::Build hasn't >> updated to >> using PPM4 XML formatting, but the tags are similar enough. I can >> always >> create a local PPM database using a similar directory structure to >> bioperl.org/DIST and test an installation from it. >> >> chris >> > > To clarify a few things about PPM4 XML and to highlight the main > differences: > > 1) The use of PROVIDE and REQUIRE tags > 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules. 
> 3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma > separated tuples like PPM3 XML > 4) the VERSION in PROVIDE and REQUIRE are used internally to do > version comparisons for upgrades and solving prereqs etc > 5) Module names should all contain '::' either natively according > their namespace, if it doesn't have one natively, then one is > appended to the end e.g. "GD::" > 6) the VERSION in the SOFTPKG key is for human readability only > 7) the NAME in SOFTPKG is used to identify which packages are > actually the same. > > Nath Okay. Maybe place this in the wiki (PPM page) for future reference? Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Fri Dec 1 19:05:38 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 01 Dec 2006 19:05:38 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl 5.8.8.819 In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine> References: <006301c71572$24be8830$15327e82@pyrimidine> Message-ID: <45707D02.9070504@sheffield.ac.uk> Chris Fields wrote: >> Chris Fields wrote: >> PPM). I can >> >>> see using CPAN as an alternative way of installing Bioperl for a >>> distribution, or as the primary method via CVS or manually, but not >>> for distributions. At least not until the kinks are worked out for >>> Windows users. >>> >> CPAN isn't being suggested as the primary or preferred >> installation method for Windows. That will still be PPM. I'm >> mentioning CPAN / manual installation in the Windows INSTALL >> docs for the benefit of anyone who wants a simple install and >> test environment when checking out from CVS. >> > > That's fine by me. I think the focus is making sure the PPM works, but that > shouldn't hold up the final 1.5.2 release. 
The PPM for previous releases > was never released concurrently with the distribution (if at all); it > generally followed by a few weeks to a few months past a final release. > > Forgot to say, one really annoying thing about PPM is that it seems to display all the versions of Bioperl defined in the XML file. In addition, I think a bug in PPM4 means that if a package is available in ActiveState's repo, PPM4 always wants to install it rather than a more recent version in a different repo (this includes upgrades). This results in this annoying behaviour: 1) If activestate and bioperl repos are active, searching for bioperl lists several versions 2) If you are using PPM4 GUI, and have installed a non activestate version, then it says you can upgrade to the version in ActiveState's repo (even if it's actually a downgrade). 3) Using ppm-shell, if you choose "install bioperl" or "upgrade bioperl" it will always install the version in the activestate repo. 4) I'm sure there are also some other annoyances. In the end, it means the best way to install and upgrade bioperl is to search for bioperl packages and install the latest version by eye rather than relying on the "upgrade" feature (at least for the time being). You may also need to remove an old version of bioperl before installing a more recent version. NOTE: using "upgrade" runs the risk of installing bioperl 1.2.3 from activestate and not the latest version in any other repo! I'll update the wiki when I have time. Nath >>> What are the significant issues for a bioperl PPM installation >>> >> None that I'm aware of - I just need to find the time to >> start looking into generating an appropriate PPD. Hopefully >> Nathan's wiki page on the subject will be all I need. >> > > I'll try testing it out today and next week (the more people we have looking > into the issue the better). I'm sure that Module::Build hasn't updated to > using PPM4 XML formatting, but the tags are similar enough.
I can always > create a local PPM database using a similar directory structure to > bioperl.org/DIST and test an installation from it. > > chris From cjfields at uiuc.edu Fri Dec 1 19:06:53 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:06:53 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <45706291.80201@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> Message-ID: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote: > pelikan at cs.pitt.edu wrote: >> Hello all, >> >> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, >> without Cygwin. The "make test"s have all completed without error. >> This >> is my first time dealing with bioperl, so bear with me. >> >> I've successfully loaded the most recent taxonomy information >> using the >> biosql-schema scripts.
After this, I attempted to load the most >> recent >> release of the uniprot flat file dataset with the following command: >> >> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root - >> dbpass >> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >> >> I am subsequently greeted by many of the following errors: >> >> Could not store Q7N3Q6: > > I extracted just Q7N3Q6 from > ftp://ftp.expasy.org/databases/uniprot/current_release/ > knowledgebase/complete/uniprot_sprot.dat.gz > and was able to load it in using load_seqdatabase.pl under linux > with no > errors. If you make a file with just that sequence do you still get > the > error? > > Is anyone else able to reproduce the problem? Okay, just updated to get your latest CVS fixes for bioperl-live and it passes now for both Mac OS X and WinXP. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Dec 1 19:09:15 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:09:15 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <457079FC.7010209@sendu.me.uk> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk> Message-ID: On Dec 1, 2006, at 12:52 PM, Sendu Bala wrote: > > PS. having used load_seqdatabase.pl to load a sequence, how do I > remove > it afterwards? There's not much documentation on it, but it is demonstrated several times in the test suite. Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Dec 1 19:39:17 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 01 Dec 2006 19:39:17 +0000 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu> Message-ID: <457084E5.2050300@sendu.me.uk> Chris Fields wrote: > > On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote: > >> pelikan at cs.pitt.edu wrote: >>> Hello all, >>> >>> I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows, >>> without Cygwin. The "make test"s have all completed without error. This >>> is my first time dealing with bioperl, so bear with me. >>> >>> I've successfully loaded the most recent taxonomy information >>> using the >>> biosql-schema scripts. After this, I attempted to load the most recent >>> release of the uniprot flat file dataset with the following command: >>> >>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass >>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat >>> >>> I am subsequently greeted by many of the following errors: >>> >>> Could not store Q7N3Q6: >> >> I extracted just Q7N3Q6 from >> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz >> >> and was able to load it in using load_seqdatabase.pl under linux with no >> errors. If you make a file with just that sequence do you still get the >> error? >> >> Is anyone else able to reproduce the problem? > > Okay, just updated to get your latest CVS fixes for bioperl-live and it > passes now for both Mac OS X and WinXP. Can you confirm if it is actually working correctly though? Like, having stored a previously-problem sequence, can you get it back out from the database and is its ->species() correct? 
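
[Editorial sketch] The round-trip check asked for above (store a sequence, fetch it back, inspect its species) might look roughly like the following with bioperl-db. This is a minimal sketch, not a tested recipe: the connection settings mirror the load_seqdatabase.pl command quoted in this thread, the password is a placeholder, and the namespace 'bioperl' is an assumption (use whatever namespace was given at load time). get_object_adaptor() is used here on the assumption that it is the adaptor lookup provided by the installed bioperl-db.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::DB::BioDB;

# Connection details are assumptions taken from the command quoted above.
my $db = Bio::DB::BioDB->new(
    -database => 'biosql',
    -driver   => 'mysql',
    -dbname   => 'bioseqdb',
    -host     => 'localhost',
    -user     => 'root',
    -pass     => 'your_password',   # placeholder
);

# Template object carrying only the unique key of the record we want.
my $template = Bio::Seq->new(
    -accession_number => 'Q7N3Q6',
    -namespace        => 'bioperl',  # assumption: namespace used when loading
);

my $adaptor = $db->get_object_adaptor($template);
my $stored  = $adaptor->find_by_unique_key($template);

if (!$stored) {
    print "Q7N3Q6 was not found in the database\n";
}
elsif (my $species = $stored->species) {
    # Compare these against the OS/OC lines of the original flat file entry.
    print "Binomial: ", $species->binomial, "\n";
    print "Lineage:  ", join('; ', reverse $species->classification), "\n";
}
else {
    print "Q7N3Q6 was found but has no species attached\n";
}
```

If the printed binomial and lineage match the source record, the taxonomy fix is working for that entry; repeating the check for a few bacterial entries would cover the cases mentioned as most likely to break.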
From cjfields at uiuc.edu Fri Dec 1 19:52:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 1 Dec 2006 13:52:13 -0600 Subject: [Bioperl-l] Error with supplied lineages importing uniprot data In-Reply-To: <457084E5.2050300@sendu.me.uk> Message-ID: <000001c71582$329d4d50$15327e82@pyrimidine> > > > > Okay, just updated to get your latest CVS fixes for > bioperl-live and > > it passes now for both Mac OS X and WinXP. > > Can you confirm if it is actually working correctly though? > Like, having stored a previously-problem sequence, can you > get it back out from the database and is its ->species() correct? I would assume so, if we can trust the species tests. I will have to try it again over the weekend. I planned on loading a ton of protein sequences in anyway, most of which are bacterial; if anything breaks it will probably be with those. I think Jason and Hilmar were going to get together about the BioSQL paper at the hackathon. That may be a good place to bring some of the species issues, if they persist. 
chris

From hlapp at gmx.net Sat Dec 2 01:42:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Dec 2006 20:42:05 -0500
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu> <45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <8414723F-BA02-4936-8F53-781276C3B526@gmx.net>

Either using SQL:

    -- theoretically you should convince yourself first that there
    -- is only one such record (the UK is over acc,version,namespace)
    SQL> DELETE FROM bioentry WHERE accession = 'Q7N3Q6';

or through bioperl-db (see the delete test for examples):

    my $db = Bio::DB::BioDB->new(....);
    my $seq = Bio::PrimarySeq->new(-accession_number=>'Q7N3Q6',
                                   -namespace=>'whatever you used when loading');
    my $adp = $db->get_persistence_adaptor($seq);
    my $pseq = $adp->find_by_unique_key($seq);
    $pseq->remove();
    $pseq->commit();

-hilmar

On Dec 1, 2006, at 1:52 PM, Sendu Bala wrote:

> PS. having used load_seqdatabase.pl to load a sequence, how do I
> remove
> it afterwards?

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From chhalling at verizon.net Mon Dec 4 01:56:51 2006
From: chhalling at verizon.net (Conrad Halling)
Date: Sun, 03 Dec 2006 20:56:51 -0500
Subject: [Bioperl-l] BioPerl Wiki is down
Message-ID: <45738063.1070504@verizon.net>

When I attempted to navigate to http://www.bioperl.org/, I got the following message:

A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:

(SQL query hidden)

from within function "MediaWikiBagOStuff::_doquery". MySQL returned error "1205: Lock wait timeout exceeded; try restarting transaction (localhost)".
-- Conrad Halling chhalling at verizon.net From rbirnie at totalise.co.uk Sun Dec 3 21:38:02 2006 From: rbirnie at totalise.co.uk (richard) Date: Sun, 3 Dec 2006 21:38:02 +0000 Subject: [Bioperl-l] confused by Bio::Graphics Message-ID: <200612032138.02522.rbirnie@totalise.co.uk> Hi all, I'm having a little trouble getting Bio::Graphics to give me the correct output and I'm looking for some help. I am trying to extend from example 5 of the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. Eventually I intend the script to follow example 6 but I thought I'd try the simpler version first. The basic aim of the script is that it takes as input a file containing a list of GenBank IDs plus some other info for alternative transcripts of a gene. This information is stored in a hash and the GenBank IDs are used to retrieve the appropriate entries from GenBank. I then want to use Bio::Graphics to generate a figure from the feature tables showing the CDSs from the alternative transcripts. So far I have managed to retrieve the GenBank entries extract the feature tables and store a reference to these in the hash mentioned above. I've also got Bio::Graphics to draw a basic image but some of the details aren't right and I don't understand why. I have attached the code I have so far, the input file and the output image to this mail. I didn't want to display it all in the main message but I'm not actually sure which bit is causing the problem. The code is very rough and in need of polishing but I need to get it to work correctly first. These are the problems: 1) As I understand it this: my $wholeseq = Bio::SeqFeature::Generic->new ( -start => 1, -end => $refseq->length, -display_name =>$refseq->display_name ); should display the name of the gene (CD133/Prominin1) near the top of image. It doesn't, am I misunderstanding or is there an error in the code? 2) In the quoted example the CDS is broken up into smaller regions which are then linked together in example 6. 
This isn't happening in my code and I think it should be, I get one solid block for the CDS. I don't understand why this is because I'm not clear which parts of the feature table are used to define where the CDS should be split. I think this is the relevant bit of code: foreach my $alt_trans (keys %main) { foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) { my $feature = $main{$alt_trans}{'features'}{$tag}; $panel->add_track($feature, -glyph => 'generic', -bgcolor => $colors[$idx++ % @colors], -fgcolor => 'black', -font2color => 'black', -key => $alt_trans, -bump => +1, -height => 8, -label => 1, -description => 1, ) if ($tag eq 'CDS'); } } Can anyone tell me what I am doing wrong? RefSeq entry for the gene of interest is here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386 If I understand correctly the example file used in the HOWTO is this gene: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=116805320 Final question, does bioperl come with example scripts and is so where whould they normally be found on a Linux system? If anyone is still reading this thanks for your patience. Any clarification will be appreciated. regards, Richard -------------- next part -------------- A non-text attachment was scrubbed... Name: CD133_graphic_code Type: application/x-perl Size: 2702 bytes Desc: not available URL: -------------- next part -------------- sequence_ID Exon_Boundary Assay_location Amplicon_length NM_006017 9 - 10 1118 106 AF027208.1 9 - 10 1118 106 AK027420.1 9 - 10 1312 106 AK027422.1 9 - 10 1334 106 BC012089.1 9 - 10 1289 106 AY449689.1 8 - 9 1054 106 AY449690.1 8 - 9 1054 106 AY449691.1 8 - 9 1054 106 AY449692.1 9 - 10 1081 106 AY449693.1 9 - 10 1081 106 AF507034.1 8 - 9 1091 106 AK075411.1 9 - 10 1289 106 AF117225.1 9 - 10 1334 106 AK226033.1 - 1312 106 DQ895452.1 - 1054 106 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: CD133.png Type: image/png Size: 4322 bytes Desc: not available URL: From cjfields at uiuc.edu Mon Dec 4 03:35:17 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 3 Dec 2006 21:35:17 -0600 Subject: [Bioperl-l] BioPerl Wiki is down In-Reply-To: <45738063.1070504@verizon.net> References: <45738063.1070504@verizon.net> Message-ID: <41422FC7-B579-4B45-B8CC-341B8F462BCB@uiuc.edu> On Dec 3, 2006, at 7:56 PM, Conrad Halling wrote: > When I attempted to navigate to http://www.bioperl.org/, I got the > following message: > > A database query syntax error has occurred. This may indicate a bug in > the software. The last attempted database query was: > > (SQL query hidden) > > from within function "MediaWikiBagOStuff::_doquery". MySQL returned > error "1205: Lock wait timeout exceeded; try restarting transaction > (localhost)". > > -- Conrad Halling > chhalling at verizon.net This has been an ongoing problem with the server; I have reported it previously to open-bio support. There have been a few attempts to fix it which seem to work short-term but something else must be wrong. Jason? Chris D? For my part, Googling found the following link, which indicates that this error may be due to heavy server load: http://tibia.erig.net/TibiaWiki:Bug_reports Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Derek.Fairley at bll.n-i.nhs.uk Mon Dec 4 10:18:37 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Mon, 4 Dec 2006 10:18:37 -0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk> Message-ID: Richard, You can find instructions for installing the example scripts directory here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPE RL_SCRIPTS or you can get individual scripts from here: http://www.bioperl.org/wiki/Bioperl_scripts11 Derek. 
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of richard Sent: 03 December 2006 21:38 To: Bioperl list Subject: [Bioperl-l] confused by Bio::Graphics Hi all, I'm having a little trouble getting Bio::Graphics to give me the correct output and I'm looking for some help. I am trying to extend from example 5 of the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. Eventually I intend the script to follow example 6 but I thought I'd try the simpler version first. The basic aim of the script is that it takes as input a file containing a list of GenBank IDs plus some other info for alternative transcripts of a gene. This information is stored in a hash and the GenBank IDs are used to retrieve the appropriate entries from GenBank. I then want to use Bio::Graphics to generate a figure from the feature tables showing the CDSs from the alternative transcripts. So far I have managed to retrieve the GenBank entries extract the feature tables and store a reference to these in the hash mentioned above. I've also got Bio::Graphics to draw a basic image but some of the details aren't right and I don't understand why. I have attached the code I have so far, the input file and the output image to this mail. I didn't want to display it all in the main message but I'm not actually sure which bit is causing the problem. The code is very rough and in need of polishing but I need to get it to work correctly first. These are the problems: 1) As I understand it this: my $wholeseq = Bio::SeqFeature::Generic->new ( -start => 1, -end => $refseq->length, -display_name =>$refseq->display_name ); should display the name of the gene (CD133/Prominin1) near the top of image. It doesn't, am I misunderstanding or is there an error in the code? 2) In the quoted example the CDS is broken up into smaller regions which are then linked together in example 6. 
This isn't happening in my code and I think it should be, I get one solid block for the CDS. I don't understand why this is because I'm not clear which parts of the feature table are used to define where the CDS should be split. I think this is the relevant bit of code: foreach my $alt_trans (keys %main) { foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) { my $feature = $main{$alt_trans}{'features'}{$tag}; $panel->add_track($feature, -glyph => 'generic', -bgcolor => $colors[$idx++ % @colors], -fgcolor => 'black', -font2color => 'black', -key => $alt_trans, -bump => +1, -height => 8, -label => 1, -description => 1, ) if ($tag eq 'CDS'); } } Can anyone tell me what I am doing wrong? RefSeq entry for the gene of interest is here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386 If I understand correctly the example file used in the HOWTO is this gene: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=1168053 20 Final question, does bioperl come with example scripts and is so where whould they normally be found on a Linux system? If anyone is still reading this thanks for your patience. Any clarification will be appreciated. regards, Richard From rbirnie at totalise.co.uk Mon Dec 4 09:30:36 2006 From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk) Date: 04 Dec 2006 09:30:36 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From bix at sendu.me.uk Mon Dec 4 14:37:16 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Dec 2006 14:37:16 +0000 Subject: [Bioperl-l] BLASTing with a seqio/seq object... 
In-Reply-To: <45706671.9000201@york.ac.uk> References: <01ba01c714a2$b9659c10$15327e82@pyrimidine> <456F27E9.70205@york.ac.uk> <456FEF22.4090004@sendu.me.uk> <45706671.9000201@york.ac.uk> Message-ID: <4574329C.2030905@sendu.me.uk> Samantha Thompson wrote: > Hi, > Thanks for all your help so far, I am still trying to understand a > couple of things... You should make sure your replies are sent to the list, as you're likely to get a faster response. [where $blast_report is the value returned by Bio::Tools::Run::RemoteBlast->submit_blast($seq_object)] > when I run this line.. > > $searchio = Bio::SearchIO->new(-format => 'blast', > -file => $blast_report); > > between submitting the blast search and trying to to process the searchio object like I was attempting before I get the following errors back: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Could not open 1: No such file or directory [snip] > Does this mean that my BLAST is failing when I submit it? No, the -file option of SearchIO->new() takes, unsurprisingly, a filename. I'd tell you to pay careful attention to the docs, but sadly the RemoteBlast docs are currently wrong. submit_blast() claims to return 'Blast report object' (which in any case certainly wouldn't be a filename) when in fact it returns, as you discovered, a (for our purposes) meaningless number. As I suggested before, you need to look at the synopsis for Bio::Tools::Run::RemoteBlast instead. (having called submit_blast you must do the each_rid loop) Does anyone care to go through the POD for RemoteBlast and update it to an accurate state? 
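
[Editorial sketch] The each_rid loop referred to above follows the pattern in the Bio::Tools::Run::RemoteBlast synopsis: submit, then poll each request ID until a report comes back. The sketch below adapts that pattern to the filter from the original post. The input filename is hypothetical, and the database is passed as -data (as in the synopsis) rather than -db as in the original script; treat that substitution as an assumption to check against the installed version's POD.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;

# Hypothetical input file; substitute the real FASTA path.
my $seqio   = Bio::SeqIO->new(-file => 'MalE.fasta', -format => 'fasta');
my $seq_obj = $seqio->next_seq;

my $factory = Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastp',
    -data       => 'nr',         # assumption: -data, as in the synopsis
    -expect     => '1e-15',
    -readmethod => 'SearchIO',
);

# submit_blast() only queues the job; its return value is not a report.
$factory->submit_blast($seq_obj);

# Poll until every request ID (RID) has either produced a report or failed.
while (my @rids = $factory->each_rid) {
    foreach my $rid (@rids) {
        my $rc = $factory->retrieve_blast($rid);

        if (!ref $rc) {
            # Not a report yet: a negative code means the job failed,
            # anything else means it is still running.
            $factory->remove_rid($rid) if $rc < 0;
            sleep 5;
            next;
        }

        # $rc can now be read like a Bio::SearchIO stream.
        while (my $result = $rc->next_result) {
            while (my $hit = $result->next_hit) {
                while (my $hsp = $hit->next_hsp) {
                    next unless $hsp->length('total') > 100
                            and $hsp->percent_identity >= 75;
                    print "Hit= ", $hit->name,
                          ",Length=", $hsp->length('total'),
                          ",Percent_id=", $hsp->percent_identity, "\n";
                }
            }
        }
        $factory->remove_rid($rid);
    }
}
```

The key difference from the original script is that the SearchIO-style loop runs on the object returned by retrieve_blast(), not on the return value of submit_blast().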
From bix at sendu.me.uk Mon Dec 4 14:40:27 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Dec 2006 14:40:27 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: References: Message-ID: <4574335B.805@sendu.me.uk> rbirnie at totalise.co.uk wrote: > Hi all, > > I've just seen my previous mail come through on the digest and I noticed > that the code I attached has been scrubbed which means that the message > won't make much sense. If I've contravened list rules by posting > attachments then apologies, I did look for a posting guide but couldn't > see one on the wiki. I deliberatley didn't put the whole code in the > main message because it's quite long. I'm not sure which part is wrong > so I don't know which part to post I'm just not seeing the output I > would expect from the example. What is the best thing for me to do? I saw a few attachments on your post (including your code example), so I think what you did was fine. From cjfields at uiuc.edu Mon Dec 4 15:40:20 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 09:40:20 -0600 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <4574335B.805@sendu.me.uk> Message-ID: <002001c717ba$823c1500$15327e82@pyrimidine> > rbirnie at totalise.co.uk wrote: > > Hi all, > > > > I've just seen my previous mail come through on the digest and I > > noticed that the code I attached has been scrubbed which means that > > the message won't make much sense. If I've contravened list > rules by > > posting attachments then apologies, I did look for a > posting guide but > > couldn't see one on the wiki. I deliberatley didn't put the > whole code > > in the main message because it's quite long. I'm not sure > which part > > is wrong so I don't know which part to post I'm just not seeing the > > output I would expect from the example. What is the best > thing for me to do? > > I saw a few attachments on your post (including your code > example), so I think what you did was fine. Same here. 
I received a PNG file and two text files (a script and a data file). chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From rbirnie at totalise.co.uk Mon Dec 4 16:06:51 2006 From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk) Date: 04 Dec 2006 16:06:51 +0000 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <002001c717ba$823c1500$15327e82@pyrimidine> References: <002001c717ba$823c1500$15327e82@pyrimidine> Message-ID: An HTML attachment was scrubbed... URL: From dmessina at wustl.edu Mon Dec 4 16:46:16 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 4 Dec 2006 10:46:16 -0600 Subject: [Bioperl-l] confused by Bio::Graphics In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk> References: <200612032138.02522.rbirnie@totalise.co.uk> Message-ID: Hi Richard, > [richard] > > These are the problems: > 1) As I understand it this: > > my $wholeseq = Bio::SeqFeature::Generic->new ( > -start => 1, > -end => $refseq->length, > -display_name =>$refseq->display_name > ); > > should display the name of the gene (CD133/Prominin1) near the top > of image. > It doesn't, am I misunderstanding or is there an error in the code? The contents of a sequence object's display_name varies depending on the type of sequence record; for a sequence object created from a Genbank record, it's the value of the LOCUS field on the first line of the record. If you want the gene name, you'll have to dig it out of the feature table. If you look at the Genbank record for your first sequence, you'll see that under both the gene and CDS primary features, the HUGO gene abbreviation is stored under the "gene" secondary tag, and various synonyms are under the "note" and "product" secondary tags. LOCUS NM_006017 3794 bp mRNA linear PRI 17-NOV-2006 DEFINITION Homo sapiens prominin 1 (PROM1), mRNA. ACCESSION NM_006017 VERSION NM_006017.1 GI:5174386 [...skipping irrelevant part of the Genbank record...] 
FEATURES             Location/Qualifiers
     source          1..3794
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="4"
                     /map="4p15.32"
     gene            1..3794
                     /gene="PROM1"
                     /note="prominin 1; synonyms: AC133, CD133, PROML1,
                     MSTP061"
                     /db_xref="GeneID:8842"
                     /db_xref="HGNC:9454"
                     /db_xref="HPRD:HPRD_05079"
                     /db_xref="MIM:604365"
     CDS             38..2635
                     /gene="PROM1"
                     /go_component="integral to plasma membrane [pmid
                     9389720]; membrane"
                     /go_process="response to stimulus; visual perception"
                     /note="hProminin; prominin (mouse)-like 1; hematopoietic
                     stem cell antigen"
                     /codon_start=1
                     /product="prominin 1"
                     /protein_id="NP_006008.1"
                     /db_xref="GI:5174387"
                     /db_xref="GeneID:8842"
                     /db_xref="HGNC:9454"
                     /db_xref="HPRD:HPRD_05079"
                     /db_xref="MIM:604365"
[....more...]

In your script, you grab the primary features between lines 34-60. You can grab the secondary feature you want with something like:

[cribbed from the Feature-Annotation HOWTO]

    for my $feat_object ($seq_object->get_SeqFeatures) {
        push @ids, $feat_object->get_tag_values("gene")
            if ($feat_object->has_tag("gene"));
    }

> 2) In the quoted example the CDS is broken up into smaller regions
> which are
> then linked together in example 6. This isn't happening in my code
> and I
> think it should be, I get one solid block for the CDS. I don't
> understand why
> this is because I'm not clear which parts of the feature table are
> used to
> define where the CDS should be split. I think this is the relevant
> bit of
> code:
>
> foreach my $alt_trans (keys %main) {
>     foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {
>
>         my $feature = $main{$alt_trans}{'features'}{$tag};
>
>         $panel->add_track($feature,
>             -glyph => 'generic',
>             -bgcolor => $colors[$idx++ % @colors],
>             -fgcolor => 'black',
>             -font2color => 'black',
>             -key => $alt_trans,
>             -bump => +1,
>             -height => 8,
>             -label => 1,
>             -description => 1,
>         ) if ($tag eq 'CDS');
>
>     }
> }

The problem here is that RefSeq mRNA records don't contain intron-exon boundary information.
I think you'll have to get that from an assembly record. From the Entrez gene page for PROM1, I obtained a Genbank record for the PROM1 genomic locus: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000004.10&from=15578955&to=15686664&strand=2&dopt=gb Saving that as 'PROM1.gb' (the suffix is important), and running the bp_embl2picture.pl script on it, I got an image similar to Figure 6 (attached). Hope this helps, Dave -------------- next part -------------- A non-text attachment was scrubbed... Name: PROM1.png Type: image/png Size: 8646 bytes Desc: not available URL: From bix at sendu.me.uk Mon Dec 4 19:37:13 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 04 Dec 2006 19:37:13 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <000001c717db$3ca7b910$15327e82@pyrimidine> References: <000001c717db$3ca7b910$15327e82@pyrimidine> Message-ID: <457478E9.3060405@sendu.me.uk> Chris Fields wrote: > Sendu, > > Are current plans to still try getting the final 1.5.2 release out > before the hackathon next week? Yes, I seriously hope so. I was kind of hoping to see test results from you and Nathan on the wiki though... > There are a few commits I want to make, but I may wait until after > 1.5.2 is out before I add them. But don't let the release stop you. As long as you don't commit to the 1.5.2 branch it will be fine. From cjfields at uiuc.edu Mon Dec 4 19:34:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 13:34:34 -0600 Subject: [Bioperl-l] Timeline on the 1.5.2 release? Message-ID: <000001c717db$3ca7b910$15327e82@pyrimidine> Sendu, Are current plans to still try getting the final 1.5.2 release out before the hackathon next week? There are a few commits I want to make, but I may wait until after 1.5.2 is out before I add them. chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon Dec 4 20:23:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 4 Dec 2006 14:23:45 -0600 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> Message-ID: <000001c717e2$19d18e00$15327e82@pyrimidine> > Chris Fields wrote: > > Sendu, > > > > Are current plans to still try getting the final 1.5.2 release out > > before the hackathon next week? > > Yes, I seriously hope so. I was kind of hoping to see test > results from you and Nathan on the wiki though... Ah, forgot to post those! Working on that now... > > There are a few commits I want to make, but I may wait until after > > 1.5.2 is out before I add them. > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. There are a few things I plan on adding over the next few weeks, including some things for Bio::Location::SplitLocation. However I'm sure some of the latter will break tests, so I'll be adding it in a bit at a time. It all depends when I can squeeze time in to work on them! chris From pelikan at cs.pitt.edu Mon Dec 4 22:34:59 2006 From: pelikan at cs.pitt.edu (pelikan at cs.pitt.edu) Date: Mon, 4 Dec 2006 17:34:59 -0500 (EST) Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries Message-ID: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu> Hello, My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the latest MySQL under Windows, ActivePerl, without Cygwin. I have 4 GB memory. The "make test"s pass fine. The problem is that the number of records that end up in the database doesn't match the size of the datasets I load using load_seqdatabase.pl.
For instance, if I want to load only proteins from Homo sapiens, I go to UniProt, use the database search function, do a text search for Homo sapiens (returns 70914 hits), export the hits to flat file format (--format swiss) using the data set manager, and load it using load_seqdatabase.pl. Running "select count(*) from bioentry;" afterwards returns only 1003 entries. Moreover it seems like the entries don't go past the B's in the alphabet - I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%', but I can find apolipoproteins, for example. I know this is an annoying question, but if someone has more experience in dealing with this issue, I would be grateful for any assistance. I don't get any error messages, so it's difficult for me to tell what's going on. -Richard From n.haigh at sheffield.ac.uk Tue Dec 5 06:53:34 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 05 Dec 2006 06:53:34 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> Message-ID: <4575176E.3020906@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: > >> Sendu, >> >> Are current plans to still try getting the final 1.5.2 release out >> before the hackathon next week? >> > > Yes, I seriously hope so. I was kind of hoping to see test results from > you and Nathan on the wiki though... > > > OK, I'll get onto this today. >> There are a few commits I want to make, but I may wait until after >> 1.5.2 is out before I add them. >> > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > A: Yes. >> Q: Are you sure? >> >>> A: Because it reverses the logical flow of conversation.
>>> >>>> Q: Why is top posting frowned upon? >>>> Get Thunderbird From n.haigh at sheffield.ac.uk Tue Dec 5 11:43:16 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 05 Dec 2006 11:43:16 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <457478E9.3060405@sendu.me.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> Message-ID: <45755B54.7080902@sheffield.ac.uk> Sendu Bala wrote: > Chris Fields wrote: > >> Sendu, >> >> Are current plans to still try getting the final 1.5.2 release out >> before the hackathon next week? >> > > Yes, I seriously hope so. I was kind of hoping to see test results from > you and Nathan on the wiki though... > > > I've added my test results for Debian to the wiki. Nath >> There are a few commits I want to make, but I may wait until after >> 1.5.2 is out before I add them. >> > > But don't let the release stop you. As long as you don't commit to the > 1.5.2 branch it will be fine. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > A: Yes. >> Q: Are you sure? >> >>> A: Because it reverses the logical flow of conversation. >>> >>>> Q: Why is top posting frowned upon? >>>> Get Thunderbird From bix at sendu.me.uk Tue Dec 5 11:47:06 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 05 Dec 2006 11:47:06 +0000 Subject: [Bioperl-l] Timeline on the 1.5.2 release? In-Reply-To: <45755B54.7080902@sheffield.ac.uk> References: <000001c717db$3ca7b910$15327e82@pyrimidine> <457478E9.3060405@sendu.me.uk> <45755B54.7080902@sheffield.ac.uk> Message-ID: <45755C3A.9050903@sendu.me.uk> Nathan S. Haigh wrote: > Sendu Bala wrote: >> Chris Fields wrote: >> >>> Sendu, >>> >>> Are current plans to still try getting the final 1.5.2 release out >>> before the hackathon next week? >>> >> Yes, I seriously hope so. 
I was kind of hoping to see test results from >> you and Nathan on the wiki though... > > I've added my test results for Debian to the wiki. Thanks (and to Chris as well). I can't tell you how much I loath and despise TCoffee and Tmhmm now ;) From cjfields at uiuc.edu Tue Dec 5 16:04:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 5 Dec 2006 10:04:38 -0600 Subject: [Bioperl-l] Build.PL changes Message-ID: <001b01c71887$10be3160$15327e82@pyrimidine> Sendu, I think the Build.PL commits which force installation of XML::SAX::Expat should be rolled back. XML::Simple works with any XML::SAX backend, not just XML::SAX::Expat, which hasn't been actively maintained since 2003 and is deprecated in favor of XML::SAX::ExpatXS. In fact, forcing XML::SAX::Expat to install as the default XML::SAX backend currently breaks blastxml parsing. Note that forcing this also forces one to install the Expat library (now at v 2), which now has some compatibility problems with XML::SAX::Expat (but not ExpatXS). Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From qetzal at tutopia.com.br Wed Dec 6 15:21:20 2006 From: qetzal at tutopia.com.br (giovani) Date: Wed, 06 Dec 2006 10:21:20 -0500 Subject: [Bioperl-l] Biodiversity graphic Message-ID: An HTML attachment was scrubbed... URL: From benoit at ebi.ac.uk Wed Dec 6 17:30:12 2006 From: benoit at ebi.ac.uk (Benoit Ballester) Date: Wed, 06 Dec 2006 17:30:12 +0000 Subject: [Bioperl-l] Biodiversity graphic In-Reply-To: References: Message-ID: <4576FE24.1030807@ebi.ac.uk> giovani wrote: > > Hello there. I'm trying to write a programa to set a graphic with two > axis and two data sets to each axis. Anyone know some tool similar to > the GD module to set this graphic, because with GD I'm having troubles. > here is an example of what I want to do: > http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm > using with GD module. 
It looks to me that the graph you pointing too has been made by gnuplot. Why don't you use gnuplot or R instead ? Ben > > #!/usr/bin/perl -w > > use GD::Graph::mixed; > @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 3, 4, 14, 30, 12, 8, 7, 20, 15], > [ 2, 8, 2, 5, 3, 1, 3, 4, 1], > [ 5, 12, 24, 33, 19, 8, 6, 15, 21], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > ); > > $my_graph = new GD::Graph::mixed( ); > $my_graph->set( > x_label => 'X Label', > y1_label => 'Y1 label', > y2_label => 'Y2 label', > title => 'Using two axes', > y1_max_value => 40, > y2_max_value => 8, > y_tick_number => 8, > y_label_skip => 2, > long_ticks => 1, > two_axes => 1, > use_axis => [1,2,1,2], > legend_placement => 'BR', > x_labels_vertical => 1, > x_label_position => 1/2, > ); > > $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY'); > my $gd = $my_graph->plot(\@data) or die $my_graph->error; > open(IMG, '>graphTest.gif') or die "N o posso abrir arquivo$!\n"; > binmode IMG; > print IMG $gd->gif; > close IMG; > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gwu at molbio.mgh.harvard.edu Wed Dec 6 21:12:57 2006 From: gwu at molbio.mgh.harvard.edu (gang wu) Date: Wed, 06 Dec 2006 16:12:57 -0500 Subject: [Bioperl-l] Biodiversity graphic In-Reply-To: References: Message-ID: <45773259.3010405@molbio.mgh.harvard.edu> Do you mean the GD code can not run or it does not generate image as you wanted? Gang giovani wrote: > > > Hello there. I'm trying to write a programa to set a graphic with two > axis and two data sets to each axis. Anyone know some tool similar to > the GD module to set this graphic, because with GD I'm having > troubles. here is an example of what I want to do: > http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm > using with GD module. 
> > #!/usr/bin/perl -w > > use GD::Graph::mixed; > @data = ( > ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"], > [ 3, 4, 14, 30, 12, 8, 7, 20, 15], > [ 2, 8, 2, 5, 3, 1, 3, 4, 1], > [ 5, 12, 24, 33, 19, 8, 6, 15, 21], > [ 1, 2, 5, 6, 3, 1.5, 1, 3, 4], > ); > > $my_graph = new GD::Graph::mixed( ); > $my_graph->set( > x_label => 'X Label', > y1_label => 'Y1 label', > y2_label => 'Y2 label', > title => 'Using two axes', > y1_max_value => 40, > y2_max_value => 8, > y_tick_number => 8, > y_label_skip => 2, > long_ticks => 1, > two_axes => 1, > use_axis => [1,2,1,2], > legend_placement => 'BR', > x_labels_vertical => 1, > x_label_position => 1/2, > ); > > $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY'); > my $gd = $my_graph->plot(\@data) or die $my_graph->error; > open(IMG, '>graphTest.gif') or die "N o posso abrir arquivo$!\n"; > binmode IMG; > print IMG $gd->gif; > close IMG; > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Wed Dec 6 22:39:49 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 06 Dec 2006 22:39:49 +0000 Subject: [Bioperl-l] Bioperl 1.5.2 Release Message-ID: <457746B5.2020006@sendu.me.uk> I am proud to announce the final release of Bioperl 1.5.2. 

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
  cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
  http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
  http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
  http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
  cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
  http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
  http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
  http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
  cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
  http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
  http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
  http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
  cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
  http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
  http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
  http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl
Package Manager - see the wiki for details)

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree
and bioperl-pipeline) did not see a unified release for 1.5.2.


This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of
Bioperl and believe it to be suitable for most people. It is marked
'developer' or even 'unstable' because its API may change on short
notice. It will also not be maintained or supported beyond the next
bioperl release.

1.5.2 introduces the following new (core) features:

  * Taxonomy (Bio::Species) overhaul
  * Bio::Map improvements
  * Bio::SearchIO speedup
  * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields
(bug fixing, testing, documentation, discussion), Nathan Haigh
(Windows & pre-requisite issues, testing) and Mauricio Herrera Cuadra
(testing, documentation, support). Feedback and ideas provided by Hilmar
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and
elsewhere proved invaluable. None of this would have been possible
without the behind-the-scenes work of the open-bio support team. I'd
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and
especially those that took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/
help. If you'd like to experience the fame and fortune that naturally
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.

From cjfields at uiuc.edu Thu Dec 7 02:30:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Dec 2006 20:30:44 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c719a7$b48beb90$15327e82@pyrimidine>

Great job Sendu!

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed without a hitch following normal instructions using PPM4 (GUI
and command line shell) on clean ActiveState installations. It looks like
all the correct prereqs were installed with the shell (only
XML::SAX::ExpatXS was left out in the GUI installation, for reasons
outlined before).

I'll run more tests tomorrow to see if tests pass with the installed
bioperl (this should catch any prereq issues with PPM installation we
missed).

chris

> I am proud to announce the final release of Bioperl 1.5.2.
> [...]

From hlapp at gmx.net Thu Dec 7 03:20:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:20:14 -0500
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
In-Reply-To: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
References: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
Message-ID: <8E15592D-6475-4A4D-BA6D-BD669C4233C3@gmx.net>

I seriously doubt that load_seqdatabase.pl would have deliberately
stopped loading the file. Either there was an error in loading an
entry (which you should see, and you can also ask the script to just
keep going by providing the --safe option), or the file only
contained 1003 entries.

Note that you can get progress logging by using the --logchunk
option, which will also give you a final count of the number of
sequences loaded.
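
[Editorial sketch, not part of the original message.] One way to test the second possibility before loading anything is to count the entries in the exported file. The following is a minimal sketch in plain Perl, assuming the usual Swiss-Prot flat-file layout in which every entry starts with a line beginning "ID" followed by whitespace. The sub name `count_entries` and the file name in the usage comment are hypothetical, and nothing here is taken from load_seqdatabase.pl itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count Swiss-Prot style entries readable from a filehandle.
# Assumption: each entry begins with a line starting "ID" plus whitespace.
sub count_entries {
    my ($fh) = @_;
    my $n = 0;
    while ( my $line = <$fh> ) {
        $n++ if $line =~ /^ID\s/;
    }
    return $n;
}

# Hypothetical usage: perl count_entries.pl uniprot_export.dat
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    printf "%d entries\n", count_entries($fh);
    close $fh;
}
```

If the count printed for the export matches the number of rows in bioentry, the loader did its job and the download itself was truncated.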
I'm not sure how you ran your search and your download on Uniprot. If I try what you describe I get 70491 hits, and if I try to export them using the data set manager I get the message: This download mechanism only supports 1000 proteins. The first 1000 proteins have been added from the selected. Which perfectly explains what you see. Did you convince yourself that the file contains 70491 entries? If you don't have grep and wc on your windows machine, you can use perl one-liners directly, e.g., perl -n -e '/^ID / && ++$n; END {print "$n entries\n";}' -hilmar On Dec 4, 2006, at 5:34 PM, pelikan at cs.pitt.edu wrote: > Hello, > > My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, > and the > latest mySQL under Windows, Activeperl, without Cygwin. I have 4 GB > memory. "make test"s past fine. > > The problem is that I'm not getting similar numbers of anything when I > load datasets using load_seqdatabase.pl. For instance, if I want to > load > only protiens from Homo Sapiens, > I go to UniProt, > use the database search function, > do a text search for Homo Sapiens (returns 70914 hits), > export the hits to flat file format (--format swiss) using the data > set > manager, > and load it using load_seqdatabase.pl. > > The result of "select count(*) from bioentry;" results in only > 1003 entries. > Moreover it seems like the entries don't go past the B's in the > alphabet - > I can't find bioentry.descriptions like '%cytochrome%' or '% > myoglobin%', > but I can find apolipoproteins, for example. > > I know this is an annoying question, but if someone has more > experience in > dealing with this issue, I would be grateful for any assistance. I > don't > get any error messages, so it's difficult for me to tell what's > going on. 
>
> -Richard

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From lzhtom at hotmail.com Thu Dec 7 03:13:47 2006
From: lzhtom at hotmail.com (zhihua li)
Date: Thu, 07 Dec 2006 03:13:47 +0000
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor?
Message-ID:

Hi netters,

Recently I found this:

For constructing a new SeqI object, I had to write:

  $seq_obj=Bio::SeqIO->new(
                    -file => '/home/myfile',
                    -format => 'Fasta');   #Note the dash before the two arguments.

If I omitted the dash:

  $seq_obj=Bio::SeqIO->new(
                    file => '/home/myfile',
                    format => 'Fasta');

I'd get this error:

  MSG: Unknown format given or could not determine it []
  STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377

So it seems to me that the dashes before the arguments are essential.
However, when I tried to build a factory for StandAloneBlast, I found it
was the other way around.

If the script had the dash:

  $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
                    -program => 'blastn',
                    -database => '/home/mydatabase');

I'd get the error message:

  MSG: Unallowed parameter: - !
  STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
  STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400

If I left out the dash by saying:

  $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
                    program => 'blastn',
                    database => '/home/mydatabase');

Everything is fine.

Now I'm confused. Why do I sometimes have to add the dash, while at other
times I'm not allowed to?

Thanks in advance!

From hlapp at gmx.net Thu Dec 7 03:56:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:56:44 -0500
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID:

Congrats! Great work, Sendu! Don't forget to celebrate.

-hilmar

On Dec 6, 2006, at 5:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.
> [...]

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From arareko at campus.iztacala.unam.mx Thu Dec 7 03:53:21 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 06 Dec 2006 21:53:21 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <45779031.3050202@campus.iztacala.unam.mx>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

Sendu Bala wrote:
> I am proud to announce the final release of Bioperl 1.5.2.
> [...]

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From jason at bioperl.org Thu Dec 7 05:06:36 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Dec 2006 21:06:36 -0800
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <41A863C9-1B69-4C7B-9271-C577EDD011BB@bioperl.org>

hear! hear! Excellent work. Thanks for leading the effort on this
release and all of the behind the scenes work, attention to detail,
and cat herding work it took to make this possible.

-jason

On Dec 6, 2006, at 2:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.
> [...]

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From n.haigh at sheffield.ac.uk Thu Dec 7 07:23:47 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 07 Dec 2006 07:23:47 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <4577C183.7010501@sheffield.ac.uk>

I know I'm very new to Bioperl development and don't know very much yet,
so I'm probably not the best person to express the views of the Bioperl
developers or users. However, I'm sure I'm safe in saying that on behalf
of everyone associated with Bioperl a *huge* thank you must go out to
Sendu for the gargantuan effort he has put into this release. Just
looking over some of the e-mails he's sent over the past few weeks alone,
it's clear that he has devoted a huge amount of time to the effort and in
some cases with little sleep.

Since there is very little (or should I say no) monetary recognition in
such an important and time consuming role as "Release Pumpkin", I hope
Sendu has a warm glow, safe in the knowledge that his efforts have helped
enormously and are clearly recognised and fully appreciated by the
Bioperl community.

Therefore, I'd just like to reiterate what others have already
said.....Well done, excellent work!!!

Nath

From valiente at lsi.upc.edu Thu Dec 7 08:25:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 09:25:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To:
References:
Message-ID: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>

The following popped out when I input more than 110 species to the
taxonomy2tree script version 1.4:

(in cleanup)
------------- EXCEPTION -------------
MSG: Must supply a Bio::Taxon
STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
STACK (eval) taxonomy2tree.pl:0
STACK toplevel taxonomy2tree.pl:0

Any clues? Thanks,

Gabriel

From bix at sendu.me.uk Thu Dec 7 09:24:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:24:39 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DDD7.7060208@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when input more the 110 species to
> taxonomy2tree script version 1.4:
>
> (in cleanup)
> ------------- EXCEPTION -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/
> flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel
taxonomy2tree.pl:0 > > Any clues? Thanks, Are you able to narrow the problem down? What was your command line, what species were you using? Does it work with the first 110 species you tried? Is there anything special about the 111th? Do I understand correctly that this was a problem during cleanup only, and didn't affect the correctness and completeness of the result? From bix at sendu.me.uk Thu Dec 7 09:33:18 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 07 Dec 2006 09:33:18 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu> References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu> Message-ID: <4577DFDE.6000500@sendu.me.uk> Gabriel Valiente wrote: > The following popped out when input more the 110 species to > taxonomy2tree script version 1.4: > > (in cleanup) > ------------- EXCEPTION ------------- > MSG: Must supply a Bio::Taxon > STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/ > flatfile.pm:260 > STACK Bio::Taxon::ancestor Bio/Taxon.pm:476 > STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703 > STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346 > STACK Bio::Taxon::ancestor Bio/Taxon.pm:466 > STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325 > STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409 > STACK (eval) taxonomy2tree.pl:0 > STACK toplevel taxonomy2tree.pl:0 > > Any clues? Thanks, Oh, does it work with option -e? Or does it work if you delete your old indexes of the nodes and names files and let it re-create them? 

From valiente at lsi.upc.edu Thu Dec 7 09:38:03 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 10:38:03 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4577DDD7.7060208@sendu.me.uk>
References: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu> <4577DDD7.7060208@sendu.me.uk>
Message-ID:

Hi,

If you run the attached shell script you should be able to reproduce
the problem. It is not about any species in particular, but about the
total number of species: it crashes with more than 120 species. The
resulting tree is not correct; I'm checking it further now. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
URL:
-------------- next part --------------

On Dec 7, 2006, at 10:24 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>> The following popped out when input more the 110 species to
>> taxonomy2tree script version 1.4:
>> (in cleanup)
>> ------------- EXCEPTION -------------
>> MSG: Must supply a Bio::Taxon
>> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/
>> flatfile.pm:260
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
>> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
>> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
>> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
>> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
>> STACK (eval) taxonomy2tree.pl:0
>> STACK toplevel taxonomy2tree.pl:0
>> Any clues? Thanks,
>
> Are you able to narrow the problem down? What was your command
> line, what species were you using? Does it work with the first 110
> species you tried? Is there anything special about the 111th?
>
> Do I understand correctly that this was a problem during cleanup
> only, and didn't affect the correctness and completeness of the
> result?

From cjfields at uiuc.edu Thu Dec 7 15:22:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Dec 2006 09:22:47 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110species In-Reply-To: Message-ID: <000001c71a13$8feec840$15327e82@pyrimidine> > Hi, > > If you run the attached shell script you should be able to > reproduce the problem. It is not about any species in > particular, but about the total number of species: it crushes > with more than 120 species. The resulting tree is not > correct, I'm checking it further now. Thanks, > > Gabriel Gabriel, My guess is this may have to do with using an old taxonomy dump file. I got this to work on winXP using the latest NCBI taxonomy. I had to modify taxonomy2tree and your shell script to get it to play nice with Windows, but I didn't get the error and I did get a tree (abbreviated for brevity): (((((("Agrobacterium tumefaciens str. C58","Sinorhizobium meliloti")Rhizobiaceae,... chris From cjfields at uiuc.edu Thu Dec 7 18:44:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Dec 2006 12:44:32 -0600 Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor? In-Reply-To: References: Message-ID: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu> On Dec 6, 2006, at 9:13 PM, zhihua li wrote: > Hi netters, > > Recently I found this: > > For constructing a new SeqI object, I had to write: > $seq_obj=Bio::SeqIO->new( > -file => '/home/myfile', > -format => 'Fasta'); #Note the dash before the > two arguments. > > If I omitted the dash: > $seq_obj=Bio::SeqIO->new( > file => '/home/myfile', > format => 'Fasta'); > I'd get error: > MSG: Unknown format given or could not determine it [] > STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377 > > So it seems to me that the dashes before the arguments are > essential. However, when I tried to build a factory for > StandaloneBlast, I found the other way around. 
> > If the script had the dash: > $blast_obj=Bio::Tools::Run::StandAloneBlast->new( > -program => 'blastn', > -database => '/home/mydatabase'); > > I'd get the error message: MSG: Unallowed parameter: - ! > STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/ > site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433 > STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/ > site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400 > > If I left out the dash by saying: > $blast_obj=Bio::Tools::Run::StandAloneBlast->new( > program => 'blastn', > database => '/home/mydatabase'); > > Everything is fine. > > Now I'm confused. Why do I sometimes have to add the dash, while > at other times I'm not allowed to? > > Thanks in advance! I agree that this should be more consistent. Does anyone know the reasoning for this? Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bosborne11 at verizon.net Thu Dec 7 19:32:21 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 07 Dec 2006 14:32:21 -0500 Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor? In-Reply-To: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu> Message-ID: Chris, The latest StandAloneBlast takes "dashed parameters", as in: @params = (-database => 'swissprot',-outfile => 'blast1.out'); $factory = Bio::Tools::Run::StandAloneBlast->new(@params); Or my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp", -database=>"swissprot", -e => 1e-20); So that's why I asked "what version?" Someone made the change to allow dashes in @params a few months ago and I believe that that someone was you! Brian O.
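[Editor's sketch] To illustrate the difference discussed above, here is a minimal pure-Perl sketch of a constructor helper that accepts both dashed and undashed named parameters. The helper name normalize_params is hypothetical and this is not how BioPerl or StandAloneBlast implements it; it only shows why a module that does not strip the leading dash would see "-program" as an unknown key.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not BioPerl code):
# accept key/value pairs and strip one optional leading dash from each key.
sub normalize_params {
    my (@args) = @_;
    my %params;
    while (@args) {
        my ($key, $value) = splice(@args, 0, 2);
        (my $name = lc $key) =~ s/^-//;   # "-program" and "program" both become "program"
        $params{$name} = $value;
    }
    return %params;
}

# Both calling styles end up with the same parameter hash.
my %dashed = normalize_params(-program => 'blastn', -database => '/home/mydatabase');
my %plain  = normalize_params( program => 'blastn',  database => '/home/mydatabase');

print join(', ', map { "$_=$dashed{$_}" } sort keys %dashed), "\n";
print join(', ', map { "$_=$plain{$_}" }  sort keys %plain),  "\n";
```

A constructor that compares raw keys against a fixed list without this kind of normalization will reject one of the two styles, which matches the behaviour reported in this thread.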
On 12/7/06 1:44 PM, "Chris Fields" wrote: > > On Dec 6, 2006, at 9:13 PM, zhihua li wrote: > >> Hi netters, >> >> Recently I found this: >> >> For constructing a new SeqI object, I had to write: >> $seq_obj=Bio::SeqIO->new( >> -file => '/home/myfile', >> -format => 'Fasta'); #Note the dash before the >> two arguments. >> >> If I omitted the dash: >> $seq_obj=Bio::SeqIO->new( >> file => '/home/myfile', >> format => 'Fasta'); >> I'd get error: >> MSG: Unknown format given or could not determine it [] >> STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377 >> >> So it seems to me that the dashes before the arguments are >> essential. However, when I tried to build a factory for >> StandaloneBlast, I found the other way around. >> >> If the script had the dash: >> $blast_obj=Bio::Tools::Run::StandAloneBlast->new( >> -program => 'blastn', >> -database => '/home/mydatabase'); >> >> I'd get the error message: MSG: Unallowed parameter: - ! >> STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/ >> site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433 >> STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/ >> site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400 >> >> If I left out the dash by saying: >> $blast_obj=Bio::Tools::Run::StandAloneBlast->new( >> program => 'blastn', >> database => '/home/mydatabase'); >> >> Everyting is fine. >> >> Now I'm confused. Why sometimes I have to add the dash, while >> sometimes I'm not allowed to? >> >> Thanks in advance! > > I agree that this should be more consistent. Does anyone know the > reasoning for this? > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu Dec 7 19:44:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Dec 2006 13:44:19 -0600 Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory constructor? In-Reply-To: References: Message-ID: On Dec 7, 2006, at 1:32 PM, Brian Osborne wrote: > Chris, > > The latest StandAloneBlast takes "dashed parameters", as in: > > @params = (-database => 'swissprot',-outfile => 'blast1.out'); > $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > > Or > > my $factory = Bio::Tools::Run::StandAloneBlast->new(-program > =>"wublastp", > - > database=>"swissprot", > -e => 1e-20); > > So that's why I asked "what version?" > > Someone made the change to allow dashes in @params a few months ago > and I > believe that that someone was you! > > Brian O. Nope, I plead innocent (at least to this!). I haven't made any commits to StandAloneBlast. These were added in by Torsten (see commits 1.59, 1.60), so you'll need to blame/thank him... http://tinyurl.com/y7ym9g So they're now a bit more consistent. That's not to say StandAloneBlast doesn't need some major revisions.... BTW, I didn't see a post from you asking about the version. Chris From akarger at CGR.Harvard.edu Thu Dec 7 21:32:51 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Thu, 7 Dec 2006 16:32:51 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq Message-ID: I need to know how to get the frame information in exon features (created by Bio::Tools::GFF) into a whole-gene feature that will be translated into a protein. I'm reading in some fungal GFFs generated by Jason Stajich. 
I - Use Bio::Tools::GFF to create a feature for each exon in a gene - Create a Bio::Location::Split object containing each feature's location - Create a Bio::SeqFeature::Generic object whose location is the above BL::Split - Attach my contig Bio::Seq to the feature - get the protein with feature->spliced_seq->translate->seq (Code below) Unfortunately, I get the wrong result when the GFF features have frame != 0. This happens for only a few percent of the exons, but when it does, I end up translating in the wrong frame. If I read the docs correctly, Location objects don't have a frame. So how do I get the correct spliced_seq, which skips one or two bp at the beginning of certain exons? I suspect the answer to this is that I'm going about this in completely the wrong way, in which case, please tell me how I ought to be doing it. Thanks, - Amir Karger Research Computing Life Sciences Division Harvard University P.S. In case you want to see actual code, here it is. After using Bio::Tools::GFF to create a sorted list of features for each exon (basically stolen from the module POD), I: # Create a new object representing the exons' gene my $coding_loc_obj = new Bio::Location::Split; foreach my $exon (@sorted_exons) { $coding_loc_obj->add_sub_Location($exon->location); } # Build a spliced feature representing the whole gene my $spliced_feat = new Bio::SeqFeature::Generic( -start => $coding_loc_obj->start, -end => $coding_loc_obj->end, -strand => $strand_num, -primary=> "splicedGene", ); $spliced_feat->location($coding_loc_obj); # Attach a contig object containing the sequence $spliced_feat->attach_seq($contig_obj->bioperl_object); # Get the spliced seq and translate to protein: my $coding_seq = $spliced_feat->spliced_seq->seq; my $protein = $spliced_feat->spliced_seq->translate->seq; From bix at sendu.me.uk Thu Dec 7 22:45:32 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 7 Dec 2006 15:45:32 -0700 Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release 
Message-ID: <000001c71a51$671a79d0$6400a8c0@CodonSolutions.local> I am proud to announce the final release of Bioperl 1.5.2. http://www.bioperl.org/wiki/Release_1.5.2 bioperl (core): cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-1.5.2_100.zip bioperl-run: cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip bioperl-db: cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip bioperl-network: cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2 http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip http://bioperl.org/DIST/SIGNATURES.md5 (all are also available via CVS, and for Windows users, using the Perl Package Manager - see the wiki for details) The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree and bioperl-pipeline) did not see a unified release for 1.5.2. This release represents a developer release which has been thoroughly tested. We consider it the most stable (in terms of bugs) version of Bioperl and believe it to be suitable for most people. It is marked 'developer' or even 'unstable' because its API may change on short notice. It will also not be maintained or supported beyond the next bioperl release. 1.5.2 introduces the following new (core) features: * Taxonomy (Bio::Species) overhaul * Bio::Map improvements * Bio::SearchIO speedup * Build.PL installation For details, and a complete change log, see the wiki. 
API documentation is available here: http://doc.bioperl.org/ Acknowledgements: Innumerable thanks are due for the tireless efforts of Christopher Fields (bug fixing, testing, documentation, discussion), Nathan Haigh (Windows & prerequisite issues, testing) and Mauricio Herrera Cuadra (testing, documentation, support). Feedback and ideas provided by Hilmar Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and elsewhere proved invaluable. None of this would have been possible without the behind-the-scenes work of the open-bio support team. I'd also like to acknowledge Andreas J. Koenig for his help with CPAN matters. Finally, thank you to everyone who tried out the release candidates, and especially those that took the time to file bug reports or report problems. Remember, Bioperl can only go from strength to strength with /your/ help. If you'd like to experience the fame and fortune that naturally follow becoming a Bioperl developer (?!), become one! http://www.bioperl.org/wiki/Becoming_a_developer On behalf of the Bioperl team, Sendu Bala. _______________________________________________ Bioperl-announce-l mailing list Bioperl-announce-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l From cjfields at uiuc.edu Thu Dec 7 23:00:43 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Dec 2006 16:00:43 -0700 Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release In-Reply-To: <457746B5.2020006@sendu.me.uk> Message-ID: <000001c71a53$85cb4f10$6400a8c0@CodonSolutions.local> Great job Sendu! A bit of icing on the cake: all the WinXP PPMs (core, db, network, run) installed w/o a hitch following normal instructions using PPM4 (GUI and command line shell) using clean ActiveState installations. Looks like all the correct prereqs were installed with shell (only XML::SAX::ExpatXS was left out in the GUI installation for reasons outlined before).
I'll run more tests tomorrow to see if tests pass with the installed bioperl (this should catch any prereq issues with PPM installation we missed). chris _______________________________________________ Bioperl-announce-l mailing list Bioperl-announce-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l From kaboroev at sfu.ca Thu Dec 7 22:26:35 2006 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Thu, 07 Dec 2006 14:26:35 -0800 Subject: [Bioperl-l] Bio::Graphics xyplot Message-ID: <4578951B.5050206@sfu.ca> Hi everyone, I'm attempting to add an xyplot of the phred quality scores to a Bio::Graphics image, and cannot get it to work. I have the panel with a track for both the scale and the DNA displaying properly.
When I attempt to add the xyplot I just get a garbled track of what look like tiny xyplots for each data point. I have the CVS version (updated today) of bioperl-live running. I think what I am missing is the creation of a "Sequence Feature Group" to hold the individual points of the plot. However, I cannot seem to find such an object. This is what I attempted:
-------BEGIN---CODE-----------
# start panel
my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
                                      -width     => $f_seqlen*10,
                                      -pad_left  => 10,
                                      -pad_right => 10,
                                      -grid      => 1 );
# add scale
$panel->add_track(arrow => Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
                  -double  => 1,
                  -tick    => 2,
                  -fgcolor => 'black');
# add DNA ($feature is of type Bio::SeqFeature::Annotated)
$panel->add_track(dna => $feature);
# get list of quality scores from database
my ($pqs_value) = $dbh->selectrow_array($sql);
my @pqs_value = split(/\s/,$pqs_value);
# create track
my $track = $panel->add_track(-glyph        => 'xyplot',
                              -graph_type   => 'points',
                              -point_symbol => 'point',
                              -max_score    => 100,
                              -min_score    => 0,
                              -scale        => 'none');
# add "subfeatures" to the track
for (my $i=0;$i<$f_seqlen;$i++) {
    $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
}
print $panel->png();
$panel->finished;
------END---CODE----------
I also attempted to create an array of the point features and passed that by reference to the panel "add_track" as it describes in the xyplot documentation, but that resulted in the exact same image.
keith -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 From arareko at campus.iztacala.unam.mx Thu Dec 7 23:15:53 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 7 Dec 2006 16:15:53 -0700 Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release In-Reply-To: <457746B5.2020006@sendu.me.uk> References: <457746B5.2020006@sendu.me.uk> Message-ID: <000001c71a55$a479da60$6400a8c0@CodonSolutions.local> This has been a great effort. Congrats and thanks to everyone involved! Mauricio. -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM _______________________________________________ Bioperl-announce-l mailing list Bioperl-announce-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l From cain at cshl.edu Thu Dec 7 22:46:09 2006 From: cain at cshl.edu (Scott Cain) Date: Thu, 07 Dec 2006 17:46:09 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq In-Reply-To: References: Message-ID: <1165531569.2569.49.camel@localhost.localdomain> Amir, I don't know for sure what the problem is, but here is one possibility: the number in column 8 of a GFF file is not the frame, it is the phase. See the GFF3 spec for a description of what the phase is: http://www.sequenceontology.org/gff3.shtml (It doesn't matter if you are using GFF3 or GFF2, as the phase is the same in both). Scott On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote: > I need to know how to get the frame information in exon features > (created by Bio::Tools::GFF) into a whole-gene feature that will be > translated into a protein. > > I'm reading in some fungal GFFs generated by Jason Stajich. I > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > - Create a Bio::Location::Split object containing each feature's > location > - Create a Bio::SeqFeature::Generic object whose location is the above > BL::Split > - Attach my contig Bio::Seq to the feature > - get the protein with feature->spliced_seq->translate->seq > > (Code below) > > Unfortunately, I get the wrong result when the GFF features have frame > != 0. This happens for only a few percent of the exons, but when it > does, I end up translating in the wrong frame.
> > If I read the docs correctly, Location objects don't have a frame. So > how do I get the correct spliced_seq, which skips one or two bp at the > beginning of certain exons? > > I suspect the answer to this is that I'm going about this in completely > the wrong way, in which case, please tell me how I ought to be doing it. > > Thanks, > - Amir Karger > Research Computing > Life Sciences Division > Harvard University > > P.S. In case you want to see actual code, here it is. After using > Bio::Tools::GFF to create a sorted list of features for each exon > (basically stolen from the module POD), I: > # Create a new object representing the exons' gene > my $coding_loc_obj = new Bio::Location::Split; > foreach my $exon (@sorted_exons) { > $coding_loc_obj->add_sub_Location($exon->location); > } > > # Build a spliced feature representing the whole gene > my $spliced_feat = new Bio::SeqFeature::Generic( > -start => $coding_loc_obj->start, > -end => $coding_loc_obj->end, > -strand => $strand_num, > -primary=> "splicedGene", > ); > $spliced_feat->location($coding_loc_obj); > > # Attach a contig object containing the sequence > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > # Get the spliced seq and translate to protein: > my $coding_seq = $spliced_feat->spliced_seq->seq; > my $protein = $spliced_feat->spliced_seq->translate->seq; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Fri Dec 8 02:52:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 7 Dec 2006 20:52:47 -0600 Subject: [Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq In-Reply-To: <1165531569.2569.49.camel@localhost.localdomain> Message-ID: <002d01c71a73$f16ecc40$15327e82@pyrimidine> Another issue is the splittype() is not defined, though I don't think that would kill anything as currently implemented. However, one thing we have passingly discussed is having Bio::Location::Split objects possibly exhibit different (but expected) behaviors based upon the splittype() (order, join, or bond). It's one of the things I want to work out for the next release. If Scott's fix doesn't work and the problem persists, you should file a bug report with some sample data for us to test out. chris > Amir, > > I don't know for sure what the problem is, but here is one > possibility: > the number in column 8 of a GFF file is not the frame, it is > the phase. > See the GFF3 spec for a description of what the phase is: > > http://www.sequenceontology.org/gff3.shtml > > (It doesn't matter if you are using GFF3 or GFF2, as the > phase is the same in both). > > Scott > > > On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote: > > I need to know how to get the frame information in exon features > > (created by Bio::Tools::GFF) into a whole-gene feature that will be > > translated into a protein. > > > > I'm reading in some fungal GFFs generated by Jason Stajich. 
I > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > - Create a Bio::Location::Split object containing each feature's > > location > > - Create a Bio::SeqFeature::Generic object whose location > is the above > > BL::Split > > - Attach my contig Bio::Seq to the feature > > - get the protein with feature->spliced_seq->translate->seq > > > > (Code below) > > > > Unfortunately, I get the wrong result when the GFF features > have frame > > != 0. This happens for only a few percent of the exons, but when it > > does, I end up translating in the wrong frame. > > > > If I read the docs correctly, Location objects don't have a > frame. So > > how do I get the correct spliced_seq, which skips one or > two bp at the > > beginning of certain exons? > > > > I suspect the answer to this is that I'm going about this in > > completely the wrong way, in which case, please tell me how > I ought to be doing it. > > > > Thanks, > > - Amir Karger > > Research Computing > > Life Sciences Division > > Harvard University > > > > P.S. In case you want to see actual code, here it is. 
After using > > Bio::Tools::GFF to create a sorted list of features for each exon > > (basically stolen from the module POD), I: > > # Create a new object representing the exons' gene > > my $coding_loc_obj = new Bio::Location::Split; > > foreach my $exon (@sorted_exons) { > > $coding_loc_obj->add_sub_Location($exon->location); > > } > > > > # Build a spliced feature representing the whole gene > > my $spliced_feat = new Bio::SeqFeature::Generic( > > -start => $coding_loc_obj->start, > > -end => $coding_loc_obj->end, > > -strand => $strand_num, > > -primary=> "splicedGene", > > ); > > $spliced_feat->location($coding_loc_obj); > > > > # Attach a contig object containing the sequence > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > # Get the spliced seq and translate to protein: > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > my $protein = $spliced_feat->spliced_seq->translate->seq; From jason at bioperl.org Fri Dec 8 02:01:33 2006 From: jason at bioperl.org (Jason Stajich) Date: Thu, 7 Dec 2006 18:01:33 -0800 Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq In-Reply-To: References: Message-ID: <866F6CEE-62BB-4880-9B13-6DDE29EAF94E@bioperl.org> This was a problem in the gene prediction output I suspect, more recent versions of the program should have fixed this. I do not currently have free time to deal with the errors in the small number of ORFs where this has happened. I think you just need to do start -= start- (frame*strand) for 1st exons. You can also probably provide the 1st exon's frame to the translate function as another possibility but you should try and get the CDS correct first depending on your downstream analyses. -jason On Dec 7, 2006, at 1:32 PM, Amir Karger wrote: > I need to know how to get the frame information in exon features > (created by Bio::Tools::GFF) into a whole-gene feature that will be > translated into a protein. > > I'm reading in some fungal GFFs generated by Jason Stajich. 
I > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > - Create a Bio::Location::Split object containing each feature's > location > - Create a Bio::SeqFeature::Generic object whose location is the above > BL::Split > - Attach my contig Bio::Seq to the feature > - get the protein with feature->spliced_seq->translate->seq > > (Code below) > > Unfortunately, I get the wrong result when the GFF features have frame > != 0. This happens for only a few percent of the exons, but when it > does, I end up translating in the wrong frame. > > If I read the docs correctly, Location objects don't have a frame. So > how do I get the correct spliced_seq, which skips one or two bp at the > beginning of certain exons? > > I suspect the answer to this is that I'm going about this in > completely > the wrong way, in which case, please tell me how I ought to be > doing it. > > Thanks, > - Amir Karger > Research Computing > Life Sciences Division > Harvard University > > P.S. In case you want to see actual code, here it is. 
After using > Bio::Tools::GFF to create a sorted list of features for each exon > (basically stolen from the module POD), I: > # Create a new object representing the exons' gene > my $coding_loc_obj = new Bio::Location::Split; > foreach my $exon (@sorted_exons) { > $coding_loc_obj->add_sub_Location($exon->location); > } > > # Build a spliced feature representing the whole gene > my $spliced_feat = new Bio::SeqFeature::Generic( > -start => $coding_loc_obj->start, > -end => $coding_loc_obj->end, > -strand => $strand_num, > -primary=> "splicedGene", > ); > $spliced_feat->location($coding_loc_obj); > > # Attach a contig object containing the sequence > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > # Get the spliced seq and translate to protein: > my $coding_seq = $spliced_feat->spliced_seq->seq; > my $protein = $spliced_feat->spliced_seq->translate->seq; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From neetisomaiya at gmail.com Fri Dec 8 10:21:50 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 8 Dec 2006 15:51:50 +0530 Subject: [Bioperl-l] need help with phrap parser Message-ID: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com> Can anyone point me to a Phrap parser which parses the ace file to extract what reads make up each contig (eg. read_a and read_b make contig1; read_d read_e and read_z make contig2, and other information of the reads (like whether the read is complemented or not with respect to the contig, what region of the contig does each read contribute etc), basically the AF and BS lines of the ACE output. 
-- -Neeti Even my blood says, B positive From pmiguel at purdue.edu Fri Dec 8 14:17:02 2006 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 08 Dec 2006 09:17:02 -0500 Subject: [Bioperl-l] need help with phrap parser In-Reply-To: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com> References: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com> Message-ID: <457973DE.6050900@purdue.edu> neeti somaiya wrote: > Can anyone point me to a Phrap parser which parses the ace file to extract > what reads make up each contig (eg. read_a and read_b make contig1; read_d > read_e and read_z make contig2, and other information of the reads (like > whether the read is complemented or not with respect to the contig, what > region of the contig does each read contribute etc), basically the AF and BS > lines of the ACE output. > > neeti, To find the reads that went into each contig, you do *not* want the BS tagged records. My understanding is that BS is just what consed uses to populate its consensus line from the ace file. I write this because of an email sent to me by David Gordon in 2001, included here without his permission: > > Phrap writes BS lines which > > indicate, for each consensus position, which read phrap uses at that > > position to become the consensus. These BS ("base segments") are > > manipulated by Consed when there are changes to the assembly, such as > > joins, tears, removing reads, or changing the consensus. > The simplest way is: egrep '^(CO|AF|RD) ' acefilename if you are on a unix system. Or with perl while (<>) { print if (/^(?:CO|AF|RD) /); } (the grouping matters: without it, the ^ anchor applies only to CO, and AF or RD would match anywhere in a line). But then you would need to parse the fields of interest. You get the position/strand in the contig from AF, then you get the length of the read from RD. There does appear to be a part of bioperl that was meant to perform this task, including Bio::Assembly::IO::ace, but it looks like it was started and never completed.
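[Editor's sketch] The field parsing described above (contig from CO, strand and start from AF, length from RD) could look roughly like the following pure-Perl sketch. It assumes the usual ACE record layout (CO name, bases, reads, ...; AF read, U or C, padded start; RD read, padded bases, ...), works in padded consensus coordinates, and ignores quality clipping from QA lines. The sub name parse_ace_reads and the sample data are made up for illustration; this is not Bio::Assembly::IO::ace.

```perl
use strict;
use warnings;

# Hypothetical helper: collect, per contig, each read's orientation,
# padded start, length and derived padded end from CO/AF/RD records.
sub parse_ace_reads {
    my ($fh) = @_;
    my (%contigs, $current);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/) {
            # CO <contig> <bases> <reads> ...
            $current = $1;
            $contigs{$current} = { length => $2, n_reads => $3, reads => {} };
        }
        elsif (defined $current && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            # AF <read> <U|C> <padded start in consensus>
            $contigs{$current}{reads}{$1}{complemented} = ($2 eq 'C') ? 1 : 0;
            $contigs{$current}{reads}{$1}{start}        = $3;
        }
        elsif (defined $current && $line =~ /^RD\s+(\S+)\s+(\d+)/) {
            # RD <read> <padded bases> ...
            my $read = $contigs{$current}{reads}{$1} ||= {};
            $read->{length} = $2;
            $read->{end}    = $read->{start} + $2 - 1 if defined $read->{start};
        }
    }
    return \%contigs;
}

# Tiny made-up ACE fragment, read through an in-memory filehandle.
my $ace = <<'ACE';
AS 1 2

CO Contig1 20 2 1 U
ACGTACGTACGTACGTACGT

AF read_a U 1
AF read_b C 8
BS 1 20 read_a

RD read_a 12 0 0
ACGTACGTACGT

RD read_b 13 0 0
TACGTACGTACGT
ACE

open my $fh, '<', \$ace or die "Cannot open in-memory ACE text: $!";
my $contigs = parse_ace_reads($fh);
close $fh;

for my $contig (sort keys %$contigs) {
    for my $name (sort keys %{ $contigs->{$contig}{reads} }) {
        my $r = $contigs->{$contig}{reads}{$name};
        printf "%s\t%s\t%s\t%d-%d\n",
            $contig, $name, $r->{complemented} ? 'C' : 'U', $r->{start}, $r->{end};
    }
}
```

For a real ACE file, replace the in-memory handle with a handle opened on the file; AF start positions can be zero or negative when a read overhangs the start of the consensus, which is why the pattern allows a minus sign.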
From cjfields at uiuc.edu Fri Dec 8 15:17:31 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 09:17:31 -0600 Subject: [Bioperl-l] NAR Database Issue Papers Message-ID: <000601c71adb$fdd60490$15327e82@pyrimidine> For those interested, the Nucleic Acids Research Database issue papers have been popping up in the Advance Access section of the NAR website: http://nar.oxfordjournals.org/papbyrecent.dtl Ensembl, UCSC Browser, Entrez Gene, and a number of others of possible interest are represented. Of particular note are a few mentions of formatting changes to UniProt, EMBL, and other records, which should be taken care of in the latest BioPerl release (fingers crossed!). chris From cjfields at uiuc.edu Fri Dec 8 15:31:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 09:31:19 -0600 Subject: [Bioperl-l] need help with phrap parser In-Reply-To: <457973DE.6050900@purdue.edu> Message-ID: <000001c71add$ec7147d0$15327e82@pyrimidine> ... > But then you would need to parse the fields of interest. You get the > position/strand in the contig from AF, then you get the length of the > read from RD. > > There does look like there is a part of bioperl that meant to perform > this task--including Bio::Assembly::IO::ace but it looks like it was > started, but never completed. ...and if anyone wants to chip in and work on it, let us know! The various Bio::Assembly modules are one of many areas that need some updating. chris From akarger at CGR.Harvard.edu Fri Dec 8 18:25:47 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 8 Dec 2006 13:25:47 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq Message-ID: > This was a problem in the gene prediction output I suspect, more > recent versions of the program should have fixed this. I do not > currently have free time to deal with the errors in the small number > of ORFs where this has happened.
> > I think you just need to do > start -= start- (frame*strand) > for 1st exons. I used if (strand==1) {start += exon->frame} else {end -= exon->frame} This took me from 90 translations that had * within the sequence to just 9, out of 5500 CDS in S bayanus. > You can also probably provide the 1st exon's frame to the translate > function as another possibility but you should try and get the CDS > correct first depending on your downstream analyses. Yes, I think. Scott Cain pointed out that GFF column 8 is the "phase", which I had never heard of before. My current, very limited, understanding is that sometimes you'll have an exon with, say, 31 bp, followed by an exon with 29 bp. When the intron gets spliced out, you eventually get an mRNA of 60 bp, which translates to a protein of 20 aa. But the second exon has a phase of 1, not 0, because you can't just start translating at the first bp of the second exon and expect to get nice amino acids. By the way, whether or not phase is the same thing as frame, when I call the frame() method on the features created by Bio::Tools::GFF, I get the phase info. I assume that's a feature (no pun intended), not a bug? I'm still confused as to why you would have a phase in the first exon, though. Why not just say the CDS starts 1 or 2 bp later? (This is probably a bio question, not a bioperl question, but a quick Google didn't get me an answer. "Phase" isn't a very good search term.) I guess the real question here, which Jason alludes to, is whether SeqFeature->spliced_seq ought to take into account the phase information of the first exon. Right now, it doesn't, so when you call SeqFeature->spliced_seq->translate, you get gibberish. Are there cases where you would want spliced_seq to include the first bp or two? Should there be an option to spliced_seq for whether you want to take phase information into account? I can't submit a bug report until we confirm it's a bug. 
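To make the phase arithmetic concrete, here is a small sketch in plain Perl. It assumes the GFF3 definition of phase (the number of bases to remove from the start of a CDS segment to reach the first base of the next codon); the sub names cds_phases and trim_first_exon are made up for the example. If I read that definition correctly, the 31 bp + 29 bp example gives the second exon a phase of 2: one base of the split codon sits in the first exon and two in the second. The second sub is just the start/end adjustment described above.

```perl
use strict;
use warnings;

# Given the phase of the first CDS segment and the segment lengths in
# transcript order, return the GFF3-style phase of every segment.
sub cds_phases {
    my ($first_phase, @lengths) = @_;
    my @phases;
    my $phase = $first_phase;
    for my $len (@lengths) {
        push @phases, $phase;
        # Bases left over after the last complete codon of this segment
        my $leftover = ($len - $phase) % 3;
        # The next segment must supply the rest of that split codon
        $phase = (3 - $leftover) % 3;
    }
    return @phases;
}

# Move the boundary of a first exon so that it begins on a codon boundary:
# shift the start on the plus strand, pull in the end on the minus strand.
sub trim_first_exon {
    my ($start, $end, $strand, $phase) = @_;
    if ($strand == 1) { $start += $phase }
    else              { $end   -= $phase }
    return ($start, $end);
}

print join(',', cds_phases(0, 31, 29)), "\n";              # prints 0,2
print join('..', trim_first_exon(100, 200, 1, 2)), "\n";   # prints 102..200
```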
Thanks, -Amir Karger > -jason > On Dec 7, 2006, at 1:32 PM, Amir Karger wrote: > > > I need to know how to get the frame information in exon features > > (created by Bio::Tools::GFF) into a whole-gene feature that will be > > translated into a protein. > > > > I'm reading in some fungal GFFs generated by Jason Stajich. I > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > - Create a Bio::Location::Split object containing each feature's > > location > > - Create a Bio::SeqFeature::Generic object whose location > is the above > > BL::Split > > - Attach my contig Bio::Seq to the feature > > - get the protein with feature->spliced_seq->translate->seq > > > > (Code below) > > > > Unfortunately, I get the wrong result when the GFF features > have frame > > != 0. This happens for only a few percent of the exons, but when it > > does, I end up translating in the wrong frame. > > > > If I read the docs correctly, Location objects don't have a > frame. So > > how do I get the correct spliced_seq, which skips one or > two bp at the > > beginning of certain exons? > > > > I suspect the answer to this is that I'm going about this in > > completely > > the wrong way, in which case, please tell me how I ought to be > > doing it. > > > > Thanks, > > - Amir Karger > > Research Computing > > Life Sciences Division > > Harvard University > > > > P.S. In case you want to see actual code, here it is. 
After using > > Bio::Tools::GFF to create a sorted list of features for each exon > > (basically stolen from the module POD), I: > > # Create a new object representing the exons' gene > > my $coding_loc_obj = new Bio::Location::Split; > > foreach my $exon (@sorted_exons) { > > $coding_loc_obj->add_sub_Location($exon->location); > > } > > > > # Build a spliced feature representing the whole gene > > my $spliced_feat = new Bio::SeqFeature::Generic( > > -start => $coding_loc_obj->start, > > -end => $coding_loc_obj->end, > > -strand => $strand_num, > > -primary=> "splicedGene", > > ); > > $spliced_feat->location($coding_loc_obj); > > > > # Attach a contig object containing the sequence > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > # Get the spliced seq and translate to protein: > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > my $protein = $spliced_feat->spliced_seq->translate->seq; > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From akarger at CGR.Harvard.edu Fri Dec 8 18:33:09 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 8 Dec 2006 13:33:09 -0500 Subject: [Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq Message-ID: > Another issue is the splittype() is not defined, though I > don't think that > would kill anything as currently implemented. However, one > thing we have > passingly discussed is having Bio::Location::Split objects > possibly exhibit > different (but expected) behaviors based upon the splittype() > (order, join, > or bond). It's one of the things I want to work out for the > next release. Should I be writing -splittype => "JOIN" or some such in my new()? -Amir Karger > > chris > > > Amir, > > > > I don't know for sure what the problem is, but here is one > > possibility: > > the number in column 8 of a GFF file is not the frame, it is > > the phase. 
> > See the GFF3 spec for a description of what the phase is: > > > > http://www.sequenceontology.org/gff3.shtml > > > > (It doesn't matter if you are using GFF3 or GFF2, as the > > phase is the same in both). > > > > Scott > > > > > > On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote: > > > I need to know how to get the frame information in exon features > > > (created by Bio::Tools::GFF) into a whole-gene feature > that will be > > > translated into a protein. > > > > > > I'm reading in some fungal GFFs generated by Jason Stajich. I > > > > > > - Use Bio::Tools::GFF to create a feature for each exon in a gene > > > - Create a Bio::Location::Split object containing each feature's > > > location > > > - Create a Bio::SeqFeature::Generic object whose location > > is the above > > > BL::Split > > > - Attach my contig Bio::Seq to the feature > > > - get the protein with feature->spliced_seq->translate->seq > > > > > > (Code below) > > > > > > Unfortunately, I get the wrong result when the GFF features > > have frame > > > != 0. This happens for only a few percent of the exons, > but when it > > > does, I end up translating in the wrong frame. > > > > > > If I read the docs correctly, Location objects don't have a > > frame. So > > > how do I get the correct spliced_seq, which skips one or > > two bp at the > > > beginning of certain exons? > > > > > > I suspect the answer to this is that I'm going about this in > > > completely the wrong way, in which case, please tell me how > > I ought to be doing it. > > > > > > Thanks, > > > - Amir Karger > > > Research Computing > > > Life Sciences Division > > > Harvard University > > > > > > P.S. In case you want to see actual code, here it is. 
After using > > > Bio::Tools::GFF to create a sorted list of features for each exon > > > (basically stolen from the module POD), I: > > > # Create a new object representing the exons' gene > > > my $coding_loc_obj = new Bio::Location::Split; > > > foreach my $exon (@sorted_exons) { > > > $coding_loc_obj->add_sub_Location($exon->location); > > > } > > > > > > # Build a spliced feature representing the whole gene > > > my $spliced_feat = new Bio::SeqFeature::Generic( > > > -start => $coding_loc_obj->start, > > > -end => $coding_loc_obj->end, > > > -strand => $strand_num, > > > -primary=> "splicedGene", > > > ); > > > $spliced_feat->location($coding_loc_obj); > > > > > > # Attach a contig object containing the sequence > > > $spliced_feat->attach_seq($contig_obj->bioperl_object); > > > > > > # Get the spliced seq and translate to protein: > > > my $coding_seq = $spliced_feat->spliced_seq->seq; > > > my $protein = $spliced_feat->spliced_seq->translate->seq; > > > > From cjfields at uiuc.edu Fri Dec 8 19:04:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 13:04:55 -0600 Subject: [Bioperl-l] Using frame info from GFF ingettinga Seq->spliced_seq In-Reply-To: Message-ID: <000901c71afb$bf504210$15327e82@pyrimidine> > > Another issue is the splittype() is not defined, though I > don't think > > that would kill anything as currently implemented. > However, one thing > > we have passingly discussed is having Bio::Location::Split objects > > possibly exhibit different (but expected) behaviors based upon the > > splittype() (order, join, or bond). It's one of the things > I want to > > work out for the next release. > > Should I be writing -splittype => "JOIN" or some such in my new()? > > -Amir Karger I missed the fact that 'JOIN' is the default splittype() from looking at the constructor in Location::Split, so you actually don't have to explicitly set it; apologies for that. 
If we make any changes that affect how Location::Split behaves we'll likely leave the default splittype() as 'JOIN' as it's by far the most common join operator. chris From cjfields at uiuc.edu Fri Dec 8 20:03:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 8 Dec 2006 14:03:16 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: Message-ID: <000001c71b03$e6741e90$15327e82@pyrimidine> > Yes, I think. Scott Cain pointed out that GFF column 8 is the > "phase", which I had never heard of before. My current, very > limited, understanding is that sometimes you'll have an exon > with, say, 31 bp, followed by an exon with 29 bp. When the > intron gets spliced out, you eventually get an mRNA of 60 bp, > which translates to a protein of 20 aa. > But the second exon has a phase of 1, not 0, because you > can't just start translating at the first bp of the second > exon and expect to get nice amino acids. I think the use of 'frame' here is meant relative to the DNA sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e. translation, three frames). At least I think that's what is meant! > By the way, whether or not phase is the same thing as frame, > when I call the frame() method on the features created by > Bio::Tools::GFF, I get the phase info. I assume that's a > feature (no pun intended), not a bug? > > I'm still confused as to why you would have a phase in the > first exon, though. Why not just say the CDS starts 1 or 2 bp > later? (This is probably a bio question, not a bioperl > question, but a quick Google didn't get me an answer. "Phase" > isn't a very good search term.) It could be b/c the location coordinates delineate the exon coding boundary. It's conceivable the first exon in a sequence record is not the first exon of the mRNA (i.e. there may be one or more exons prior to or past the exon of interest that are in 'remote' sequence records). 
Like this admittedly extreme example (GB acc AF130134):

join(AF130124.1:2563..2964,AF130125.1:21..157,AF130126.1:12..174,
AF130127.1:21..112,AF130128.1:21..162,AF130128.1:281..595,
AF130128.1:661..842,AF130128.1:916..1030,AF130129.1:21..115,
AF130130.1:21..165,AF130131.1:21..125,AF130132.1:21..428,
AF130132.1:492..746,AF130133.1:21..168,AF130133.1:232..401,
AF130133.1:475..906,AF130133.1:970..1107,AF130133.1:1176..1367,21..128)

Also, the ends of the location may be uncertain ('fuzzy'):

join(complement(1009..>1260),complement(AF081827.1:<1..177))

> I guess the real question here, which Jason alludes to, is whether
> SeqFeature->spliced_seq ought to take into account the phase
> information
> of the first exon. Right now, it doesn't, so when you call
> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
> where you would want spliced_seq to include the first bp or
> two? Should there be an option to spliced_seq for whether you
> want to take phase information into account?
>
> I can't submit a bug report until we confirm it's a bug.
>
> Thanks,
> -Amir Karger

You can already pass the frame or an offset to PrimarySeqI::translate(). Here are the args:

 Args : -terminator    - character for terminator        default is *
        -unknown       - character for unknown           default is X
        -frame         - frame                           default is 0
        -codontable_id - codon table id                  default is 1
        -complete      - complete CDS expected           default is 0
        -throw         - throw exception if not complete default is 0
        -orf           - find 1st ORF                    default is 0
        -start         - alternative initiation codon
        -codontable    - Bio::Tools::CodonTable object
        -offset        - offset for fuzzy locations      default is 0

The offset comes from some GenBank seqfeatures which have a '/codon_start' tag indicating which nucleotide to start translation from (1,2,3). This is essentially just the phase+1. We could add a '-phase' argument for convenience which accepts 0,1,2.
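For readers who want to see what "start translating at phase+1" amounts to, here is a BioPerl-free sketch in plain Perl. It is a stand-in for illustration only, not the code behind translate(); the sub name translate_with_phase and the ten-base sequence are made up, and the codon table is the standard genetic code (NCBI table 1) enumerated in TCAG order.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $index = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aas, $index++, 1);
        }
    }
}

# Translate a spliced sequence, skipping $phase bases (0, 1 or 2) first.
# This corresponds to a codon_start value of $phase + 1.
sub translate_with_phase {
    my ($dna, $phase) = @_;
    $dna = uc $dna;
    $dna =~ tr/U/T/;
    my $protein = '';
    for (my $pos = $phase; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        # Codons with ambiguous bases become X
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# Made-up spliced sequence: one stray base followed by ATG GCC TGA
my $spliced = 'GATGGCCTGA';
print translate_with_phase($spliced, 0), "\n";   # prints DGL
print translate_with_phase($spliced, 1), "\n";   # prints MA*
```

Skipping the bases only at translation time leaves the spliced nucleotide sequence itself untouched.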
chris From bobfreemanma at speakeasy.net Fri Dec 8 20:47:15 2006 From: bobfreemanma at speakeasy.net (Bob Freeman) Date: Fri, 8 Dec 2006 15:47:15 -0500 Subject: [Bioperl-l] writing blastxml In-Reply-To: <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com> <000301c6f846$d6227760$15327e82@pyrimidine> <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> Message-ID: Can't seem to find a good post on this to answer my question: Does anyone know a good way to (re)write BLAST reports in XML format? I've got about 30,000 reports I need to rewrite for a (good!) piece of java software that will only import xml formatted BLAST reports. Right now, all mine are plain text. I don't think bioperl can do this yet, correct? If not, any suggestions, besides reblasting all 30,000? I'd like to save a few trees and lumps of coal. TIA, Bob -- ----------------------------------------------------- Bob Freeman, Ph.D. Bioinformatics consultant 51 Downer Avenue, #2 Dorchester, MA 02125 617/699.7057, vox If brains were taxed, he'd get a refund. -- Anonymous From camp_boot at hotmail.com Sun Dec 10 10:00:55 2006 From: camp_boot at hotmail.com (synapse) Date: Sun, 10 Dec 2006 10:00:55 +0000 (UTC) Subject: [Bioperl-l] Driver program for PestFind.pm Message-ID: Dear All, I apologize in advance for my almost total lack of knowledge of perl as a programming language. I need to use PestFind program, part of the biop_run package of bioperl. My understanding is that I will need a simple wrapper program that will read arguments from the command line, and pass them to that module. - Is there such program available that I can just use? - Does anyone know if pestfind can work on multiple sequence files (in fasta format), or does it only process single sequence files? Thanks a lot for the feedback. 
From cjfields at uiuc.edu Sun Dec 10 18:45:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 12:45:26 -0600 Subject: [Bioperl-l] writing blastxml In-Reply-To: References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com> <000301c6f846$d6227760$15327e82@pyrimidine> <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com> Message-ID: <7FB4EBB9-BEDC-4250-BE2F-3F695D36F350@uiuc.edu> On Dec 8, 2006, at 2:47 PM, Bob Freeman wrote: > Can't seem to find a good post on this to answer my question: > > Does anyone know a good way to (re)write BLAST reports in XML format? > I've got about 30,000 reports I need to rewrite for a (good!) piece > of java software that will only import xml formatted BLAST reports. > Right now, all mine are plain text. > > I don't think bioperl can do this yet, correct? If not, any > suggestions, besides reblasting all 30,000? I'd like to save a few > trees and lumps of coal. > > TIA, > Bob The only BioPerl writers for BLAST reports are in BSML and HTML, not BLAST XML. I don't think there have been any requests for it, and no one has really stepped forward to submit one. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Dec 10 18:55:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 12:55:16 -0600 Subject: [Bioperl-l] Driver program for PestFind.pm In-Reply-To: References: Message-ID: <32B0F15D-4144-43B6-AA81-5ED9BA848F45@uiuc.edu> On Dec 10, 2006, at 4:00 AM, synapse wrote: > Dear All, > > I apologize in advance for my almost total lack of knowledge of > perl as a > programming language. > > I need to use PestFind program, part of the biop_run package of > bioperl. My > understanding is that I will need a simple wrapper program that > will read > arguments from the command line, and pass them to that module.
PestFind is part of the EMBOSS suite of programs: http://emboss.sourceforge.net/ The PestFind module in bioperl-run is actually used via Pise. > - Is there such program available that I can just use? See above > - Does anyone know if pestfind can work on multiple sequence > files (in fasta > format), or does it only process single sequence files? > > Thanks a lot for the feedback. No idea there, but the EMBOSS docs should tell you. chris From cjfields at uiuc.edu Mon Dec 11 05:38:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 10 Dec 2006 23:38:32 -0600 Subject: [Bioperl-l] bioperl-run parameter question Message-ID: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> I am writing up a few bioperl-run modules and have a simple question, though I don't know if anyone knows the answer. I was curious as to why parameters for most (all?) bioperl-run modules lack the '-' preceding them. This came up re: StandAloneBlast last week (something Torsten fixed), but I noticed just about every bioperl-run module uses the dashless parameters. chris From n.haigh at sheffield.ac.uk Mon Dec 11 06:44:25 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Mon, 11 Dec 2006 06:44:25 +0000 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> Message-ID: <457CFE49.5010201@sheffield.ac.uk> Chris Fields wrote: > I am writing up a few bioperl-run modules and have a simple question, > though I don't know if anyone knows the answer. I was curious as to > why parameters for most (all?) bioperl-run modules lack the '-' > preceding them. This came up re: StandAloneBlast last week > (something Torsten fixed), but I noticed just about every bioperl-run > module uses the dashless parameters. 
> > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > No idea! Is there any reason for/against using dashed/dashless parameters? I suppose dashed parameters allow you to easily see which tokens on the command line are parameters and which are values. Should modules be able to accept both? Should dashed be preferred? Nath From cjfields at uiuc.edu Mon Dec 11 13:06:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 07:06:32 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457CFE49.5010201@sheffield.ac.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457CFE49.5010201@sheffield.ac.uk> Message-ID: On Dec 11, 2006, at 12:44 AM, Nathan S. Haigh wrote: > Chris Fields wrote: >> I am writing up a few bioperl-run modules and have a simple question, >> though I don't know if anyone knows the answer. I was curious as to >> why parameters for most (all?) bioperl-run modules lack the '-' >> preceding them. This came up re: StandAloneBlast last week >> (something Torsten fixed), but I noticed just about every bioperl-run >> module uses the dashless parameters. >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > No idea! > > Is there any reason for/against using dashed/dashless parameters? I > suppose dashed parameters allow you to easily see which tokens on the > command line are parameters and which are values. Should modules be > able > to accept both? Should dashed be preferred? > > Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l I'm thinking about it from the point of consistency.
When using a mix of core and run modules it can be a bit confusing, particularly when (as pointed out in the previous thread on StandAloneBlast) you can use only dashed parameters with core modules, while most (all?) run modules only accept dashless ones (in most cases some exception is thrown). Torsten fixed this in StandAloneBlast so it accepts both, but shouldn't this rule also apply to all run modules? Much of this is probably due to the donated nature of much of the bioperl-run code and Jason's 'cat-herding', and I understand that it would be a lot of work to change this for all run modules. However, we could at least try to start enforcing some loose rules with new bioperl-run wrappers (e.g. implement WrapperBase, use core-like parameters, etc). chris From akarger at CGR.Harvard.edu Mon Dec 11 16:20:03 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 11 Dec 2006 11:20:03 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq Message-ID: Chris Fields wrote: > > > Yes, I think. Scott Cain pointed out that GFF column 8 is the > > "phase", which I had never heard of before. My current, very > > limited, understanding is that sometimes you'll have an exon > > with, say, 31 bp, followed by an exon with 29 bp. When the > > intron gets spliced out, you eventually get an mRNA of 60 bp, > > which translates to a protein of 20 aa. > > But the second exon has a phase of 1, not 0, because you > > can't just start translating at the first bp of the second > > exon and expect to get nice amino acids. > > I think the use of 'frame' here is meant relative to the DNA > sequence (i.e. > ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e. > translation, three frames). At least I think that's what is meant! I agree. By the way, I'd love a reference to a simple bio-explanation of what's happening here. Google searches for "coding sequence phase" are not all that relevant.
> > I'm still confused as to why you would have a phase in the > > first exon, though. Why not just say the CDS starts 1 or 2 bp > > later? (This is probably a bio question, not a bioperl > > question, but a quick Google didn't get me an answer. "Phase" > > isn't a very good search term.) > > It could be b/c the location coordinates delineate the exon > coding boundary. > It's conceivable the first exon in a sequence record is not > the first exon > of the mRNA (i.e. there may be one or more exons prior to or > past the exon > of interest that are in 'remote' sequence records). That's certainly not the case here, because the files have the entire genomes in them. > Also, the ends of the location may be uncertain ('fuzzy'): > > join(complement(1009..>1260),complement(AF081827.1:<1..177)) Also not the case here. These locations aren't listed as fuzzy. Any other thoughts? > > I guess the real question here, which Jason alludes to, is whether > > SeqFeature->spliced_seq ought to take into account the phase > > information > > of the first exon. Right now, it doesn't, so when you call > > SeqFeature->spliced_seq->translate, you get gibberish. Are > there cases > > where you would want spliced_seq to include the first bp or > > two? Should there be an option to spliced_seq for whether you > > want to take phase information into account? > > You can already pass the frame or an offset to > PrimarySeqI::translate(). > We could add a '-phase' argument for > convenience which accepts 0,1,2. But as Jason pointed out, you should find the problem earlier. What if I want to get the RNA sequence that will become the protein? Then having a phase arg to translate() doesn't help. Should there be a phase arg to spliced_seq? Which raises another bio question: at what point are the first 1 or 2 bp dropped when you have a phase of 1 or 2? Do they appear in the mRNA?
-Amir Karger From bix at sendu.me.uk Mon Dec 11 18:21:42 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 11 Dec 2006 13:21:42 -0500 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> Message-ID: <457DA1B6.1060706@sendu.me.uk> Chris Fields wrote: > I am writing up a few bioperl-run modules and have a simple question, > though I don't know if anyone knows the answer. I was curious as to > why parameters for most (all?) bioperl-run modules lack the '-' > preceding them. This came up re: StandAloneBlast last week > (something Torsten fixed), but I noticed just about every bioperl-run > module uses the dashless parameters. I didn't follow that particular thread, but from my experience there is a useful distinction between bioperl options using the - as normal for full consistency with core (eg. -verbose), whilst the options that belong to the program the run module is a wrapper for do not take dashes. Again, this seems consistent within the run package. I'd suggest sticking to the current pattern. Cheers, Sendu. From cjfields at uiuc.edu Mon Dec 11 20:07:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 14:07:16 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457DA1B6.1060706@sendu.me.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> Message-ID: On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote: > Chris Fields wrote: >> I am writing up a few bioperl-run modules and have a simple >> question, though I don't know if anyone knows the answer. I was >> curious as to why parameters for most (all?) bioperl-run modules >> lack the '-' preceding them. This came up re: StandAloneBlast >> last week (something Torsten fixed), but I noticed just about >> every bioperl-run module uses the dashless parameters. 
> > I didn't follow that particular thread, but from my experience > there is a useful distinction between bioperl options using the - > as normal for full consistency with core (eg. -verbose), whilst the > options that belong to the program the run module is a wrapper for > do not take dashes. Again, this seems consistent within the run > package. I respectfully disagree that this is a 'useful' distinction. My main point is consistency. To me, it's counterintuitive to have two Bioperl classes, both of which inherit from Bio::Root::Root, use two different syntaxes for any parameters passed to the constructor, even if some are 'program' parameters. It's also not consistent with StandAloneBlast or RemoteBlast, both of which are considered bioperl-run modules even though they are in core, and both of which use dashed parameters (StandAloneBlast actually allows both). In fact, it isn't consistent within bioperl-run itself. Bio::Tools::Run::EMBOSSApplication uses dashes for parameters in a hashref! Okay, judging by the previous examples, 'consistency' isn't a word I would use to describe bioperl-run as a whole (back to Jason's 'cat-herding' analogy). It would be easier to let it slide for now, especially since changing them would be a serious pain, not to mention an API issue. But shouldn't there be some consistency? And what about new modules? Do we follow the historical (possibly confusing) 'dashless' route, or use the core-like dashed approach (thus breaking from the other run modules)? > I'd suggest sticking to the current pattern. > > > Cheers, > Sendu. I'll allow for both, a la StandAloneBlast. Doesn't hurt to be safe. ;> Have fun at the hackathon!
chris From bix at sendu.me.uk Mon Dec 11 21:19:55 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 11 Dec 2006 16:19:55 -0500 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> Message-ID: <457DCB7B.8050500@sendu.me.uk> Chris Fields wrote: > > On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> I am writing up a few bioperl-run modules and have a simple >>> question, though I don't know if anyone knows the answer. I was >>> curious as to why parameters for most (all?) bioperl-run modules >>> lack the '-' preceding them. This came up re: StandAloneBlast last >>> week (something Torsten fixed), but I noticed just about every >>> bioperl-run module uses the dashless parameters. >> >> I didn't follow that particular thread, but from my experience there >> is a useful distinction between bioperl options using the - as normal >> for full consistency with core (eg. -verbose), whilst the options that >> belong to the program the run module is a wrapper for do not take >> dashes. Again, this seems consistent within the run package. > > I respectfully disagree that this is a 'useful' distinction. My main > point is consistency. [snip] We're on the same page in terms of what we think would be a Good Thing, and allowing both ways (dashed and dashless) sounds reasonable. I was just suggesting why bioperl-run might be the way it was. Further to that, there is the practical aspect that it is a lot simpler to figure out which are the program options so they can be farmed out to the AUTOLOAD methods - again something that isn't done in core. If you come up with some generic way of dealing with options and farming to AUTOLOAD, perhaps there's scope for applying it to all the run wrappers (ideally via one of their base classes), so they all instantly gain dashed-mode capability. 
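One way a shared base class could accept both styles is to strip leading dashes before the options are sorted into wrapper options and program options. The following is only a sketch of that idea in plain Perl; normalize_params is a hypothetical helper, not an existing BioPerl method, and the option names are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: accept '-outfile' and 'outfile' alike by stripping
# any leading dashes from each key. Case is left untouched, since program
# options may be case-sensitive.
sub normalize_params {
    my (@args) = @_;
    die "odd number of arguments\n" if @args % 2;
    my %params;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = $key) =~ s/^-+//;
        $params{$name} = $value;
    }
    return %params;
}

# Made-up option names, mixing both styles in one call
my %params = normalize_params(-verbose => 1, outfile => 'out.aln');
print join(', ', map { "$_=$params{$_}" } sort keys %params), "\n";
# prints outfile=out.aln, verbose=1
```

A wrapper constructor could call something like this once and then hand the dash-free keys to whatever mechanism it already uses for program options.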
From cjfields at uiuc.edu Mon Dec 11 22:05:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 16:05:56 -0600 Subject: [Bioperl-l] bioperl-run parameter question In-Reply-To: <457DCB7B.8050500@sendu.me.uk> References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu> <457DA1B6.1060706@sendu.me.uk> <457DCB7B.8050500@sendu.me.uk> Message-ID: On Dec 11, 2006, at 3:19 PM, Sendu Bala wrote: ... >> >> I respectfully disagree that this is a 'useful' distinction. My main >> point is consistency. > [snip] > > We're on the same page in terms of what we think would be a Good > Thing, > and allowing both ways (dashed and dashless) sounds reasonable. I was > just suggesting why bioperl-run might be the way it was. Further to > that, there is the practical aspect that it is a lot simpler to figure > out which are the program options so they can be farmed out to the > AUTOLOAD methods - again something that isn't done in core. Maybe b/c AUTOLOAD is frowned upon for a number of reasons, mainly code maintenance. I'm somewhat neutral on the idea of using AUTOLOAD as a short-term solution, though using heredoc and an eval{} block works well for me (and shows up when using $self->can('method') or when checking for methods via Class::Inspector). > If you come up with some generic way of dealing with options and > farming > to AUTOLOAD, perhaps there's scope for applying it to all the run > wrappers (ideally via one of their base classes), so they all > instantly > gain dashed-mode capability. I think that's the crux of the problem; they do not all have the same base class (except Bio::Root::Root). Most use WrapperBase. I thought at one point a Run-specific root module would be a good idea, but WrapperBase already works well. I'll go ahead with my modules and think about it some more. You could ask the powers-that-be (jason, hilmar, etc) what they think as well. 
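For comparison with AUTOLOAD, here is a sketch of generating accessors up front with a string eval over a heredoc, which is my reading of the approach mentioned above. The package name My::Wrapper and the option names are made up for the example. Because the subs really exist in the symbol table, can() finds them.

```perl
package My::Wrapper;    # hypothetical class name
use strict;
use warnings;

my @PROGRAM_PARAMS = qw(matrix gapopen outfile);    # hypothetical options

# Build one get/set accessor per program option at load time
for my $param (@PROGRAM_PARAMS) {
    my $code = <<"END_SUB";
sub $param {
    my (\$self, \$value) = \@_;
    \$self->{'_$param'} = \$value if \@_ > 1;
    return \$self->{'_$param'};
}
END_SUB
    eval $code;
    die "could not create method $param: $@" if $@;
}

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

package main;

my $wrapper = My::Wrapper->new;
$wrapper->matrix('BLOSUM62');
print $wrapper->matrix, "\n";                         # prints BLOSUM62
print $wrapper->can('gapopen') ? "yes\n" : "no\n";    # prints yes
```

Installing closures into the symbol table would give the same result without a string eval; the heredoc form is shown only because it is the one described in the thread.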
chris From bosborne11 at verizon.net Mon Dec 11 22:24:54 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 11 Dec 2006 17:24:54 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: Message-ID: Amir, Google "intron phase", you will see a number of useful links. Brian O. On 12/11/06 11:20 AM, "Amir Karger" wrote: > I agree. By the way, I'd love a reference to a simple bio-explanation of > what's happening here. Google searches for "coding sequence phase" are > not all that relevant. From cjfields at uiuc.edu Tue Dec 12 03:20:06 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 11 Dec 2006 21:20:06 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: On Dec 11, 2006, at 10:20 AM, Amir Karger wrote: >> I think the use of 'frame' here is meant relative to the DNA >> sequence (i.e. >> ORF searching, 6 frames) and the 'phase' is relative to the mRNA >> (i.e. >> translation, three frames). At least I think that's what is meant! > > I agree. By the way, I'd love a reference to a simple bio- > explanation of > what's happening here. Google searches for "coding sequence phase" are > not all that relevant. Ah, Brian found some links I see... >> It could be b/c the location coordinates delineate the exon >> coding boundary. >> It's conceivable the first exon in a sequence record is not >> the first exon >> of the mRNA (i.e. there may be one or more exons prior to or >> past the exon >> of interest that are in 'remote' sequence records). > > That's certainly not the case here, because the files have the entire > genomes in them. > >> Also, the ends of the lcoation may be uncertain ('fuzzy'): >> >> join(complement(1009..>1260),complement(AF081827.1:<1..177)) > > Also not the case here. These locations aren't listed as fuzzy. > > Any other thoughts? Which GFF files did you use? More specifically, which genes in which GFF file? I saw a reference to S. 
bayanus, but it's hard to work out what could be the problem unless we know a bit more. >>> I guess the real question here, which Jason alludes to, is whether >>> SeqFeature->spliced_seq ought to take into account the phase >>> information >>> of the first exon. Right now, it doesn't, so when you call >>> SeqFeature->spliced_seq->translate, you get gibberish. Are >> there cases >>> where you would want spliced_seq to include the first bp or >>> two? Should there be an option to spliced_seq for whether you >>> want to take phase information into account? >> >> You can already pass the frame or an offset to >> PrimarySeqI::translate(). >> We could add a '-phase' argument for >> convenience which accepts 0,1,2. > > But as Jason pointed out, you should find the problem earlier. What > if I > want to get the RNA sequence that will become the protein? then > having a > phase arg to translate() doesn't help. Should there be a phase arg to > spliced_seq? You'll also note Jason mentioned there were possible errors in the gene prediction programs which produced the output spliced_seq() is supposed to return the DNA sequence of a split location by splicing together the sublocation sequences in their 'join' order. So, if the first exon was out of phase, once spliced they should all be out of phase to the same degree, assuming all exons are joined together correctly. Translating this using the phase should produce the correct amino acid sequence. Note that Jason suggested passing the frame/phase of the first exon to translate(), not spliced_seq(). I also suggested translate(). > Which raises another bio question: at what point are the first 1 or > 2 bp > dropped when you have a phase of 1 or 2? Do they appear in the mRNA? > > -Amir Karger Any sequence present in the sublocations (exons) would be in the spliced sequence. This would have to include those nucleotides in exons skipped b/c of the phase since they are part of the coding region. 
chris From neetisomaiya at gmail.com Tue Dec 12 12:06:20 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:36:20 +0530 Subject: [Bioperl-l] need help in phredPhrap Message-ID: <764978cf0612120406m796b116dncd3a9e6c82ffe682@mail.gmail.com> Hi, I am running phredPharp, which runs phred, phrap and polyphred. Please refer to the "Using a reference sequence" section of this link http://droog.mbt.washington.edu/poly_doc50.html#REFER. I am using the reference sequence as described in the link above. With this I am getting the SNP positions on the contig sequence as well as on the reference sequence. Does anyone know if there is some output file which can also give me mapping between contig sequence and reference sequence? -- -Neeti Even my blood says, B positive From akarger at CGR.Harvard.edu Tue Dec 12 16:05:43 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 12 Dec 2006 11:05:43 -0500 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq Message-ID: (sorry if this thread is boring people) Chris Fields wrote: > > I agree. By the way, I'd love a reference to a simple bio- > > explanation of > > what's happening here. Google searches for "coding sequence > phase" are > > not all that relevant. > > Ah, Brian found some links I see... Thanks, Brian! Amazing how "coding sequence phase" finds nothing but "intron phase" finds a ton. This is why you need to actually learn biology, rather than Googling it. > Which GFF files did you use? More specifically, which genes > in which > GFF file? I saw a reference to S. bayanus, but it's hard to > work out > what could be the problem unless we know a bit more. http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus .20031001.AUGUSTUS.gff3.gz (Thanks for a Really Useful site, Jason!) c127 (for example) has two lines in that file: sbay_c127 AUGUSTUS mRNA 263 723 . + . ID=sbay_c127-g1.1 sbay_c127 AUGUSTUS CDS 263 723 . 
+ 1 Parent=sbay_c127-g1.1 Now go to gbrowse page: http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/ Type "sbay_c127:250-300" in the search box. As you can see from the translation track, if you start at bp 263, you hit a stop codon after just a few aas. But if you use frame2/phase 1, you get no stop codons all the way to the end of the contig. > >> You can already pass the frame or an offset to > >> PrimarySeqI::translate(). > >> We could add a '-phase' argument for > >> convenience which accepts 0,1,2. > > > > What if I > > want to get the RNA sequence that will become the protein? then > > having a > > phase arg to translate() doesn't help. Should there be a > phase arg to > > spliced_seq? > > You'll also note Jason mentioned there were possible errors in the > gene prediction programs which produced the output That's certainly possible. No gene prediction program will be perfect. In this case, though, it's clear that it found a large region without stop codons in it, and correctly identified the place to start translating. I guess I'm just surprised that, if it found just one exon in a gene (in the whole contig) why it would say the exon starts at 263 with a phase 1, instead of just saying it starts at 264. > spliced_seq() is supposed to return the DNA sequence of a split > location by splicing together the sublocation sequences in their > 'join' order. So, if the first exon was out of phase, once spliced > they should all be out of phase to the same degree, assuming all > exons are joined together correctly. Translating this using the > phase should produce the correct amino acid sequence. > > Note that Jason suggested passing the frame/phase of the first exon > to translate(), not spliced_seq(). I also suggested translate(). You're right. This brings the number of translated polypeptide sequences that have lots of *s in them to 9 instead of 90. I guess I have two requests here. 
The first is, if a person wants to see exactly which bps are translated to aas -- a nucelotide sequece of exactly 3N bp starting (usually) with ATG -- then they might want an argument to spliced_seq that skips the first one or two bp when necessary. After all, they might want to study the DNA, not the peptides. The second request is for "intelligent objects". If my SeqFeatures know that they're in phase 1, then when I call spliced_seq I want the resulting objects to know that they're phase one, such that when I call translate, Bioperl automatically skips the first bp or two. Admittedly, there might be big ramifications to this. Both requests of course made in the knowledge that Bioperl is open source & developers have a lot to do with their time. -Amir Karger > > Which raises another bio question: at what point are the > first 1 or > > 2 bp > > dropped when you have a phase of 1 or 2? Do they appear in the mRNA? > > > > -Amir Karger > > Any sequence present in the sublocations (exons) would be in the > spliced sequence. This would have to include those nucleotides in > exons skipped b/c of the phase since they are part of the > coding region. > > chris > From neetisomaiya at gmail.com Tue Dec 12 12:14:10 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:44:10 +0530 Subject: [Bioperl-l] needle parser in bioperl? Message-ID: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Hi, Does anyone know of a bioperl parser for needle output, basically I won't where the target sequence aligns on the template (i.e. coordinate on the template where the taget aligns). -- -Neeti Even my blood says, B positive From cjfields at uiuc.edu Tue Dec 12 16:57:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 10:57:27 -0600 Subject: [Bioperl-l] needle parser in bioperl? 
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > Hi, > > Does anyone know of a bioperl parser for needle output, basically I > won't > where the target sequence aligns on the template (i.e. coordinate > on the > template where the taget aligns). > > -- > -Neeti > Even my blood says, B positive I answered this a number of months back: http://tinyurl.com/yzlbx5 Basically, newer versions of EMBOSS have changed the output for the AlignIO::emboss parser (which parses needle). I don't believe the parser has been fixed to deal with that, but Jason has pointed out you can use MSF output when running needle, then parse using AlignIO with the format set to 'msf'. chris From bosborne11 at verizon.net Tue Dec 12 16:51:05 2006 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 12 Dec 2006 11:51:05 -0500 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: Neeti, EMBOSS' needle and water produce alignments in what Bioperl calls 'emboss' format, so you can use AlignIO to get SimpleAlign objects. The best description of how to use SimpleAlign is the documentation in the module. Brian O. On 12/12/06 7:14 AM, "neeti somaiya" wrote: > Hi, > > Does anyone know of a bioperl parser for needle output, basically I won't > where the target sequence aligns on the template (i.e. coordinate on the > template where the taget aligns). From kaboroev at sfu.ca Tue Dec 12 17:14:39 2006 From: kaboroev at sfu.ca (Keith Anthony Boroevich) Date: Tue, 12 Dec 2006 09:14:39 -0800 Subject: [Bioperl-l] BLAST reports Message-ID: <457EE37F.4020000@sfu.ca> Hi everyone, I would like to manipulate my blast results with bioperl but would also like to have the html output of the blast. 
What would be the best way of going about this, as I don't see any write functions in any of the blast modules I have looked at. Would it be better to create my own html layout from the blast data then attempt to recover this from bioperl? keith p.s. - does anyone know what the most informative blast "alignment view" output is? xml i suppose? -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 From cjfields at uiuc.edu Tue Dec 12 18:45:05 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 12:45:05 -0600 Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq In-Reply-To: References: Message-ID: On Dec 12, 2006, at 10:05 AM, Amir Karger wrote: ... > http://fungal.genome.duke.edu/annotations/sbay/gff/ > saccharomyces_bayanus > .20031001.AUGUSTUS.gff3.gz (Thanks for a Really Useful site, Jason!) > > c127 (for example) has two lines in that file: > sbay_c127 AUGUSTUS mRNA 263 723 . + > . ID=sbay_c127-g1.1 > sbay_c127 AUGUSTUS CDS 263 723 . + > 1 Parent=sbay_c127-g1.1 > > Now go to gbrowse page: > http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/ > Type "sbay_c127:250-300" in the search box. > > As you can see from the translation track, if you start at bp 263, you > hit a stop codon after just a few aas. But if you use frame2/phase 1, > you get no stop codons all the way to the end of the contig. Yes, but there are two things. First, there is no distinct start codon. 
Second, this is what the top NCBI BLASTX hit for that particular exon is: >gi|6323195|ref|NP_013267.1| Gene info Essential 100kDa subunit of the exocyst complex (Sec3p, Sec5p, Sec6p, Sec8p, Sec10p, Sec15p, Exo70p, and Exo84p), which has the essential function of mediating polarized targeting of secretory vesicles to active sites of exocytosis; Sec10p [Saccharomyces cerevisiae] gi|2498891|sp|Q06245|SEC10_YEAST Gene info Exocyst complex component SEC10 gi|1234854|gb|AAB67490.1| Gene info L9362.12 gene product gi|1781307|emb|CAA70041.1| Gene info 100 kD exocyst complex component [Saccharomyces cerevisiae] Length=871 Score = 285 bits (728), Expect = 7e-77 Identities = 141/152 (92%), Positives = 149/152 (98%), Gaps = 0/152 (0%) Frame = +2 Query 2 FNDFYSMGKSDIVEQLRLSKNWKFNLKSVILMKNLLILSSKLETNSIPKTINTKLIIEKY 181 +NDFYSMGKSDIVEQLRLSKNWK NLKSV LMKNLLILSSKLET+SIPKTINTKL +IEKY Sbjct 168 YNDFYSMGKSDIVEQLRLSKNWKLNLKSVKLMKNLLILSSKLETSSIPKTINTKLVIEKY 227 Query 182 SEMMENKLLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE 361 SEMMEN +LLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE Sbjct 228 SEMMENELLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE 287 Query 362 NEFENVFIKNVKFKERLVDFESHSVIVEASMQ 457 NEFENVFIKNVKFKE+L+DFE+HSVI+E SMQ Sbjct 288 NEFENVFIKNVKFKEQLIDFENHSVIIETSMQ 319 Note the query start is well into the predicted coding sequence. Both the lack of a start codon and the above BLASTX hit suggest this is not actually the first exon in the coding region. Therefore the sequence retrieved from spliced_seq() is only part of the full coding region (it seems to lack at least one 3' exon as well). >>>> You can already pass the frame or an offset to >>>> PrimarySeqI::translate(). >>>> We could add a '-phase' argument for >>>> convenience which accepts 0,1,2. >>> >>> What if I >>> want to get the RNA sequence that will become the protein? then >>> having a >>> phase arg to translate() doesn't help. Should there be a >> phase arg to >>> spliced_seq? 
>> >> You'll also note Jason mentioned there were possible errors in the >> gene prediction programs which produced the output > > That's certainly possible. No gene prediction program will be perfect. > In this case, though, it's clear that it found a large region without > stop codons in it, and correctly identified the place to start > translating. I guess I'm just surprised that, if it found just one > exon > in a gene (in the whole contig) why it would say the exon starts at > 263 > with a phase 1, instead of just saying it starts at 264. Maybe the gene prediction didn't find the first exon, or didn't tie the predicted exons together. Not unusual considering the number of predictions made. >> spliced_seq() is supposed to return the DNA sequence of a split >> location by splicing together the sublocation sequences in their >> 'join' order. So, if the first exon was out of phase, once spliced >> they should all be out of phase to the same degree, assuming all >> exons are joined together correctly. Translating this using the >> phase should produce the correct amino acid sequence. >> >> Note that Jason suggested passing the frame/phase of the first exon >> to translate(), not spliced_seq(). I also suggested translate(). > > You're right. This brings the number of translated polypeptide > sequences > that have lots of *s in them to 9 instead of 90. > > I guess I have two requests here. The first is, if a person wants > to see > exactly which bps are translated to aas -- a nucelotide sequece of > exactly 3N bp starting (usually) with ATG -- then they might want an > argument to spliced_seq that skips the first one or two bp when > necessary. After all, they might want to study the DNA, not the > peptides. > > The second request is for "intelligent objects". 
If my SeqFeatures > know > that they're in phase 1, then when I call spliced_seq I want the > resulting objects to know that they're phase one, such that when I > call > translate, Bioperl automatically skips the first bp or two. > Admittedly, > there might be big ramifications to this. > > Both requests of course made in the knowledge that Bioperl is open > source & developers have a lot to do with their time. > > -Amir Karger You may want to post these as enhancement requests to Bugzilla just so we can keep track. I think passing a phase parameter to spliced_seq() can be easily accomplished; it's just a matter of returning a subseq of the spliced sequence based on the phase if set. In fact, I am testing it out now. The second may be more problematic, since there may be a time when one would want those extra nucleotides, so I don't think we would want removal of said nucleotides to be the default behavior. Chris From dmessina at wustl.edu Tue Dec 12 18:44:29 2006 From: dmessina at wustl.edu (David Messina) Date: Tue, 12 Dec 2006 12:44:29 -0600 Subject: [Bioperl-l] BLAST reports In-Reply-To: <457EE37F.4020000@sfu.ca> References: <457EE37F.4020000@sfu.ca> Message-ID: <083B4D17-CC7A-406C-9037-4DA5DC31AA05@wustl.edu> Hi Keith, Take a look at: http://www.bioperl.org/wiki/HOWTO:SearchIO You can read in a whole bunch of different blast formats (see Table 1), and it is possible to write out in HTML. See: http://www.bioperl.org/wiki/HOWTO:SearchIO#Writing_and_formatting_output I'm not sure what you mean by the most informative blast output. If you mean which one gives the most information, I'm pretty sure the standard Blast report has everything. 
Dave From neetisomaiya at gmail.com Tue Dec 12 12:09:39 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 12 Dec 2006 17:39:39 +0530 Subject: [Bioperl-l] problem in running needle Message-ID: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> I am trying to run needle for the attached two sequence files, on a linux machine. It says "Uncaught exception: Assertion failed, raised at ajmem.c :187". Can anyone tell me what this could be coz of? -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... Name: SEQ_1.REF Type: application/octet-stream Size: 44208 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: seq_of_contig11 Type: application/octet-stream Size: 44344 bytes Desc: not available URL: From cjfields at uiuc.edu Tue Dec 12 20:55:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 12 Dec 2006 14:55:07 -0600 Subject: [Bioperl-l] problem in running needle In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> Message-ID: On Dec 12, 2006, at 6:09 AM, neeti somaiya wrote: > I am trying to run needle for the attached two sequence files, on a > linux > machine. It says "Uncaught exception: Assertion failed, raised at > ajmem.c > :187". > Can anyone tell me what this could be coz of? > > -- > -Neeti > Even my blood says, B positive > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l This would be an EMBOSS error, not a BioPerl error. Maybe the emboss list is the best place for this question? http://emboss.open-bio.org/mailman/listinfo/emboss Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Dec 12 21:30:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 15:30:30 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting aSeq->spliced_seq
In-Reply-To:
References:
Message-ID: <093AE0FF-3C88-4F97-B33F-836B295E3DE3@uiuc.edu>

On Dec 12, 2006, at 10:05 AM, Amir Karger wrote:

>> Note that Jason suggested passing the frame/phase of the first exon
>> to translate(), not spliced_seq(). I also suggested translate().
>
> You're right. This brings the number of translated polypeptide
> sequences that have lots of *s in them to 9 instead of 90.
>
> I guess I have two requests here. The first is, if a person wants to
> see exactly which bps are translated to aas -- a nucleotide sequence
> of exactly 3N bp starting (usually) with ATG -- then they might want
> an argument to spliced_seq that skips the first one or two bp when
> necessary. After all, they might want to study the DNA, not the
> peptides.
>
> The second request is for "intelligent objects". If my SeqFeatures
> know that they're in phase 1, then when I call spliced_seq I want the
> resulting objects to know that they're phase one, such that when I
> call translate, Bioperl automatically skips the first bp or two.
> Admittedly, there might be big ramifications to this.
>
> Both requests of course made in the knowledge that Bioperl is open
> source & developers have a lot to do with their time.
>
> -Amir Karger

...

Amir,

I committed some code to CVS where I added a -phase parameter option to
SeqFeatureI::spliced_seq(). I also added some tests to SeqFeature.t. If
you run the following after creating the SeqFeature object $sf (the seq
object is $seq):

    $sf->attach_seq($seq);

    for my $phase (-1..3) {
        my $spliced = $sf->spliced_seq(-phase => $phase);
        print $spliced->seq, "\n";
        print $spliced->translate->seq, "\n";
    }

You should get warnings for any value other than 0, 1, or 2.
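For readers following the thread, a rough sketch of what "returning a subseq of the spliced sequence based on the phase" amounts to, on a bare string rather than a Bio::Seq object. The function trim_by_phase and the sequence are made up for illustration, and returning the sequence unchanged on an invalid phase is an assumption; this is not the code committed to CVS.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the trimming step: drop the first 0, 1 or 2
# bases so that the remaining sequence starts on a codon boundary.
sub trim_by_phase {
    my ($dna, $phase) = @_;
    $phase = 0 unless defined $phase;
    unless ($phase =~ /^[012]$/) {
        # Assumption: warn and leave the sequence untouched.
        warn "phase must be 0, 1, or 2; ignoring '$phase'\n";
        return $dna;
    }
    return substr($dna, $phase);
}

my $spliced = 'CATGGCTTAA';    # made-up spliced sequence, phase 1
print trim_by_phase($spliced, 1), "\n";
```

With phase 1 the leading C is dropped, so the remaining bases read in frame from ATG.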
I'll also note that the sequence you are having trouble with (sbay_c127) is 712 bp, so it doesn't contain the complete coding region. I used it in the test case in SeqFeature.t. Chris From boris.steipe at utoronto.ca Tue Dec 12 21:26:14 2006 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Tue, 12 Dec 2006 16:26:14 -0500 Subject: [Bioperl-l] problem in running needle In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com> Message-ID: Looks like a memory allocation problem. Your whole sequence is in one single line, throwing a few linebreaks in there every 80th character or so will probably do the trick. HTH Boris On 12-Dec-06, at 7:09 AM, neeti somaiya wrote: > I am trying to run needle for the attached two sequence files, on a > linux > machine. It says "Uncaught exception: Assertion failed, raised at > ajmem.c > :187". > Can anyone tell me what this could be coz of? > > -- > -Neeti > Even my blood says, B positive > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Derek.Fairley at bll.n-i.nhs.uk Wed Dec 13 10:00:16 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Wed, 13 Dec 2006 10:00:16 -0000 Subject: [Bioperl-l] BLAST reports In-Reply-To: <457EE37F.4020000@sfu.ca> Message-ID: Hi Keith, >I would like to manipulate my blast results with bioperl but would also >like to have the html output of the blast. What would be the best way >of going about this, as I don't see any write functions in any of the >blast modules I have looked at. Would it be better to create my own >html layout from the blast data then attempt to recover this from bioperl? 
Take a look at some of the example scripts here: http://www.bioperl.org/wiki/Bioperl_scripts Depending on your Bioperl installation, you may already have these in your /scripts directory or similar. The /examples/searchio/htmlwriter.pl script may be a good starting point. >p.s. - does anyone know what the most informative blast "alignment view" >output is? xml i suppose? Assuming you want to get the HSPs, parsing blastxml reports seems to be the most reliable approach. Again, there's a useful script for this: take a look at /scripts/utilities/search2alnblocks.pls. Derek. -- ><)))?> -cGRASP- < Keith Anthony Boroevich Davidson Lab Dept of Molecular Biology Simon Fraser University Tel: 604-268-7276 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Dec 13 18:02:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 13 Dec 2006 12:02:14 -0600 Subject: [Bioperl-l] Proposal for Meta data Message-ID: I am working on a few RNA-related things related to structure and have a few questions, specifically about Meta data. This is sort of a proposal, but I would like to get everybody's thoughts about this to gauge what everyone thinks. Jason, sorry to bug you but I thought it might be something that would be of use phylohackathon-wise. Heikki has several modules present which adds meta data to sequences (Bio::Seq::Meta). In this case, the meta data is stored as a string (Bio::Seq::Meta) or an array (Bio::Seq::Meta::Array). In both cases you can have multiple types of meta data for a sequence based on a particular tag. However, this also assumes that the meta data is somehow attached strictly to sequence data of some type. It also doesn't allow for having mixed meta data types for a single sequence, such as attaching array data and string data to the same sequence. 
Hence, I was thinking of a having a simple, generic meta data type (Bio::Meta), one which could encompass simple strings (Bio::Meta::Simple), arrays (Bio::Meta::Array), or any other structured type of data. This could be used to annotate any PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, maybe in a collection (similar to AnnotationCollection). I thought something like this may be of general use for any PrimarySeq (quality, structure), alignments like NEXUS and Stockholm, SeqFeatures where structure could be stored (tRNA or riboswitches), etc. However, this also seems to fall into the category of sequence annotation. So, would it be better to have a set of Bio::Annotation classes used for this purpose? Flames and jibes welcome; I'm wearing my asbestos suit today.... chris From stewarta at nmrc.navy.mil Thu Dec 14 01:06:14 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Wed, 13 Dec 2006 20:06:14 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects Message-ID: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> I am trying to StandAloneBlast->blastall an array or Bio::Seq objects. The documentation claims that blastall can be passed a file name, a Bio::Seq object, or an array of Bio::Seq objects, while the usage suggests that a reference to an array of Bio::Seq objects is what must be passed to blastall. (from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/ Bio/Tools/Run/StandAloneBlast.html#POD5) Usage: $seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects $blast_report = $factory->blastall(\@seq_array); Should this be... $report = $factory->blastall(@seq_array); or $report = $factory->blastall(\@seq_array); ??? And if you are blastall'ing an array of Seq objects, then does blastall just return one big blast report or should I be expecting an array of blast reports? 
I've tried

    $report = $factory->blastall(@seq_array);

which seems to work ok, except that when I process the results, there
are only results for the first Seq object in the array.

-Andrew

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From arareko at campus.iztacala.unam.mx Thu Dec 14 01:37:27 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 13 Dec 2006 19:37:27 -0600
Subject: [Bioperl-l] BioPerl page in Wikipedia
Message-ID: <4580AAD7.3000900@campus.iztacala.unam.mx>

Folks,

I've updated the BioPerl page on Wikipedia a little. I think it would
be nice if we expanded the article a bit more, since it's tagged as a
"stub". Here's the link:

http://en.wikipedia.org/wiki/BioPerl

Cheers,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From lubapardo at gmail.com Thu Dec 14 10:54:07 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Thu, 14 Dec 2006 11:54:07 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>

Hello,

I am new to bioperl and I have been trying to run the examples
available in bptutorial.pl and other basic literature. I have installed
the latest release of bioperl 1.5.2 in a usr/local/src directory. Any
time I try to retrieve the SwissProt and EMBL databases it gives me an
error. With genbank it seems to be fine. I wonder if the installation
was not successful, as I would expect that these database accesses were
included in the modules of BioPerl Core.
In addition, I would like to ask whether, to run ClustalW within the
setting of BioPerl, I need to download and install it in the same
directory in which I have installed bioperl, or whether it is included
in the module of Bio::Align.

I am not sure whether this is the best place to ask these very basic
questions. If not, could anyone please refer me to the proper e-mail
account?

Thank you very much in advance.

Luba Pardo MD, PhD

From bix at sendu.me.uk Thu Dec 14 14:10:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:10:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
Message-ID: <45815B63.1020003@sendu.me.uk>

Andrew Stewart wrote:
> I am trying to StandAloneBlast->blastall an array of Bio::Seq
> objects. The documentation claims that blastall can be passed a file
> name,

You're referring to 'In addition, sequence input may be in the form of
either a Bio::Seq object or or an array of Bio::Seq objects'? I agree
it's not clear, but supplying a reference to an array is still
supplying an array. Anyway, I'll clarify it.

In any case, the usage for the method is what you should pay attention
to:

> Usage:
> $seq_array_ref = \@seq_array; # where @seq_array is an array of
> Bio::Seq objects
> $blast_report = $factory->blastall(\@seq_array);
>
> Should this be...
> $report = $factory->blastall(@seq_array);
> or
> $report = $factory->blastall(\@seq_array);
> ???

It should be exactly what it says. A reference to the array.

> And if you are blastall'ing an array of Seq objects, then does
> blastall just return one big blast report or should I be expecting an
> array of blast reports?

Returns : Reference to a Blast object or BPlite object containing the
blast report.

That means, just one big object, not an array.
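A small plain-Perl illustration of why the two call styles behave differently. The sub count_queries is a made-up stand-in for any method that inspects only its first argument, and the strings are placeholders for Bio::Seq objects; whether blastall is written exactly this way is an assumption, though it would be consistent with getting results for only the first sequence.

```perl
use strict;
use warnings;

# Hypothetical method body: take one input argument and check whether
# it is an array reference or a single item.
sub count_queries {
    my ($input) = @_;
    return ref($input) eq 'ARRAY' ? scalar @$input : 1;
}

my @seqs = ('seq1', 'seq2', 'seq3');    # placeholders for Bio::Seq objects

# The array is flattened into the argument list; only 'seq1' is seen.
print count_queries(@seqs), "\n";

# The reference arrives as one argument that still holds all three.
print count_queries(\@seqs), "\n";
```

The first call reports one query and the second reports three, which is the difference between passing @seq_array and \@seq_array.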
From bix at sendu.me.uk Thu Dec 14 14:42:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:42:18 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
References: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
Message-ID: <458162CA.5030803@sendu.me.uk>

Luba Pardo wrote:
> Hello, I am new to bioperl and I have been trying to run the examples
> available in bptutorial.pl and other basic literature. I have
> installed the latest release of bioperl 1.5.2 in a usr/local/src
> directory. Any time I try to retrieve the SwissProt and EMBL
> databases it gives me an error.

What exactly are you trying? Paste some relevant code along with the
exact error message you get when running that code.

> I wonder if the installation was not successful, as I would expect
> that these database accesses were included in the modules of BioPerl
> Core.

They should work with just core installed.

> In addition, I would like to ask whether, to run ClustalW within
> the setting of BioPerl, I need to download and install it in the same
> directory in which I have installed bioperl, or whether it is
> included in the module of Bio::Align.

The ClustalW module is in the bioperl-run package, so install that in
the same way you installed bioperl (core). The actual ClustalW program
you need to download and install according to its own instructions.
You let Bioperl know where you installed ClustalW by, e.g., setting an
environment variable. See
http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Alignment/Clustalw.html#DESCRIPTION
for details.

> I am not sure whether this is the best place to ask these very basic
> questions. If not, could anyone please refer me to the proper e-mail
> account?

It's certainly the correct place; I hope we can resolve your problems.
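As a rough illustration of the environment-variable approach mentioned above: the Clustalw wrapper documentation describes a CLUSTALDIR variable naming the directory that holds the clustalw executable. The path below is a placeholder, not a real install location, and the wrapper module itself is left commented out so the snippet runs without bioperl-run installed.

```perl
use strict;
use warnings;

# Set the variable in a BEGIN block so it is already in place when the
# wrapper module is loaded. '/usr/local/clustalw/' is a placeholder.
BEGIN {
    $ENV{CLUSTALDIR} = '/usr/local/clustalw/'
        unless defined $ENV{CLUSTALDIR};
}

# With bioperl-run installed, the wrapper would be loaded after this:
# use Bio::Tools::Run::Alignment::Clustalw;

print "CLUSTALDIR is $ENV{CLUSTALDIR}\n";
```

Setting the variable in the shell before running the script should work just as well; see the linked module documentation for the authoritative details.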
From neetisomaiya at gmail.com Thu Dec 14 08:02:37 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Thu, 14 Dec 2006 13:32:37 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com> Message-ID: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com> How do I run needle specifying that I want the MSF format, on a linux box? The help doesnt show me any format option. Is there anything available to pasre MSF format? Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment, how can I parse the result to get this? On 12/12/06, Chris Fields wrote: > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of a bioperl parser for needle output, basically I > > won't > > where the target sequence aligns on the template (i.e. coordinate > > on the > > template where the taget aligns). > > > > -- > > -Neeti > > Even my blood says, B positive > > I answered this a number of months back: > > http://tinyurl.com/yzlbx5 > > Basically, newer versions of EMBOSS have changed the output for the > AlignIO::emboss parser (which parses needle). I don't believe the > parser has been fixed to deal with that, but Jason has pointed out > you can use MSF output when running needle, then parse using AlignIO > with the format set to 'msf'. > > chris > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 3.out Type: application/octet-stream Size: 204960 bytes Desc: not available URL: From stewarta at nmrc.navy.mil Thu Dec 14 16:34:43 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 14 Dec 2006 11:34:43 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <45815B63.1020003@sendu.me.uk> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> Message-ID: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> Thanks for the reply, Sendu. So I've tried passing a reference to an array of Seq objects with the following code... push @blast_run, $factory->blastall(\@query); # where @query is an array of Bio::Seq objects (In case you're wondering, I'm pushing the report into an array of reports because I'm running several instances of blastall with different parameters each time.) ....and it throws me the following exception... ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/ common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E STACK: Error::throw STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/ perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/ Bio/Tools/Run/StandAloneBlast.pm:557 STACK: main::run_blastall ./new_blast_script.pl:215 STACK: ./new_blast_script.pl:115 ----------------------------------------------------------- And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns... 757 my $status = system($commandstring); 758 759 $self->throw("$executable call crashed: $? $commandstring \n") 760 unless ($status==0) ; So it looks like the system call isn't returning a happy $status. 
At this point I'm pretty much stuck, though. Blastall works just fine if I only send it a single Seq object. Looking at _setinput, it appears a reference to an array of Seq objects should end up creating a multi-fasta file. The only possibilities I can think of to explain this is... - The -i file isn't be created for some reason when an (ref to) array of Seqs is passed - There is something wrong with the -i file that is created and sent to blastall. - Something else is wrong with the $commandstring being sent to the system call. Does anyone see something here that I don't? Thanks, Andrew On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote: > Andrew Stewart wrote: >> I am trying to StandAloneBlast->blastall an array or Bio::Seq >> objects. The documentation claims that blastall can be passed a >> file name, > > You're referring to 'In addition, sequence input may be in the form > of either a Bio::Seq object or or an array of Bio::Seq objects'? I > agree its not clear, but supplying a reference to an array is still > supplying an array. Anyway, I'll clarify it. > > > In any case, the usage for the method is what you should pay > attention to: > >> Usage: >> $seq_array_ref = \@seq_array; # where @seq_array is an array of >> Bio::Seq objects >> $blast_report = $factory->blastall(\@seq_array); >> Should this be... >> $report = $factory->blastall(@seq_array); >> or >> $report = $factory->blastall(\@seq_array); >> ??? > > It should be exactly what it says. A reference to the array. > > >> And if you are blastall'ing an array of Seq objects, then does >> blastall just return one big blast report or should I be expecting >> an array of blast reports? > > Returns : Reference to a Blast object or BPlite object > containing the blast report. > > That means, just one big object, not an array. 
-- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From cjfields at uiuc.edu Thu Dec 14 17:03:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 11:03:12 -0600 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> Message-ID: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote: > Thanks for the reply, Sendu. > > So I've tried passing a reference to an array of Seq objects with the > following code... > > push @blast_run, $factory->blastall(\@query); # where @query is an > array of Bio::Seq objects > > (In case you're wondering, I'm pushing the report into an array of > reports because I'm running several instances of blastall with > different parameters each time.) > > ....and it throws me the following exception... 
> > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/ > common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E > > STACK: Error::throw > STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 > STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 > STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 > STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/ > Bio/Tools/Run/StandAloneBlast.pm:557 > STACK: main::run_blastall ./new_blast_script.pl:215 > STACK: ./new_blast_script.pl:115 > ----------------------------------------------------------- > > And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm > returns... > 757 my $status = system($commandstring); > 758 > 759 $self->throw("$executable call crashed: $? $commandstring > \n") > 760 unless ($status==0) ; > > So it looks like the system call isn't returning a happy $status. At > this point I'm pretty much stuck, though. Blastall works just fine > if I only send it a single Seq object. Looking at _setinput, it > appears a reference to an array of Seq objects should end up creating > a multi-fasta file. The only possibilities I can think of to explain > this is... > > - The -i file isn't be created for some reason when an (ref to) array > of Seqs is passed > - There is something wrong with the -i file that is created and sent > to blastall. > - Something else is wrong with the $commandstring being sent to the > system call. > > Does anyone see something here that I don't? The error pops up when the executable returns a bad status, so maybe it's choking on too many input sequences (i.e. Bioperl is doing everything correctly, but you are attempting to BLAST too many sequences in one go). How many sequences are you attempting to use as input? 
What happens when you use fewer input sequences? chris From stewarta at nmrc.navy.mil Thu Dec 14 17:49:45 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 14 Dec 2006 12:49:45 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> Message-ID: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> > So can you look at the tempfile that is created and see if it is sane? > > Set -save_tempfiles => 1 whene you initialize the factory object or do > $factory->save_tempfiles(1) > before calling the blastall. > > -jason > Jason, I was actually wondering how to do that. Thanks. Odd though, it still doesn't seem to be saving the tempfiles. Might not matter though, because... > The error pops up when the executable returns a bad status, so > maybe it's choking on too many input sequences (i.e. Bioperl is > doing everything correctly, but you are attempting to BLAST too > many sequences in one go). How many sequences are you attempting > to use as input? What happens when you use fewer input sequences? > > chris > I was processing 738 sequences for input. I cut that down to 20 sequences and I'm getting some other exception thrown further downstream, so it appears you may be correct. You don't happen to know what the max number of sequences that blastall allows for input, would ya? ;) I suppose I'll have to break @query down into smaller doses or something. Thanks, Andrew On Dec 14, 2006, at 12:03 PM, Chris Fields wrote: > > On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote: > >> Thanks for the reply, Sendu. >> >> So I've tried passing a reference to an array of Seq objects with the >> following code... 
>> >> push @blast_run, $factory->blastall(\@query); # where @query is an >> array of Bio::Seq objects >> >> (In case you're wondering, I'm pushing the report into an array of >> reports because I'm running several instances of blastall with >> different parameters each time.) >> >> ....and it throws me the following exception... >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: blastall call crashed: 11 /common/bin/blastall -p blastp - >> d "/ >> common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm: >> 328 >> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ >> perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 >> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/ >> lib/ >> perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 >> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/ >> perl5/5.8.6/ >> Bio/Tools/Run/StandAloneBlast.pm:557 >> STACK: main::run_blastall ./new_blast_script.pl:215 >> STACK: ./new_blast_script.pl:115 >> ----------------------------------------------------------- >> >> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm >> returns... >> 757 my $status = system($commandstring); >> 758 >> 759 $self->throw("$executable call crashed: $? $commandstring >> \n") >> 760 unless ($status==0) ; >> >> So it looks like the system call isn't returning a happy $status. At >> this point I'm pretty much stuck, though. Blastall works just fine >> if I only send it a single Seq object. Looking at _setinput, it >> appears a reference to an array of Seq objects should end up creating >> a multi-fasta file. The only possibilities I can think of to explain >> this is... >> >> - The -i file isn't be created for some reason when an (ref to) array >> of Seqs is passed >> - There is something wrong with the -i file that is created and sent >> to blastall. 
>> - Something else is wrong with the $commandstring being sent to the
>> system call.
>>
>> Does anyone see something here that I don't?
>
> The error pops up when the executable returns a bad status, so
> maybe it's choking on too many input sequences (i.e. Bioperl is
> doing everything correctly, but you are attempting to BLAST too
> many sequences in one go). How many sequences are you attempting
> to use as input? What happens when you use fewer input sequences?
>
> chris
>

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From Derek.Fairley at bll.n-i.nhs.uk Thu Dec 14 17:58:10 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Thu, 14 Dec 2006 17:58:10 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
Message-ID:

Neeti,

From http://emboss.sourceforge.net/apps/cvs/needle.html:

"The results can be output in one of several styles by using the
command-line qualifier -aformat xxx, where 'xxx' is replaced by the name
of the required format. Some of the alignment formats can cope with an
unlimited number of sequences, while others are only for pairs of
sequences.

The available multiple alignment format names are: unknown, multiple,
simple, fasta, msf, trace, srs

The available pairwise alignment format names are: pair, markx0, markx1,
markx2, markx3, markx10, srspair, score

See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
information on alignment formats."

Not sure based on this whether you can get a pairwise alignment in .msf
format; can't think of a good reason why not. The BioPerl Bio::AlignIO
module will allow you to parse alignments in .msf format.

HTH,

Derek.
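
Once needle has been run with -aformat msf and the two gapped rows of
the alignment have been read (for instance via Bio::AlignIO with
-format => 'msf'), the coordinate Neeti asks for can be found by walking
the columns. The following is a minimal sketch in plain Perl under
stated assumptions: first_aligned_ref_position is a hypothetical helper,
the gap characters are assumed to be '.', '~' or '-', and the toy
strings are illustrative, not taken from the attached file.

```perl
use strict;
use warnings;

# Hypothetical helper: given two gapped alignment rows of equal length,
# return the 1-based ungapped position in the reference of the first column
# where both reference and query carry a residue (undef if there is none).
sub first_aligned_ref_position {
    my ($ref_row, $query_row) = @_;
    my $gap = qr/[.~-]/;    # assumed gap characters in the MSF rows
    my $ref_pos = 0;
    my $len = length($ref_row) < length($query_row)
            ? length($ref_row)
            : length($query_row);
    for my $col (0 .. $len - 1) {
        my $r = substr($ref_row,   $col, 1);
        my $q = substr($query_row, $col, 1);
        $ref_pos++ unless $r =~ $gap;      # count reference residues only
        return $ref_pos if $r !~ $gap && $q !~ $gap;
    }
    return;
}

# Toy rows: the query is padded with gaps until it starts aligning.
my $ref   = 'ACGTACGTAC';
my $query = '....ACGT..';
my $start = first_aligned_ref_position($ref, $query);
print "query starts at reference position $start\n";
```

Counting only reference residues keeps the result in the reference's own
coordinates even when the reference row itself contains gaps.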
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya Sent: 14 December 2006 08:03 To: Chris Fields; bioperl-l Subject: Re: [Bioperl-l] needle parser in bioperl? How do I run needle specifying that I want the MSF format, on a linux box? The help doesnt show me any format option. Is there anything available to pasre MSF format? Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment, how can I parse the result to get this? On 12/12/06, Chris Fields wrote: > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of a bioperl parser for needle output, basically I > > won't > > where the target sequence aligns on the template (i.e. coordinate > > on the > > template where the taget aligns). > > > > -- > > -Neeti > > Even my blood says, B positive > > I answered this a number of months back: > > http://tinyurl.com/yzlbx5 > > Basically, newer versions of EMBOSS have changed the output for the > AlignIO::emboss parser (which parses needle). I don't believe the > parser has been fixed to deal with that, but Jason has pointed out > you can use MSF output when running needle, then parse using AlignIO > with the format set to 'msf'. 
> > chris > -- -Neeti Even my blood says, B positive From cjfields at uiuc.edu Thu Dec 14 18:36:09 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 12:36:09 -0600 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> Message-ID: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu> On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote: >> So can you look at the tempfile that is created and see if it is >> sane? >> >> Set -save_tempfiles => 1 whene you initialize the factory object >> or do >> $factory->save_tempfiles(1) >> before calling the blastall. >> >> -jason >> > > Jason, > I was actually wondering how to do that. Thanks. Odd though, it > still doesn't seem to be saving the tempfiles. Might not matter That needs to be checked out. Can anyone verify that? >> The error pops up when the executable returns a bad status, so >> maybe it's choking on too many input sequences (i.e. Bioperl is >> doing everything correctly, but you are attempting to BLAST too >> many sequences in one go). How many sequences are you attempting >> to use as input? What happens when you use fewer input sequences? >> >> chris >> > > I was processing 738 sequences for input. I cut that down to 20 > sequences and I'm getting some other exception thrown further > downstream, so it appears you may be correct. You don't happen to > know what the max number of sequences that blastall allows for input, > would ya? ;) I suppose I'll have to break @query down into smaller > doses or something. > > Thanks, > Andrew It was a shot in the dark, really. 
The fact that the return status was bad could be due to a number of
problems (permissions issues, bad data, etc). The fact that a single
sequence worked indicated that permissions and output format likely
weren't to blame. The only other thing left was a problem with blastall
itself.

BTW, the blast docs do not indicate whether there is a maximum number of
sequences. There may be a point where available memory becomes the
limiting issue.

chris

From vaughn at cshl.edu Thu Dec 14 19:09:34 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Thu, 14 Dec 2006 14:09:34 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
Message-ID: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>

Dear all,

I'm trying to bring some of my code into compliance with the BioPerl
1.5.2 and am running into some design decisions that I am unclear on.
Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of
the 'type' against SOFA? It seems to me that this should be optional
behavior as is the case with the Bio::FeatureIO family. I'd be happy to
write the patch if there is any agreement with me on this case.

Thanks,
Matt

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
phone: (516) 367-8469

From jason at bioperl.org Thu Dec 14 16:59:20 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Dec 2006 11:59:20 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <640E2BB7-33F3-44C9-B903-9DDA54F02D12@bioperl.org>

So can you look at the tempfile that is created and see if it is sane?
Set -save_tempfiles => 1 whene you initialize the factory object or do $factory->save_tempfiles(1) before calling the blastall. -jason On Dec 14, 2006, at 11:34 AM, Andrew Stewart wrote: > Thanks for the reply, Sendu. > > So I've tried passing a reference to an array of Seq objects with the > following code... > > push @blast_run, $factory->blastall(\@query); # where @query is an > array of Bio::Seq objects > > (In case you're wondering, I'm pushing the report into an array of > reports because I'm running several instances of blastall with > different parameters each time.) > > ....and it throws me the following exception... > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: blastall call crashed: 11 /common/bin/blastall -p blastp -d "/ > common/data/BACILLUS.pep" -i /tmp/Z69hzaqEbR -o /tmp/02Zja7AF3E > > STACK: Error::throw > STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 > STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 > STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/ > perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 > STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/ > Bio/Tools/Run/StandAloneBlast.pm:557 > STACK: main::run_blastall ./new_blast_script.pl:215 > STACK: ./new_blast_script.pl:115 > ----------------------------------------------------------- > > And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm > returns... > 757 my $status = system($commandstring); > 758 > 759 $self->throw("$executable call crashed: $? $commandstring > \n") > 760 unless ($status==0) ; > > So it looks like the system call isn't returning a happy $status. At > this point I'm pretty much stuck, though. Blastall works just fine > if I only send it a single Seq object. Looking at _setinput, it > appears a reference to an array of Seq objects should end up creating > a multi-fasta file. 
The only possibilities I can think of to explain > this is... > > - The -i file isn't be created for some reason when an (ref to) array > of Seqs is passed > - There is something wrong with the -i file that is created and sent > to blastall. > - Something else is wrong with the $commandstring being sent to the > system call. > > Does anyone see something here that I don't? > > > Thanks, > Andrew > > > > On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote: > >> Andrew Stewart wrote: >>> I am trying to StandAloneBlast->blastall an array or Bio::Seq >>> objects. The documentation claims that blastall can be passed a >>> file name, >> >> You're referring to 'In addition, sequence input may be in the form >> of either a Bio::Seq object or or an array of Bio::Seq objects'? I >> agree its not clear, but supplying a reference to an array is still >> supplying an array. Anyway, I'll clarify it. >> >> >> In any case, the usage for the method is what you should pay >> attention to: >> >>> Usage: >>> $seq_array_ref = \@seq_array; # where @seq_array is an array of >>> Bio::Seq objects >>> $blast_report = $factory->blastall(\@seq_array); >>> Should this be... >>> $report = $factory->blastall(@seq_array); >>> or >>> $report = $factory->blastall(\@seq_array); >>> ??? >> >> It should be exactly what it says. A reference to the array. >> >> >>> And if you are blastall'ing an array of Seq objects, then does >>> blastall just return one big blast report or should I be expecting >>> an array of blast reports? >> >> Returns : Reference to a Blast object or BPlite object >> containing the blast report. >> >> That means, just one big object, not an array. 
> > > > -- > Andrew Stewart > Research Assistant, Genomics Team > Navy Medical Research Center (NMRC) > Biological Defense Research Directorate (BDRD) > BDRD Annex > 12300 Washington Avenue, 2nd Floor > Rockville, MD 20852 > > email: stewarta at nmrc.navy.mil > phone: 301-231-6700 Ext 270 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From stewarta at nmrc.navy.mil Thu Dec 14 21:23:07 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 14 Dec 2006 16:23:07 -0500 Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects In-Reply-To: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu> References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil> <45815B63.1020003@sendu.me.uk> <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil> <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu> <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil> <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu> Message-ID: > It was a shot in the dark, really. The fact that the return status > was bad could be due to a number of problems (permissions issues, > bad data, etc). The fact that a single sequence worked indicated > that permissions and output format likely weren't to blame. The > only other thing left was a problem with blastall itself. > > BTW, the blast docs do not indicate whether there is a maximum > number of sequences. There may be a point where available memory > becomes the limiting issue. > > chris Interesting. I ran the 738-sequence dataset through blastall manually and the report only returned 198 of the 738 expected results. Not only that, it seems to have just cut off right in the middle of the 198th result and a Segmentation fault was reported. I removed the 198th sequence, wondering if it might be some issue with the input, and the segmentation fault occured again with the results ending on the 210th result. 
I stuck the 198th sequence back in, but at the start of the file and sure enough the Segmentation error occurred earlier. I think we can rule out the size of the input or number of sequences as the source of error here. I'm more inclined to think it has something to do with the blast databases being queried against. I found an old discussion on a problem that sounds fairly similar to this one, for anyone interested. http://bioinformatics.org/pipermail/bioclusters/2004-June/001742.html I think I'll try to work around the problem for now. andrew On Dec 14, 2006, at 1:36 PM, Chris Fields wrote: > > On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote: > >>> So can you look at the tempfile that is created and see if it is >>> sane? >>> >>> Set -save_tempfiles => 1 whene you initialize the factory object >>> or do >>> $factory->save_tempfiles(1) >>> before calling the blastall. >>> >>> -jason >>> >> >> Jason, >> I was actually wondering how to do that. Thanks. Odd though, it >> still doesn't seem to be saving the tempfiles. Might not matter > > That needs to be checked out. Can anyone verify that? > >>> The error pops up when the executable returns a bad status, so >>> maybe it's choking on too many input sequences (i.e. Bioperl is >>> doing everything correctly, but you are attempting to BLAST too >>> many sequences in one go). How many sequences are you attempting >>> to use as input? What happens when you use fewer input sequences? >>> >>> chris >>> >> >> I was processing 738 sequences for input. I cut that down to 20 >> sequences and I'm getting some other exception thrown further >> downstream, so it appears you may be correct. You don't happen to >> know what the max number of sequences that blastall allows for input, >> would ya? ;) I suppose I'll have to break @query down into smaller >> doses or something. >> >> Thanks, >> Andrew > > It was a shot in the dark, really. 
The fact that the return status > was bad could be due to a number of problems (permissions issues, > bad data, etc). The fact that a single sequence worked indicated > that permissions and output format likely weren't to blame. The > only other thing left was a problem with blastall itself. > > BTW, the blast docs do not indicate whether there is a maximum > number of sequences. There may be a point where available memory > becomes the limiting issue. > > chris -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From lincoln.stein at gmail.com Thu Dec 14 20:24:56 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 14 Dec 2006 15:24:56 -0500 Subject: [Bioperl-l] Bio::Graphics xyplot In-Reply-To: <4578951B.5050206@sfu.ca> References: <4578951B.5050206@sfu.ca> Message-ID: <6dce9a0b0612141224r1ef7cce2s6e6123461c3827d8@mail.gmail.com> Hi, The way it works is that you create a single feature that spans the entire range of the xyplot. It contains subfeatures, each of which has a score. The graph points correspond to each of the subfeatures. Lincoln On 12/7/06, Keith Anthony Boroevich wrote: > > Hi everyone, > > I'm attempting to add an xyplot of the phred quality scores to an > Bio::Graphics image, and cannot get it to work. > I have the panel with a track for both the scale and the DNA displaying > properly. When I attempt to add the xyplot i just get a garbled track > of, what looks like, timy xyplots for each datapoint. I have the cvs > (updated today) of bioperl-live running. I think what I am missing is > the creation of a "Sequence Feature Group" to hold the individual points > of the plot. However, I cannot seem to find such an object. 
This is > what I attempted: > > -------BEGIN---CODE----------- > # start panel > my $panel = Bio::Graphics::Panel->new(-length => $f_seqlen, > -width => $f_seqlen*10, > -pad_left => 10, > -pad_right => 10, > -grid => 1 > ); > # add scale > $panel->add_track(arrow => > Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen), > -double => 1, > -tick => 2, > -fgcolor => 'black'); > # add DNA ($feature is of type Bio::SeqFeature::Annotated) > $panel->add_track(dna => $feature); > # get list of quality scores from database > my ($pqs_value) = $dbh->selectrow_array($sql); > my @pqs_value = split(/\s/,$pqs_value); > # create track > my $track = $panel->add_track(-glyph => 'xyplot', > -graph_type => 'points', > -point_symbol => 'point', > -max_score => 100, > -min_score => 0, > -scale => 'none'); > # add "subfeatures" to > for (my $i=0;$i<$f_seqlen;$i++) { > > > $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i])); > > } > print $panel->png(); > $panel->finished; > ------END---CODE---------- > > I also attempted to create an array of the point features and passed > that by reference to the panel "add_track" as it describes in the xyplot > documentation, but that resulted in the exact same image. > > keith > > -- > ><)))?> -cGRASP- < > Keith Anthony Boroevich > Davidson Lab > Dept of Molecular Biology > Simon Fraser University > Tel: 604-268-7276 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bix at sendu.me.uk Thu Dec 14 22:15:07 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 14 Dec 2006 17:15:07 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> Message-ID: <4581CCEB.20206@sendu.me.uk> Matthew Vaughn wrote: > Dear all, > > I'm trying to bring some of my code into compliance with the BioPerl > 1.5.2 and am running into some design decisions that I am unclear on. > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of > the 'type' against SOFA? It seems to me that this should be optional > behavior as is the case with the Bio::FeatureIO family. I'd be happy to > write the patch if there is any agreement with me on this case. Lots of people seem to have worked on it over the years, but perhaps Scott Cain is the person to talk to? revision 1.4 date: 2004/09/25 11:41:29; author: scain; state: Exp; lines: +1 -1 two things: * adding SOFA as an available ontology to DocumentRegistry.pm * modifying FeatureIO::gff to use SOFA to validate, and to parse Ontology_term From lincoln.stein at gmail.com Thu Dec 14 21:56:41 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 14 Dec 2006 16:56:41 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: References: Message-ID: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> Hi All, I'm afraid that the xyplot glyph that is in the recent bioperl release has an error that causes the content to be printed to the right of the correct position. Unfortunately this wasn't caught before the release because the glyph was only tested on very large (whole genome) features. 
You will need to do a CVS update to get a fixed version from bioperl-live. A future bugfix release of gbrowse will patch this glyph for you automatically. Lincoln On 12/12/06, Kara Dolinski wrote: > > Hi, > I'm having a problem getting features and an xyplot properly aligned in > Gbrowse. For example, see this page: > > http://tinyurl.com/ylbq3q > > The feature in the "CENPK SNPs" track should actually be around the peak > of the graph in the "CENPK prediction signal" xyplot ie. the SNP feature > is at position 79, and the xyplot axes and data should span from 61 - 95. > However, as you can see, the data in the xyplot are oddly separated from > the axes (which seem to be in the correct place), with the data shifted over > to about position 120-155. > This occurs elsewhere, not just at the ends of the chromosomes. > > When I zoom to ~80 bp, all is well, see: > > http://tinyurl.com/yzav8k > > The relevant snippets from the GFF and the config files are below. > > Thanks! > Kara > > GFF: > > chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > is 41.9883 > chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score > is 41.9883 > chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score > is 2.24506 > chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score > is 3.26837 > chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score > is 1.39938 > chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score > is 1.4039 > chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score > is 9.16134 > chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score > is 10.1413 > chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . 
ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score > is 12.9256 > chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score > is 13.195 > chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score > is 22.7127 > chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score > is 23.8289 > chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score > is 21.9123 > chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score > is 28.3344 > chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score > is 35.0436 > chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score > is 37.361 > chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score > is 39.5408 > chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score > is 28.2008 > chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score > is 32.6254 > chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score > is 36.0832 > chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score > is 41.9883 > chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score > is 32.1205 > chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score > is 41.3048 > chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score > is 30.7975 > chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score > is 29.4282 > chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . 
ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score > is 35.3586 > chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score > is 34.1426 > chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score > is 30.2966 > chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score > is 17.8402 > chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score > is 15.2637 > chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score > is 12.657 > chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score > is 10.2033 > chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score > is 9.40143 > chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score > is 6.56273 > chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score > is 3.66211 > chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score > is 0.394194 > > CONFIG: > > > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH} > > [CENPK_all_scores_graph] > feature = GRAPH_CENPK:SNPScanner > glyph = xyplot > graph_type = boxes > fgcolor = purple > bgcolor = purple > height = 100 > min_score = 0 > max_score = 110 > label = 0 > key = CENPK prediction signal > link = > category = SNPs: signal graphs > > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. 
Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share > your > opinions on IT & business topics through brief surveys - and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dmessina at wustl.edu Fri Dec 15 01:45:24 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 19:45:24 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: References: Message-ID: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> Hey Chris, My thoughts below. > [Chris] > This could be used to annotate any > PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, > maybe in a collection (similar to AnnotationCollection). I thought > something like this may be of general use for any PrimarySeq > (quality, structure), alignments like NEXUS and Stockholm, > SeqFeatures where structure could be stored (tRNA or riboswitches), > etc. > > However, this also seems to fall into the category of sequence > annotation. So, would it be better to have a set of Bio::Annotation > classes used for this purpose? To me, all meta data is equal. That is, your classic Genbank feature annotation and a user's arbitrary meta-tag like "Bob thinks this is a kinase domain" aren't different in kind even if they are different in content. As resequencing projects multiply, the ability to create arbitrary meta tags, attach them to different types of objects, and use those tags to link them together will become desirable, if not essential. 
Keeping a common interface to all of these meta data types would be advantageous, plus new users won't have to determine whether they need to use Bio::Meta objects or Bio::Annotation objects. So I would argue for all of the meta data types to live "under one roof". Which roof isn't as important. Bio::Annotation, since it already exists for today's meta data, seems like a reasonable choice. (assuming Annotation objects are flexible enough to be extended as you propose) There, and no flames or jibes even. :) Dave From cjfields at uiuc.edu Fri Dec 15 02:21:10 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 14 Dec 2006 20:21:10 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> Message-ID: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> On Dec 14, 2006, at 7:45 PM, David Messina wrote: > Hey Chris, > > My thoughts below. > >> [Chris] >> This could be used to annotate any >> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, >> maybe in a collection (similar to AnnotationCollection). I thought >> something like this may be of general use for any PrimarySeq >> (quality, structure), alignments like NEXUS and Stockholm, >> SeqFeatures where structure could be stored (tRNA or riboswitches), >> etc. >> >> However, this also seems to fall into the category of sequence >> annotation. So, would it be better to have a set of Bio::Annotation >> classes used for this purpose? > > > To me, all meta data is equal. That is, your classic Genbank feature > annotation and a user's arbitrary meta-tag like "Bob thinks this is a > kinase domain" aren't different in kind even if they are different in > content. > > As resequencing projects multiply, the ability to create arbitrary > meta tags, attach them to different types of objects, and use those > tags to link them together will become desirable, if not essential.
> > Keeping a common interface to all of these meta data types would be > advantageous, plus new users won't have to determine whether they > need to use Bio::Meta objects or Bio::Annotation objects. > > So I would argue for all of the meta data types to live "under one > roof". Which roof isn't as important. Bio::Annotation, since it > already exists for today's meta data, seems like a reasonable choice. > (assuming Annotation objects are flexible enough to be extended as > you propose) > > There, and no flames or jibes even. :) I guess what I want to know is whether there should be a distinction between 'normal' sequence annotation (comments, references, and so on) and annotation that could be best described as position-specific (like RNA or protein structural annotation). The current meta implementation is for sequence data only; I felt it would be nice to have a generic implementation that would be applicable to any object data. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dmessina at wustl.edu Fri Dec 15 02:46:27 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 20:46:27 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> Message-ID: <9C72012A-EFD7-42DD-93F8-578251CFDE01@wustl.edu> And it all seemed so clear to me when I wrote it. :) > whether there should be a distinction I would argue no because it would contravene a s > a generic implementation that would be applicable to any object data. I wholeheartedly agree that this is the way to go. A generic implementation would allow arbitrary object data while maintaining a standard interface.
From dmessina at wustl.edu Fri Dec 15 02:46:27 2006 From: dmessina at wustl.edu (David Messina) Date: Thu, 14 Dec 2006 20:46:27 -0600 Subject: [Bioperl-l] Proposal for Meta data Message-ID: [oops, accidentally hit send midsentence] And it all seemed so clear to me when I wrote it. :) > whether there should be a distinction I would argue no because it would contravene a standard interface. > a generic implementation that would be applicable to any object data. I wholeheartedly agree that this is the way to go. A generic implementation would allow arbitrary object data while maintaining a standard interface. Dave From neetisomaiya at gmail.com Fri Dec 15 05:21:42 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 15 Dec 2006 10:51:42 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com> Message-ID: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Hi, Thanks a lot for your response. I ran needle like this: /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out It gave me the output in MSF format. But now my problem is: if I use the Bio::AlignIO module of BioPerl, how can I get the alignment start and stop coordinates on the sequence? I mean something like hsp->query->start, which gives us the alignment start position on the query sequence in a BLAST output when using Bio::SearchIO. Please help. As I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence. ~Neeti. On 12/14/06, Fairley, Derek wrote: > > Neeti, > > > > From http://emboss.sourceforge.net/apps/cvs/needle.html: > > > > "The results can be output in one of several styles by using the > command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of > the required format. Some of the alignment formats can cope with an > unlimited number of sequences, while others are only for pairs of sequences.
> > > > > The available multiple alignment format names are: unknown, multiple, > simple, fasta, msf, trace, srs > > > > The available pairwise alignment format names are: pair, markx0, markx1, > markx2, markx3, markx10, srspair, score > > > > See: http://emboss.sf.net/docs/themes/AlignFormats.html for further > information on alignment formats." > > > > Not sure based on this whether you can get pairwise alignment in .msf > format; can't think of a good reason why not. The BioPerl Align::IO module > will allow you to parse alignments in .msf format. > > > > HTH, > > > > Derek. > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya > Sent: 14 December 2006 08:03 > To: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > > > How do I run needle specifying that I want the MSF format, on a linux box? > > The help doesnt show me any format option. Is there anything available to > > pasre MSF format? > > Please find an example alignment file attached. Here the seq_of_contig > > aligns with the reference sequence (i.e. SEQ_1.REF) starting at position > > (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the > > output alignment, how can I parse the result to get this? > > > > On 12/12/06, Chris Fields wrote: > > > > > > > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > > > > > Hi, > > > > > > > > Does anyone know of a bioperl parser for needle output, basically I > > > > won't > > > > where the target sequence aligns on the template (i.e. coordinate > > > > on the > > > > template where the taget aligns). > > > > > > > > -- > > > > -Neeti > > > > Even my blood says, B positive > > > > > > I answered this a number of months back: > > > > > > http://tinyurl.com/yzlbx5 > > > > > > Basically, newer versions of EMBOSS have changed the output for the > > > AlignIO::emboss parser (which parses needle). 
I don't believe the > > > parser has been fixed to deal with that, but Jason has pointed out > > > you can use MSF output when running needle, then parse using AlignIO > > > with the format set to 'msf'. > > > > > > chris > > > > > > > > > > > -- > > -Neeti > > Even my blood says, B positive > -- -Neeti Even my blood says, B positive From Derek.Fairley at bll.n-i.nhs.uk Fri Dec 15 09:57:35 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Fri, 15 Dec 2006 09:57:35 -0000 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Message-ID: Neeti, In lieu of a response from a BioPerl guru... why not use Needle to generate your pairwise alignment in fasta format, rather than msf format? The sequence you want should correspond to a single HSP which you can get directly from the fasta alignment with Bio::SearchIO: http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use Bio::AlignIO at all. Derek. -----Original Message----- From: neeti somaiya [mailto:neetisomaiya at gmail.com] Sent: 15 December 2006 05:22 To: Fairley, Derek; bioperl-l Subject: Re: [Bioperl-l] needle parser in bioperl? Hi, Thanks a lot for your response. I ran needle like this /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out It gave me the output in format msf. But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO. Please help. Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence. ~Neeti. On 12/14/06, Fairley, Derek wrote: Neeti, From http://emboss.sourceforge.net/apps/cvs/needle.html :
"The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences. The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score See: http://emboss.sf.net/docs/themes/AlignFormats.html for further information on alignment formats." Not sure based on this whether you can get pairwise alignment in .msf format; can't think of a good reason why not. The BioPerl Align::IO module will allow you to parse alignments in .msf format. HTH, Derek. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya Sent: 14 December 2006 08:03 To: Chris Fields; bioperl-l Subject: Re: [Bioperl-l] needle parser in bioperl? How do I run needle specifying that I want the MSF format, on a linux box? The help doesnt show me any format option. Is there anything available to pasre MSF format? Please find an example alignment file attached. Here the seq_of_contig aligns with the reference sequence (i.e. SEQ_1.REF) starting at position (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the output alignment, how can I parse the result to get this? On 12/12/06, Chris Fields wrote: > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of a bioperl parser for needle output, basically I > > won't > > where the target sequence aligns on the template (i.e. coordinate > > on the > > template where the taget aligns).
> > > > -- > > -Neeti > > Even my blood says, B positive > > I answered this a number of months back: > > http://tinyurl.com/yzlbx5 > > Basically, newer versions of EMBOSS have changed the output for the > AlignIO::emboss parser (which parses needle). I don't believe the > parser has been fixed to deal with that, but Jason has pointed out > you can use MSF output when running needle, then parse using AlignIO > with the format set to 'msf'. > > chris > -- -Neeti Even my blood says, B positive -- -Neeti Even my blood says, B positive From cain at cshl.edu Fri Dec 15 05:01:36 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 15 Dec 2006 00:01:36 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <4581CCEB.20206@sendu.me.uk> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> Message-ID: <1166158897.2569.335.camel@localhost.localdomain> As much as I would like to take credit for this :-) Allen Day wrote the original code, and then Chris Fields tried to fix it so that it actually worked :-) I think it would be a good idea to have a validate_terms option like Bio::FeatureIO::gff. Scott On Thu, 2006-12-14 at 17:15 -0500, Sendu Bala wrote: > Matthew Vaughn wrote: > > Dear all, > > > > I'm trying to bring some of my code into compliance with the BioPerl > > 1.5.2 and am running into some design decisions that I am unclear on. > > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of > > the 'type' against SOFA? It seems to me that this should be optional > > behavior as is the case with the Bio::FeatureIO family. I'd be happy to > > write the patch if there is any agreement with me on this case. > > Lots of people seem to have worked on it over the years, but perhaps > Scott Cain is the person to talk to?
> > revision 1.4 > date: 2004/09/25 11:41:29; author: scain; state: Exp; lines: +1 -1 > two things: > * adding SOFA as an available ontology to DocumentRegistry.pm > * modifying FeatureIO::gff to use SOFA to validate, and to parse > Ontology_term > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From neetisomaiya at gmail.com Fri Dec 15 12:46:08 2006 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 15 Dec 2006 18:16:08 +0530 Subject: [Bioperl-l] needle parser in bioperl? In-Reply-To: References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> Message-ID: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> I ran needle like this /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out Please find the output attached. When I run the following :- use Bio::SearchIO; my $io = Bio::SearchIO->new(-file => "1.out", -format => "fasta" ); while ( my $result = $io->next_result() ) { while( my $hit = $result->next_hit) { print "yes\n"; } } It says :- -------------------- WARNING --------------------- MSG: unrecognized FASTA Family report file! --------------------------------------------------- What should I do? ~Neeti. On 12/15/06, Fairley, Derek wrote: > > Neeti, > > In lieu of a response from a BioPerl guru... why not use Needle to > generate your pairwise alignment in fasta format, rather than msf format? 
> The sequence you want should correspond to a single HSP which you can get > directly from the fasta alignment with Bio::SearchIO: > http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use > Bio::AlignIO at all. > > Derek. > > > -----Original Message----- > From: neeti somaiya [mailto:neetisomaiya at gmail.com] > Sent: 15 December 2006 05:22 > To: Fairley, Derek; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > Hi, > > Thanks a lot for your response. > I ran needle like this > /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out > It gave me the output in format msf. > But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I > get the alignment start and stop coordinates on the sequence. I mean > something like hsp->query->start which gives us the alignment start position > on query sequence in a blast output when using Bio::SearchIO. > Please help. > Like I explained with an example in my previous mail, I want the > coordinate where the alignment starts on the sequence. > > ~Neeti. > On 12/14/06, Fairley, Derek wrote: > Neeti, > > From http://emboss.sourceforge.net/apps/cvs/needle.html : > > "The results can be output in one of several styles by using the > command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of > the required format. Some of the alignment formats can cope with an > unlimited number of sequences, while others are only for pairs of sequences. > > The available multiple alignment format names are: unknown, multiple, > simple, fasta, msf, trace, srs > > The available pairwise alignment format names are: pair, markx0, markx1, > markx2, markx3, markx10, srspair, score > > See: http://emboss.sf.net/docs/themes/AlignFormats.html for further > information on alignment formats." > > Not sure based on this whether you can get pairwise alignment in .msf > format; can't think of a good reason why not. 
The BioPerl Align::IO module > will allow you to parse alignments in .msf format. > > HTH, > > Derek. > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya > Sent: 14 December 2006 08:03 > To: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] needle parser in bioperl? > > How do I run needle specifying that I want the MSF format, on a linux box? > The help doesnt show me any format option. Is there anything available to > pasre MSF format? > Please find an example alignment file attached. Here the seq_of_contig > aligns with the reference sequence (i.e. SEQ_1.REF) starting at position > (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the > output alignment, how can I parse the result to get this? > > On 12/12/06, Chris Fields wrote: > > > > > > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: > > > > > Hi, > > > > > > Does anyone know of a bioperl parser for needle output, basically I > > > won't > > > where the target sequence aligns on the template (i.e. coordinate > > > on the > > > template where the taget aligns). > > > > > > -- > > > -Neeti > > > Even my blood says, B positive > > > > I answered this a number of months back: > > > > http://tinyurl.com/yzlbx5 > > > > Basically, newer versions of EMBOSS have changed the output for the > > AlignIO::emboss parser (which parses needle). I don't believe the > > parser has been fixed to deal with that, but Jason has pointed out > > you can use MSF output when running needle, then parse using AlignIO > > with the format set to 'msf'. > > > > chris > > > > > > -- > -Neeti > Even my blood says, B positive > > > > -- > -Neeti > Even my blood says, B positive > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1.out Type: application/octet-stream Size: 90277 bytes Desc: not available URL: From jason at bioperl.org Fri Dec 15 14:28:13 2006 From: jason at bioperl.org (Jason Stajich) Date: Fri, 15 Dec 2006 09:28:13 -0500 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> Message-ID: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: > > On Dec 14, 2006, at 7:45 PM, David Messina wrote: > >> Hey Chris, >> >> My thoughts below. >> >>> [Chris] >>> This could be used to annotate any >>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you, >>> maybe in a collection (similar to AnnotationCollection). I thought >>> something like this may be of general use for any PrimarySeq >>> (quality, structure), alignments like NEXUS and Stockholm, >>> SeqFeatures where structure could be stored (tRNA or riboswitches), >>> etc. >>> >>> However, this also seems to fall into the category of sequence >>> annotation. So, would it be better to have a set of Bio::Annotation >>> classes used for this purpose? >> >> >> To me, all meta data is equal. That is, your classic Genbank feature >> annotation and a user's arbitrary meta-tag like "Bob thinks this is a >> kinase domain" aren't different in kind even if they are different in >> content. >> >> As resequencing projects multiply, the ability to create arbitrary >> meta tags, attach them to different types of objects, and use those >> tags to link them together will become desirable, if not essential. >> >> Keeping a common interface to all of these meta data types would be >> advantageous, plus new users won't have to determine whether they >> need to use Bio::Meta objects or Bio::Annotation objects. >> >> So I would argue for all of the meta data types to live "under one >> roof". Which roof isn't as important. 
Bio::Annotation, since it >> already exists for today's meta data, seems like a reasonable choice. >> (assuming Annotation objects are flexible enough to be extended as >> you propose) >> >> There, and no flames or jibes even. :) > > I guess what I want to know is whether there should be a > distinction between 'normal' sequence annotation (comments, > references, and so on) and annotation that could be best described as > position-specific (like RNA or protein structural annotation). The > current meta implementation is for sequence data only; I felt it > would be nice to have a generic implementation that would be > applicable to any object data. My stream of consciousness for right now: I was thinking Bio::Annotation is where this should go. That system doesn't have anything about it that makes it explicitly sequence related. What we're trying to hammer out here on the Alignment side, which fits with your RNA example, is to have features (basically SeqFeatures) associated with alignments, so that columns can be annotated to cover things like character sets and partitions for phylogenetic analyses. As for data which annotates non-contiguous things like RNA stems, we may have to be more creative about that or model it with a split location (Bio::Location::Split). So currently we've added code so that an Alignment is-a Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this end, with the goal of being able to capture more of the data that can be represented in a NEXUS file. It feels more like a hack than an elegant meta-data solution, but I am not totally sure what data you are thinking about at this point; perhaps I need to spend more time thinking about it. Or are you worried that the semantic mapping of the data into features or annotations is confusing users? From jason at bioperl.org Fri Dec 15 14:48:32 2006 From: jason at bioperl.org (Jason Stajich) Date: Fri, 15 Dec 2006 09:48:32 -0500 Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com> <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com> Message-ID: <42CB9018-72CD-433E-A42F-152D63D2F584@bioperl.org> I get the impression you are trying to use the wrong tool for the job. Can you explain a little more generally what you want to do? Semantically, FASTA in Bio::SearchIO is quite different from FASTA in Bio::AlignIO. We explain this on the wiki; please have a look at the FASTA page. Do not use Bio::SearchIO to parse multi-fasta alignment output. Bio::SearchIO is for pairwise alignment reports (with -format => 'fasta' it expects a report from the FASTA search programs, hence the warning you see). Use Bio::AlignIO for a multi-fasta format or for msf; you just provide a different value to '-format'. But none of that is going to help you get start/end for your alignment, because that is not part of the output format. Do the experiment of looking at the file and figuring out which fields you actually want output; if they don't exist, then either you have a format that won't work for your question, or you will have to calculate the additional values yourself. If you are trying to align transcripts to a genome, please consider tools that are built for it (and referenced on the wiki, like Sim4, est2genome, exonerate, BLAT). -jason On Dec 15, 2006, at 7:46 AM, neeti somaiya wrote: > I ran needle like this > > /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out > > Please find the output attached. > > When I run the following :- > > use Bio::SearchIO; > > my $io = Bio::SearchIO->new(-file => "1.out", > -format => "fasta" ); > > while ( my $result = $io->next_result() ) > { > while( my $hit = $result->next_hit) > { > > print "yes\n"; > } > } > > > It says :- > > -------------------- WARNING --------------------- > MSG: unrecognized FASTA Family report file! > --------------------------------------------------- > > What should I do? > > ~Neeti.
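Following up on the point that start/end are not part of the MSF or multi-fasta output: one way to calculate them is to walk the two gapped rows of the alignment column by column and count residues on the reference. What follows is only a minimal sketch in plain Perl, without BioPerl. The subroutine name aligned_span_on_ref and the toy sequences are made up for illustration, and it assumes that gaps are written as '-', '.' or '~'. The gapped strings themselves would have to come from the alignment file, for example via Bio::AlignIO (each_seq on the alignment, then seq on each sequence); that step is assumed here rather than shown.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): given the two rows of a global
# pairwise alignment, return the 1-based, ungapped coordinates on the
# reference at which the first and last query residues are aligned.
sub aligned_span_on_ref {
    my ($ref_row, $query_row) = @_;
    die "alignment rows differ in length\n"
        unless length($ref_row) == length($query_row);

    my $ref_pos = 0;          # residues of the reference seen so far
    my ($start, $end);        # stay undef if the query never aligns

    for my $col (0 .. length($ref_row) - 1) {
        my $r = substr($ref_row,   $col, 1);
        my $q = substr($query_row, $col, 1);

        my $ref_is_residue = $r !~ /[-.~]/;
        $ref_pos++ if $ref_is_residue;

        next if $q =~ /[-.~]/;         # gap in the query: nothing aligned here
        next unless $ref_is_residue;   # insertion relative to the reference

        $start = $ref_pos unless defined $start;
        $end   = $ref_pos;
    }
    return ($start, $end);
}

# Toy example: the query covers reference positions 4 to 7.
my ($start, $end) = aligned_span_on_ref('ACGTACGTAC', '---TACG---');
print "query aligns to reference positions $start-$end\n";
```

For an alignment like the one described earlier in the thread, the first value returned would be the coordinate on SEQ_1.REF at which the contig starts to align. Leading and trailing gap columns in a needle alignment are simply skipped, which is why a global alignment can still yield a meaningful start.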
> > On 12/15/06, Fairley, Derek wrote: >> >> Neeti, >> >> In lieu of a response from a BioPerl guru... why not use Needle to >> generate your pairwise alignment in fasta format, rather than msf >> format? >> The sequence you want should correspond to a single HSP which you >> can get >> directly from the fasta alignment with Bio::SearchIO: >> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need >> to use >> Bio::AlignIO at all. >> >> Derek. >> >> >> -----Original Message----- >> From: neeti somaiya [mailto:neetisomaiya at gmail.com] >> Sent: 15 December 2006 05:22 >> To: Fairley, Derek; bioperl-l >> Subject: Re: [Bioperl-l] needle parser in bioperl? >> >> Hi, >> >> Thanks a lot for your response. >> I ran needle like this >> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out >> It gave me the output in format msf. >> But now my problem is, if I use Bio::AlignIO module of Bioperl, >> how can I >> get the alignment start and stop coordinates on the sequence. I mean >> something like hsp->query->start which gives us the alignment >> start position >> on query sequence in a blast output when using Bio::SearchIO. >> Please help. >> Like I explained with an example in my previous mail, I want the >> coordinate where the alignment starts on the sequence. >> >> ~Neeti. >> On 12/14/06, Fairley, Derek wrote: >> Neeti, >> >> From http://emboss.sourceforge.net/apps/cvs/needle.html : >> >> "The results can be output in one of several styles by using the >> command-line qualifier -aformat xxx, where 'xxx' is replaced by >> the name of >> the required format. Some of the alignment formats can cope with an >> unlimited number of sequences, while others are only for pairs of >> sequences. 
>> >> The available multiple alignment format names are: unknown, multiple, >> simple, fasta, msf, trace, srs >> >> The available pairwise alignment format names are: pair, markx0, >> markx1, >> markx2, markx3, markx10, srspair, score >> >> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further >> information on alignment formats." >> >> Not sure based on this whether you can get pairwise alignment in .msf >> format; can't think of a good reason why not. The BioPerl >> Align::IO module >> will allow you to parse alignments in .msf format. >> >> HTH, >> >> Derek. >> >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto: >> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya >> Sent: 14 December 2006 08:03 >> To: Chris Fields; bioperl-l >> Subject: Re: [Bioperl-l] needle parser in bioperl? >> >> How do I run needle specifying that I want the MSF format, on a >> linux box? >> The help doesnt show me any format option. Is there anything >> available to >> pasre MSF format? >> Please find an example alignment file attached. Here the >> seq_of_contig >> aligns with the reference sequence (i.e. SEQ_1.REF) starting at >> position >> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate >> from the >> output alignment, how can I parse the result to get this? >> >> On 12/12/06, Chris Fields wrote: >> > >> > >> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote: >> > >> > > Hi, >> > > >> > > Does anyone know of a bioperl parser for needle output, >> basically I >> > > won't >> > > where the target sequence aligns on the template (i.e. coordinate >> > > on the >> > > template where the taget aligns). >> > > >> > > -- >> > > -Neeti >> > > Even my blood says, B positive >> > >> > I answered this a number of months back: >> > >> > http://tinyurl.com/yzlbx5 >> > >> > Basically, newer versions of EMBOSS have changed the output for the >> > AlignIO::emboss parser (which parses needle). 
I don't believe the
>> > parser has been fixed to deal with that, but Jason has pointed out
>> > you can use MSF output when running needle, then parse using
>> AlignIO
>> > with the format set to 'msf'.
>> >
>> > chris
>> >
>>
>>
>>
>> --
>> -Neeti
>> Even my blood says, B positive
>>
>>
>>
>> --
>> -Neeti
>> Even my blood says, B positive
>>
>
>
>
> --
> -Neeti
> Even my blood says, B positive
> <1.out>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html

From lubapardo at gmail.com Fri Dec 15 16:39:11 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 15 Dec 2006 17:39:11 +0100
Subject: [Bioperl-l] NO BLAST
Message-ID: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>

Hello,

I am having trouble using the module Bio::Tools::Run::StandAloneBlast.

I got the following error message: cannot find path to blastall.

The code I used is (modified from HOWTObeginners):

#! /local/bin/perl -w
#use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Tools::Run::StandAloneBlast;

my $db_object = Bio::DB::GenBank-> new;
#my $seq_ob = $db_object->get_Seq_by_id('NM_004043');
#$seq= Bio::SeqIO->new(-file => "> out.fasta", -format => 'fasta');
#$seq ->write_seq($seq_ob);
#print $seq;

@params = (program =>'blastn', database =>'db.fa');
$blast_obj =Bio::Tools::Run::StandAloneBlast->new(@params);
$seq_obj = Bio::Seq->new(-id =>"testquery",
                         -seq =>"TTTAAATATATTTTGAAGTATAGATTATATGTT");
$report_obj = $blast_obj->blastall($seq_obj);
$result_obj =$report_obj->next_result;
print $result_obj->num_hits;

Whether I create a sequence de novo or retrieve one from the internet,
I get the same message.

From cjfields at uiuc.edu Fri Dec 15 17:23:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:23:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
References: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu> <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu> <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID:

On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:

>
> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>
>>
>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>
>>> Hey Chris,
>>>
>>> My thoughts below.
>>>
>>>> [Chris]
>>>> This could be used to annotate any
>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-
>>>> you,
>>>> maybe in a collection (similar to AnnotationCollection). I thought
>>>> something like this may be of general use for any PrimarySeq
>>>> (quality, structure), alignments like NEXUS and Stockholm,
>>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>>> etc.
>>>>
>>>> However, this also seems to fall into the category of sequence
>>>> annotation. So, would it be better to have a set of
>>>> Bio::Annotation
>>>> classes used for this purpose?
>>> >>> >>> To me, all meta data is equal. That is, your classic Genbank feature >>> annotation and a user's arbitrary meta-tag like "Bob thinks this >>> is a >>> kinase domain" aren't different in kind even if they are >>> different in >>> content. >>> >>> As resequencing projects multiply, the ability to create arbitrary >>> meta tags, attach them to different types of objects, and use those >>> tags to link them together will become desirable, if not essential. >>> >>> Keeping a common interface to all of these meta data types would be >>> advantageous, plus new users won't have to determine whether they >>> need to use Bio::Meta objects or Bio::Annotation objects. >>> >>> So I would argue for all of the meta data types to live "under one >>> roof". Which roof isn't as important. Bio::Annotation, since it >>> already exists for today's meta data, seems like a reasonable >>> choice. >>> (assuming Annotation objects are flexible enough to be extended as >>> you propose) >>> >>> There, and no flames or jibes even. :) >> >> I guess what I want to know is whether there should to be a >> distinction between 'normal' sequence annotation (comments, >> references, and so on) and annotation that could be best described as >> position-specific (like RNA or protein structural annotation). The >> current meta implementation is for sequence data only; I felt it >> would be nice to have a generic implementation that would be >> applicable to any object data. > > my stream-of-consciousness for right now: > > I was thinking Bio::Annotation is where this should go - that > system doesn't have anything about it that makes it explicitly > sequence related. What we're trying to hammer out here on the > Alignment side - which fits with your RNA example - is have > features, basically SeqFeatures - associated with alignments so > columns can be annotated to cover things like character sets and > partitions for phylogenetic analyses. 
As for data which annotates
> non-contiguous things like RNAstems we may have to be more
> creative about that or model it with a splitLocation.
>
> So currently we've added code so that an Alignment is-a
> Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this
> end, with the goal of being able to capture more of the data that
> can be represented in a NEXUS file.
>
> It feels more like a hack than an elegant Meta-data solution, but I
> am totally sure whether the data you are thinking about doing at
> this point, perhaps I need to spend more time thinking about it.
> Or are you worried about the idea of whether the semantic mapping
> of the data into features or annotations is confusing users?

Sorry in advance for the longish response here...

My original thought was to have a generic abstract class capable of
positionally describing data in any other class, similar to Heikki's
Bio::Seq::MetaI but not constrained to sequence data only.
Implementing classes would be capable of having different data
structures based on their use (simple string, array, AoA, AoH, AoO).
One MetaCollection class to contain them all in a tag-like system, so
you could have mixed data types describe the same object. The latter
Collection class is so similar to AnnotationCollection that I agree
Bio::Annotation would be the best place for this.

The way I reconfigured Stockholm alignment parsing/writing is to use
Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is
capable of holding a sequence and several meta strings, stored as
tags or 'names'. However, there is no Meta object for alignments
(for RNA/protein structure consensus and other Rfam/Pfam markup); I
hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
rather have a generic Meta object independent of the sequence cruft.

So for this partial Pfam alignment,

Q92SV1_RHIME/122-299          LAMALNLARGI...VDADVDF..REG
#=GR Q92SV1_RHIME/122-299 pAS .........................
Q883D2_PSESM/110-290          LGLMLGLRRRL...FDGNGAV..KRS
Q8ZXP5_PYRAE/91-262           LALLLAPYKRI...IQYGEKM..KRG
#=GR Q8ZXP5_PYRAE/91-262 SS   HHHHHHHHTTH...HHHHHHX..HTT
#=GR Q8ZXP5_PYRAE/91-262 SA   00000000000...120030X..474
#=GC SS_cons                  HHHHHHHHTTH...HHHHHHH..HTT
#=GC SA_cons                  03002200312...1312414..676
#=GC seq_cons                 luhhLuhsRpl...hthppth..+pG
//

'#=GC' lines would be in generic meta string objects in the
alignment, while '#=GR' tags would be in similar meta objects in the
relevant sequences. As long as both aren't AnnotatableI this isn't
an issue. Similarly, NEXUS files which contained any position-based
values could hold a meta string/array object in a similar tag. The
basic scheme is:

                    |--String
                    |
Annotation::Meta----|--Array
                    |
                    |--HorriblyComplexDataStruct

Then I started thinking about where this could be applied, and
whether a true Meta object needs to be constrained only to describing
position-based data. This somewhat relates to this bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1825

which seems to need a simple but unconstrained hash-of-arrays-based
meta object. Then my head appropriately exploded...

Hope everything is going well at the hackathon! Looks like some
interesting stuff coming out of it.

chris

From cjfields at uiuc.edu Fri Dec 15 17:49:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:49:45 -0600
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
In-Reply-To: <1166158897.2569.335.camel@localhost.localdomain>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain>
Message-ID: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>

On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:

> As much as I would like to take credit for this :-) Allen Day
> wrote the
> original code, and then Chris Fields tried to fix it so that it
> actually
> worked :-) I think it would be a good idea to have a validate_terms
> option like Bio::FeatureIO::gff.
>
> Scott

I did ?!? I committed a bug fix a while back:

Revision 1.34 / (view) - annotate - [select for diffs] ,
Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
Branch: MAIN
CVS Tags: branch-experimental
Branch point for: branch-1-5-2
Changes since 1.33: +155 -33 lines
Diff to previous 1.33

Bug 2026; Robert's enhancements

To tell the truth I don't know if this is where the mandatory checks
were added in; I'm not too familiar with SeqFeature::Annotation yet.

I agree with Scott (and Matthew) that SOFA checks should be
optional. Matthew, can you write up a patch and maybe some tests?

chris

From stewarta at nmrc.navy.mil Thu Dec 14 23:30:11 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 18:30:11 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
Message-ID: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>

I'm getting the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
STACK: main::process_reports ./new_blast_script.pl:254
STACK: ./new_blast_script.pl:132
-----------------------------------------------------------

next_result is a pretty dense chunk of code to decipher. I was
wondering if anyone more familiar with that code might know what the
"no data for midline $_" exception is referring to?
For context:

1161      if( /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
1162          my ($full,$type,$start,$str,$end) = ($1, $2,$3,$4,$5);
1163          if( $str eq '-' ) {
1164              $i = 3 if $type eq 'Sbjct';
1165          } else {
1166              $data{$type} = $str;
1167          }
1168          $len = length($full);
1169          $self->{"\_$type"}->{'begin'} = $start unless $self->{"_$type"}->{'begin'};
1170          $self->{"\_$type"}->{'end'} = $end;
1171      } else {
1172          $self->throw("no data for midline $_")
1173              unless (defined $_ && defined $len);
1174          $data{'Mid'} = substr($_,$len);
1175      }

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From jason at bioperl.org Fri Dec 15 18:56:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 13:56:13 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
Message-ID:

It means it is expecting alignment block of data and there is none
(or there is none in the context it is expecting it) - so something
is wrong with the report as it gets tripped up.

I'm not sure reading the code is going to help you - what someone
will have to do is figure out what is different about this report
than reports that do work for the parser.
You'll do better if you just provide an example report that is
failing as a bug report.

Providing the version of BLAST you are using and version of bioperl
will help. I seem to remember NCBI changing the BLAST text format so
that will break the parser if it is a significant change.

As has been mentioned in the past, this playing cat and mouse with
format changes means things will periodically break. If you need
rock-solid always going to work, I guess the XML is better route to go.
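
A minimal sketch of the branch quoted above, using core Perl only.
classify_line() and the three-line alignment block are hypothetical
stand-ins, not BioPerl code; only the regex and the defined-$len check
are taken from the quoted lines 1161-1175. It illustrates that $len is
set only once a Query/Sbjct line has been seen, so any other line that
reaches the else branch first (here the "Posted date:" line from the
exception message) trips the throw.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for the quoted branch of next_result.
# classify_line() is a hypothetical helper, not part of BioPerl.
# $len_ref points at the width of the "Query  1   " prefix seen so far.
sub classify_line {
    my ($line, $len_ref) = @_;
    if ($line =~ /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/) {
        my ($full, $type, $start, $str, $end) = ($1, $2, $3, $4, $5);
        $$len_ref = length $full;   # column where the residues begin
        return "$type $start-$end $str";
    }
    # Any other line is taken to be the midline, which can only be
    # sliced once a Query/Sbjct line has set the prefix width.
    die "no data for midline $line\n" unless defined $$len_ref;
    return 'Mid ' . substr($line, $$len_ref);
}

# A tiny made-up alignment block: query, midline, subject.
my $len;
my @block = (
    'Query  1   MKV  3',
    (' ' x 11) . 'MK+',
    'Sbjct  5   MKI  7',
);
print classify_line($_, \$len), "\n" for @block;

# A stray header line arriving before any Query line reproduces the error.
my $fresh;
eval { classify_line('Posted date: Dec 14, 2006 2:52 PM', \$fresh) };
print "caught: $@" if $@;
```

In the real parser the text after "no data for midline" is simply the
line being processed at the time, so the exception message itself shows
which line of the report arrived where an alignment block was expected.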
-jason On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote: > I'm getting the following exception... > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM > STACK: Error::throw > STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328 > STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/ > SearchIO/blast.pm:1172 > STACK: main::process_reports ./new_blast_script.pl:254 > STACK: ./new_blast_script.pl:132 > ----------------------------------------------------------- > > > next_result is a pretty dense chunk of code to decipher. I was > wondering if anyone more familiar with that code might know what the > "no data for midline $_" exception is referring to? > > > -- > Andrew Stewart > Research Assistant, Genomics Team > Navy Medical Research Center (NMRC) > Biological Defense Research Directorate (BDRD) > BDRD Annex > 12300 Washington Avenue, 2nd Floor > Rockville, MD 20852 > > email: stewarta at nmrc.navy.mil > phone: 301-231-6700 Ext 270 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Dec 15 19:21:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 15 Dec 2006 13:21:32 -0600 Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown In-Reply-To: References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil> Message-ID: <6A0D17FA-CB98-4937-998E-11B87FB9CBBD@uiuc.edu> On Dec 15, 2006, at 12:56 PM, Jason Stajich wrote: > It means it is expecting alignment block of data and there is none > (or there is none in the context it is expecting it) - so something > is wrong with the report as it gets tripped up. > > I'm not sure reading the code is going to help you - what someone > will have to do is figure out what is different about this report > than reports that do work for the parser. 
> You'll do better if you just provide an example report that is > failing as a bug report. > > Providing the version of BLAST you are using and version of bioperl > will help. I seem to remember NCBI changing the BLAST text format so > that will break the parser if it is a significant change. > > As has been mentioned in the past, this playing cat and mouse with > format changes means things will periodically break. If you need rock- > solid always going to work, I guess the XML is better route to go. > > -jason I agree that XML is the only reliable way to go, though I have been reading on the BioPython group about some issues with newer (2.2.13 or greater) BLAST XML output when reports with multiple BLAST queries. Don't know if this affects Bioperl or not. As for the 'midline' error, there was a similar error a while back (fixed for the 1.5.2 release) that had to do with extra lines in the alignment section in some BLAST reports. Unless we have a demo BLAST report and sample code we can't do much about it (we need to reproduce the error in order to fix it), so the best thing to do it file a bug report. chris > On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote: > >> I'm getting the following exception... >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: no data for midline Posted date: Dec 14, 2006 2:52 PM >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm: >> 328 >> STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/ >> SearchIO/blast.pm:1172 >> STACK: main::process_reports ./new_blast_script.pl:254 >> STACK: ./new_blast_script.pl:132 >> ----------------------------------------------------------- >> >> >> next_result is a pretty dense chunk of code to decipher. I was >> wondering if anyone more familiar with that code might know what the >> "no data for midline $_" exception is referring to? 
>> >> >> -- >> Andrew Stewart >> Research Assistant, Genomics Team >> Navy Medical Research Center (NMRC) >> Biological Defense Research Directorate (BDRD) >> BDRD Annex >> 12300 Washington Avenue, 2nd Floor >> Rockville, MD 20852 >> >> email: stewarta at nmrc.navy.mil >> phone: 301-231-6700 Ext 270 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From vaughn at cshl.edu Fri Dec 15 18:05:47 2006 From: vaughn at cshl.edu (Matthew Vaughn) Date: Fri, 15 Dec 2006 13:05:47 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain> <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> Message-ID: Yes, I will. I am working on it today. It's a little more complicated to fix this than I expected because SeqFeature::Annotation->type() returns a Bio::AnnotationI rather than a simple scalar like it used to. On 12/15/06, Chris Fields wrote: > On Dec 14, 2006, at 11:01 PM, Scott Cain wrote: > > > As much as I would like to take credit for this :-) Allen Day > > wrote the > > original code, and then Chris Fields tried to fix it so that it > > actually > > worked :-) I think it would be a good idea to have a validate_terms > > option like Bio::FeatureIO::gff. > > > > Scott > > I did ?!? 
I committed a bug fix a while back: > > Revision 1.34 / (view) - annotate - [select for diffs] , > Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields > Branch: MAIN > CVS Tags: branch-experimental > Branch point for: branch-1-5-2 > Changes since 1.33: +155 -33 lines > Diff to previous 1.33 > > Bug 2026; Robert's enhancements > > To tell the truth I don't know if this is where the mandatory checks > were added in; I'm not too familiar with SeqFeature::Annotation yet. > > I agree with Scott (and Matthew) that SOFA checks should be > optional. Matthew, can you write up a patch and maybe some tests? > > chris > > > > From valiente at lsi.upc.edu Sat Dec 16 00:45:27 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Sat, 16 Dec 2006 01:45:27 +0100 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4577EFD3.7090904@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> Message-ID: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> > I don't think that can be true. Your error message contains 'Must > supply > a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). > > If you uninstall the fink installation and install 1.5.2 using cpan > (with root privileges by going sudo cpan) that should at least get > rid of the error messages... > > >> The tree is not correct (I've parsed it from R to have a double >> check) but don't know yet what the problem is with it. > > ... But if the tree is wrong anyway... Let me know what you find out. I've uninstalled the fink installation and used the cvs instead, and the error message is gone. However, on a larger set of 190 species, which are all present in the NCBI taxonomy, the resulting tree has only 178 taxa. 
I suspect, something must be wrong with the merge_lineage method in the major rewrite of the taxonomy2tree script. Can someone please check this? I'm attaching the 190 species call to the script. Thanks, Gabriel -------------- next part -------------- A non-text attachment was scrubbed... Name: fetch-bork.sh Type: application/octet-stream Size: 7378 bytes Desc: not available URL: From lincoln.stein at gmail.com Fri Dec 15 16:02:27 2006 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Fri, 15 Dec 2006 11:02:27 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> Message-ID: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> This is very embarassing for me, particularly since I spent a lot of time validating that Bio::Graphics was working properly before the 1.5.2 release went out. How long before there is a 1.5.3 release? How about a 1.5.2.1release? Lincoln On 12/14/06, Lincoln Stein wrote: > > Hi All, > > I'm afraid that the xyplot glyph that is in the recent bioperl release has > an error that causes the content to be printed to the right of the correct > position. Unfortunately this wasn't caught before the release because the > glyph was only tested on very large (whole genome) features. > > You will need to do a CVS update to get a fixed version from bioperl-live. > A future bugfix release of gbrowse will patch this glyph for you > automatically. > > Lincoln > > On 12/12/06, Kara Dolinski wrote: > > > > Hi, > > I'm having a problem getting features and an xyplot properly aligned in > > Gbrowse. For example, see this page: > > > > http://tinyurl.com/ylbq3q > > > > The feature in the "CENPK SNPs" track should actually be around the peak > > of the graph in the "CENPK prediction signal" xyplot ie. 
the SNP
> > feature is at position 79, and the xyplot axes and data should span from
> > 61 - 95. However, as you can see, the data in the xyplot are oddly
> > separated from the axes (which seem to be in the correct place), with the
> > data shifted over to about position 120-155.
> > This occurs elsewhere, not just at the ends of the chromosomes.
> >
> > When I zoom to ~80 bp, all is well, see:
> >
> > http://tinyurl.com/yzav8k
> >
> > The relevant snippets from the GFF and the config files are below.
> >
> > Thanks!
> > Kara
> >
> > GFF:
> >
> > chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> > chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> > chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> > chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> > chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> > chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> > chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> > chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> > chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> > chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> > chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> > chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> > chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> > chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> > chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> > chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> > chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> > chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> > chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> > chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> > chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> > chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> > chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> > chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> > chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> > chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> > chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> > chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> > chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> > chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> > chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> > chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> > chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
> >
> > CONFIG:
> >
> >
> > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
> >
> > [CENPK_all_scores_graph]
> > feature = GRAPH_CENPK:SNPScanner
> > glyph = xyplot
> > graph_type = boxes
> > fgcolor = purple
> > bgcolor = purple
> > height = 100
> > min_score = 0
> > max_score = 110
> > label = 0
> > key = CENPK prediction signal
> > link =
> > category = SNPs: signal graphs
> >
> >
> >
> > -------------------------------------------------------------------------
> > Take Surveys. Earn Cash.
Influence the Future of IT > > Join SourceForge.net's Techsay panel and you'll get the chance to share > > your > > opinions on IT & business topics through brief surveys - and earn cash > > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > > > > > > _______________________________________________ > > Gmod-gbrowse mailing list > > Gmod-gbrowse at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > > > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Sat Dec 16 06:10:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 00:10:07 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> Message-ID: <70A5E333-8CF5-49D3-84AC-7A6A02791B5C@uiuc.edu> We could feasibly have regular point releases of the 1.5 dev. series for bug fixes; I guess it just depends on how often these should come out and what critical tests must pass for a release to go forward. Sendu's already done a ton of work towards getting BioPerl switched over to Module::Build and Test::More, and fixing bugs. As Hilmar has pointed out in the past, this is a developer's series, so not every test needs to pass before a release goes out. When would you like this to go out? 
chris On Dec 15, 2006, at 10:02 AM, Lincoln Stein wrote: > This is very embarassing for me, particularly since I spent a lot > of time > validating that Bio::Graphics was working properly before the 1.5.2 > release > went out. How long before there is a 1.5.3 release? How about a > 1.5.2.1release? > > Lincoln > > On 12/14/06, Lincoln Stein wrote: >> >> Hi All, >> >> I'm afraid that the xyplot glyph that is in the recent bioperl >> release has >> an error that causes the content to be printed to the right of the >> correct >> position. Unfortunately this wasn't caught before the release >> because the >> glyph was only tested on very large (whole genome) features. >> >> You will need to do a CVS update to get a fixed version from >> bioperl-live. >> A future bugfix release of gbrowse will patch this glyph for you >> automatically. >> >> Lincoln >> >> On 12/12/06, Kara Dolinski wrote: >>> >>> Hi, >>> I'm having a problem getting features and an xyplot properly >>> aligned in >>> Gbrowse. For example, see this page: >>> >>> http://tinyurl.com/ylbq3q >>> >>> The feature in the "CENPK SNPs" track should actually be around >>> the peak >>> of the graph in the "CENPK prediction signal" xyplot ie. the SNP >>> feature is at position 79, and the xyplot axes and data should >>> span from >>> 61 - 95. However, as you can see, the data in the xyplot are oddly >>> separated from the axes (which seem to be in the correct place), >>> with the >>> data shifted over to about position 120-155. >>> This occurs elsewhere, not just at the ends of the chromosomes. >>> >>> When I zoom to ~80 bp, all is well, see: >>> >>> http://tinyurl.com/yzav8k >>> >>> The relevant snippets from the GFF and the config files are below. >>> >>> Thanks! >>> Kara >>> >>> GFF: >>> >>> chrI SNPScanner >>> CENPK_GRAPH 61 95 41.9883 . . >>> ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score >>> is 41.9883 >>> chrI SNPScanner >>> CENPK_CALL 79 79 41.9883 . . 
>>> ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score >>> is 41.9883 >>> chrI SNPScanner >>> CENPK_SCORE 61 61 2.24506 . . >>> ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score >>> is 2.24506 >>> chrI SNPScanner >>> CENPK_SCORE 62 62 3.26837 . . >>> ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score >>> is 3.26837 >>> chrI SNPScanner >>> CENPK_SCORE 63 63 1.39938 . . >>> ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score >>> is 1.39938 >>> chrI SNPScanner >>> CENPK_SCORE 64 64 1.4039 . . >>> ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score >>> is 1.4039 >>> chrI SNPScanner >>> CENPK_SCORE 65 65 9.16134 . . >>> ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score >>> is 9.16134 >>> chrI SNPScanner >>> CENPK_SCORE 66 66 10.1413 . . >>> ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score >>> is 10.1413 >>> chrI SNPScanner >>> CENPK_SCORE 67 67 12.9256 . . >>> ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score >>> is 12.9256 >>> chrI SNPScanner >>> CENPK_SCORE 68 68 13.195 . . >>> ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score >>> is 13.195 >>> chrI SNPScanner >>> CENPK_SCORE 69 69 22.7127 . . >>> ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score >>> is 22.7127 >>> chrI SNPScanner >>> CENPK_SCORE 70 70 23.8289 . . >>> ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score >>> is 23.8289 >>> chrI SNPScanner >>> CENPK_SCORE 71 71 21.9123 . . >>> ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score >>> is 21.9123 >>> chrI SNPScanner >>> CENPK_SCORE 72 72 28.3344 . . >>> ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score >>> is 28.3344 >>> chrI SNPScanner >>> CENPK_SCORE 73 73 35.0436 . . >>> ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score >>> is 35.0436 >>> chrI SNPScanner >>> CENPK_SCORE 74 74 37.361 . . >>> ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score >>> is 37.361 >>> chrI SNPScanner >>> CENPK_SCORE 75 75 39.5408 . . 
>>> ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score >>> is 39.5408 >>> chrI SNPScanner >>> CENPK_SCORE 76 76 28.2008 . . >>> ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score >>> is 28.2008 >>> chrI SNPScanner >>> CENPK_SCORE 77 77 32.6254 . . >>> ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score >>> is 32.6254 >>> chrI SNPScanner >>> CENPK_SCORE 78 78 36.0832 . . >>> ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score >>> is 36.0832 >>> chrI SNPScanner >>> CENPK_SCORE 79 79 41.9883 . . >>> ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score >>> is 41.9883 >>> chrI SNPScanner >>> CENPK_SCORE 80 80 32.1205 . . >>> ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score >>> is 32.1205 >>> chrI SNPScanner >>> CENPK_SCORE 81 81 41.3048 . . >>> ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score >>> is 41.3048 >>> chrI SNPScanner >>> CENPK_SCORE 82 82 30.7975 . . >>> ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score >>> is 30.7975 >>> chrI SNPScanner >>> CENPK_SCORE 83 83 29.4282 . . >>> ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score >>> is 29.4282 >>> chrI SNPScanner >>> CENPK_SCORE 84 84 35.3586 . . >>> ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score >>> is 35.3586 >>> chrI SNPScanner >>> CENPK_SCORE 85 85 34.1426 . . >>> ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score >>> is 34.1426 >>> chrI SNPScanner >>> CENPK_SCORE 86 86 30.2966 . . >>> ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score >>> is 30.2966 >>> chrI SNPScanner >>> CENPK_SCORE 87 87 17.8402 . . >>> ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score >>> is 17.8402 >>> chrI SNPScanner >>> CENPK_SCORE 88 88 15.2637 . . >>> ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score >>> is 15.2637 >>> chrI SNPScanner >>> CENPK_SCORE 89 89 12.657 . . >>> ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score >>> is 12.657 >>> chrI SNPScanner >>> CENPK_SCORE 90 90 10.2033 . . 
>>> ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score >>> is 10.2033 >>> chrI SNPScanner >>> CENPK_SCORE 91 91 9.40143 . . >>> ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score >>> is 9.40143 >>> chrI SNPScanner >>> CENPK_SCORE 92 92 6.56273 . . >>> ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score >>> is 6.56273 >>> chrI SNPScanner >>> CENPK_SCORE 93 93 3.66211 . . >>> ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score >>> is 3.66211 >>> chrI SNPScanner >>> CENPK_SCORE 94 94 0.394194 . . >>> ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score >>> is 0.394194 >>> >>> CONFIG: >>> >>> >>> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH} >>> >>> [CENPK_all_scores_graph] >>> feature = GRAPH_CENPK:SNPScanner >>> glyph = xyplot >>> graph_type = boxes >>> fgcolor = purple >>> bgcolor = purple >>> height = 100 >>> min_score = 0 >>> max_score = 110 >>> label = 0 >>> key = CENPK prediction signal >>> link = >>> category = SNPs: signal graphs >>> >>> >>> >>> -------------------------------------------------------------------- >>> ----- >>> Take Surveys. Earn Cash. Influence the Future of IT >>> Join SourceForge.net's Techsay panel and you'll get the chance to >>> share >>> your >>> opinions on IT & business topics through brief surveys - and earn >>> cash >>> http://www.techsay.com/default.php? >>> page=join.php&p=sourceforge&CID=DEVDEV >>> >>> >>> _______________________________________________ >>> Gmod-gbrowse mailing list >>> Gmod-gbrowse at lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >>> >>> >>> >> >> >> -- >> Lincoln D. Stein >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> FOR URGENT MESSAGES & SCHEDULING, >> PLEASE CONTACT MY ASSISTANT, >> SANDRA MICHELSEN, AT michelse at cshl.edu >> > > > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Dec 16 06:28:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 00:28:47 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: >> I don't think that can be true. Your error message contains 'Must >> supply >> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >> >> If you uninstall the fink installation and install 1.5.2 using >> cpan (with root privileges by going sudo cpan) that should at >> least get rid of the error messages... >> >> >>> The tree is not correct (I've parsed it from R to have a double >>> check) but don't know yet what the problem is with it. >> >> ... But if the tree is wrong anyway... Let me know what you find out. > > I've uninstalled the fink installation and used the cvs instead, > and the error message is gone. However, on a larger set of 190 > species, which are all present in the NCBI taxonomy, the resulting > tree has only 178 taxa. 
I suspect, something must be wrong with the
> merge_lineage method in the major rewrite of the taxonomy2tree
> script. Can someone please check this? I'm attaching the 190
> species call to the script. Thanks,
>
> Gabriel

I can confirm that. It is definitely dropping them in merge_lineage();
if you add a call to get_leaf_nodes to check how many are present after
each merge_lineage() call, you can see it dropping nodes along the trace.

In taxonomy2tree.pl:

    # $db, $tree and @species are set up earlier in the script
    my $ct = 0;                         # species counter; initialised so printf does not warn
    my ($treect, $mergect) = (0, 0);    # list assignment so both start at 0
    for my $name (@species) {
        my $ncbi_id = $db->get_taxonid($name);
        if ($ncbi_id) {
            #print "Species: $name\n\tTaxID: $ncbi_id\n";
            #$ids{$ncbi_id}++;
            my $node = $db->get_taxon(-taxonid => $ncbi_id);
            if ($tree) {
                $tree->merge_lineage($node);
            } else {
                $tree = Bio::Tree::Tree->new(-node => $node);
            }
            # report the number of leaf nodes after each merge
            printf("%-3d: Nodes: %-4d\n", $ct, scalar($tree->get_leaf_nodes));
        } else {
            warn "no NCBI Taxonomy node for species ", $name, "\n";
        }
        $ct++;
    }

chris

From bix at sendu.me.uk Sat Dec 16 14:37:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:37:49 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <458404BD.8030908@sendu.me.uk>

Lincoln Stein wrote:
> This is very embarrassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1 release?

I'm happy to try a point release for critical bug fixes. Why don't you
commit the necessary fixes to branch-1-5-2 and let me know when you're
happy, and I'll do 1.5.2.1.

Cheers,
Sendu.
From bix at sendu.me.uk Sat Dec 16 14:47:57 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 16 Dec 2006 14:47:57 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: <4584071D.3070005@sendu.me.uk> Gabriel Valiente wrote: >> I don't think that can be true. Your error message contains 'Must supply >> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >> >> If you uninstall the fink installation and install 1.5.2 using cpan >> (with root privileges by going sudo cpan) that should at least get rid >> of the error messages... >> >> >>> The tree is not correct (I've parsed it from R to have a double >>> check) but don't know yet what the problem is with it. >> >> ... But if the tree is wrong anyway... Let me know what you find out. > > I've uninstalled the fink installation and used the cvs instead, and the > error message is gone. However, on a larger set of 190 species, which > are all present in the NCBI taxonomy, the resulting tree has only 178 > taxa. I suspect, something must be wrong with the merge_lineage method > in the major rewrite of the taxonomy2tree script. Can someone please > check this? I'm attaching the 190 species call to the script. Thanks, Ok, I'll look into it. You're also welcome to see if you can take your own code from your original taxonomy2tree script and see if you can merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with your algorithms to get it working correctly. Indeed, does your original version of the script work on this data set? Cheers, Sendu. 
From cjfields at uiuc.edu Sat Dec 16 15:18:50 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 16 Dec 2006 09:18:50 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4584071D.3070005@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4584071D.3070005@sendu.me.uk> Message-ID: <6AE33842-B2E7-4E9B-B80D-68A058045818@uiuc.edu> On Dec 16, 2006, at 8:47 AM, Sendu Bala wrote: > Gabriel Valiente wrote: >>> I don't think that can be true. Your error message contains 'Must >>> supply >>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live). >>> >>> If you uninstall the fink installation and install 1.5.2 using cpan >>> (with root privileges by going sudo cpan) that should at least >>> get rid >>> of the error messages... >>> >>> >>>> The tree is not correct (I've parsed it from R to have a double >>>> check) but don't know yet what the problem is with it. >>> >>> ... But if the tree is wrong anyway... Let me know what you find >>> out. >> >> I've uninstalled the fink installation and used the cvs instead, >> and the >> error message is gone. However, on a larger set of 190 species, which >> are all present in the NCBI taxonomy, the resulting tree has only 178 >> taxa. I suspect, something must be wrong with the merge_lineage >> method >> in the major rewrite of the taxonomy2tree script. Can someone please >> check this? I'm attaching the 190 species call to the script. Thanks, > > Ok, I'll look into it. You're also welcome to see if you can take your > own code from your original taxonomy2tree script and see if you can > merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with > your algorithms to get it working correctly. 
Indeed, does your > original > version of the script work on this data set? > > > Cheers, > Sendu. Sendu, Don't know if it helps, but when I tried Gabriel's shell script last night I ran a modification of taxonomy2tree to see what would pop up. Everything is fine up to about 100 iterations, then merge_lineage () starts dropping leaf nodes. chris From bix at sendu.me.uk Sat Dec 16 15:33:35 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sat, 16 Dec 2006 15:33:35 +0000 Subject: [Bioperl-l] NO BLAST In-Reply-To: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com> References: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com> Message-ID: <458411CF.8000707@sendu.me.uk> Luba Pardo wrote: > *Hello,* > *I am having trouble to use the module Bio::Tools::Run::StandAloneBlast;* > ** > *I got the following error message: cannot find path to blastall.* > *The code I used is (modified from HOWTObeginners): Bioperl doesn't know where you installed blast. If you've actually installed it, you can set the environment variable BLASTDIR to point to the directory that contains the blastall executable. From cain.cshl at gmail.com Fri Dec 15 18:09:48 2006 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 15 Dec 2006 13:09:48 -0500 Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu> <4581CCEB.20206@sendu.me.uk> <1166158897.2569.335.camel@localhost.localdomain> <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu> Message-ID: <1166206188.2569.380.camel@localhost.localdomain> On Fri, 2006-12-15 at 11:49 -0600, Chris Fields wrote: > > To tell the truth I don't know if this is where the mandatory checks > were added in; I'm not too familiar with SeqFeature::Annotation yet. > > I agree with Scott (and Matthew) that SOFA checks should be > optional. Matthew, can you write up a patch and maybe some tests? 
>
> chris
>

That's not where they were added in, it's just that they hadn't been
fully implemented before then, so they didn't work (which probably meant
they weren't mandatory, though I don't remember (it could be that it just
croaked)).

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net Sun Dec 17 06:02:04 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 17 Dec 2006 01:02:04 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <458404BD.8030908@sendu.me.uk>
References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk>
Message-ID: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>

On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:

> Lincoln Stein wrote:
>> This is very embarrassing for me, particularly since I spent a lot
>> of time
>> validating that Bio::Graphics was working properly before the
>> 1.5.2 release
>> went out. How long before there is a 1.5.3 release? How about a
>> 1.5.2.1 release?
>
> I'm happy to try a point release for critical bug fixes. Why don't you
> commit the necessary fixes to branch-1-5-2 and let me know when you're
> happy, and I'll do 1.5.2.1.

Feel free to do that, but why not make a 1.5.3 off the main trunk?
1.5.2.1 may be adding more to the version confusion
(developer/stable/point-release/etc) than it is worth, and there is no
shame in releasing new developer versions every few weeks.

My $0.02 ...

-hilmar

>
> Cheers,
> Sendu.
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From fgarret at ub.edu Mon Dec 18 12:07:02 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 13:07:02 +0100
Subject: [Bioperl-l] codeml
Message-ID: <45868466.508@ub.edu>

Hi all,

I've been using bioperl's PAML module (specifically the codeml part) but
with just one tree.

Since the program accepts several trees as input (and runs the analysis
for each tree outputting the difference in likelihoods for each one) I
was wondering if there's some way to do it through bioperl?

thanks in adv,
FG

From heikki at sanbi.ac.za Mon Dec 18 13:51:50 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 18 Dec 2006 15:51:50 +0200
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To:
References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID: <200612181551.51277.heikki@sanbi.ac.za>

Reading the discussion, I think it is time to draw some guidelines.

1. Base the Meta implementation on real use cases.

MSA is a good example.

2. Allow generalisations.

If you can see another implementation of the same idea that can be merged
with the first, do it, but do not hurt yourself if you cannot.

The most difficult question is how to separate case-specific attributes
that are best implemented by subclassing with additional methods from
truly widely variable meta data that is best done as a parallel track
meta information holding class.

The problem I see with undefined, totally open meta annotation is that if
you can put anything in there, it is also totally confusing to a user. If
you can put anything in, how do you know what to get out and know that it
is there?

That leads to the third guideline:

3.
Use separate meta classes only when there are several different ways of encoding data that is present in large numbers *and* when you are expecting to be assessing the data computationally rather than just checking if an attribute is there. -Heikki On Friday 15 December 2006 19:23, Chris Fields wrote: > On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote: > > On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: > >> On Dec 14, 2006, at 7:45 PM, David Messina wrote: > >>> Hey Chris, > >>> > >>> My thoughts below. > >>> > >>>> [Chris] > >>>> This could be used to annotate any > >>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have- > >>>> you, > >>>> maybe in a collection (similar to AnnotationCollection). I thought > >>>> something like this may be of general use for any PrimarySeq > >>>> (quality, structure), alignments like NEXUS and Stockholm, > >>>> SeqFeatures where structure could be stored (tRNA or riboswitches), > >>>> etc. > >>>> > >>>> However, this also seems to fall into the category of sequence > >>>> annotation. So, would it be better to have a set of > >>>> Bio::Annotation > >>>> classes used for this purpose? > >>> > >>> To me, all meta data is equal. That is, your classic Genbank feature > >>> annotation and a user's arbitrary meta-tag like "Bob thinks this > >>> is a > >>> kinase domain" aren't different in kind even if they are > >>> different in > >>> content. > >>> > >>> As resequencing projects multiply, the ability to create arbitrary > >>> meta tags, attach them to different types of objects, and use those > >>> tags to link them together will become desirable, if not essential. > >>> > >>> Keeping a common interface to all of these meta data types would be > >>> advantageous, plus new users won't have to determine whether they > >>> need to use Bio::Meta objects or Bio::Annotation objects. > >>> > >>> So I would argue for all of the meta data types to live "under one > >>> roof". Which roof isn't as important. 
Bio::Annotation, since it > >>> already exists for today's meta data, seems like a reasonable > >>> choice. > >>> (assuming Annotation objects are flexible enough to be extended as > >>> you propose) > >>> > >>> There, and no flames or jibes even. :) > >> > >> I guess what I want to know is whether there should to be a > >> distinction between 'normal' sequence annotation (comments, > >> references, and so on) and annotation that could be best described as > >> position-specific (like RNA or protein structural annotation). The > >> current meta implementation is for sequence data only; I felt it > >> would be nice to have a generic implementation that would be > >> applicable to any object data. > > > > my stream-of-consciousness for right now: > > > > I was thinking Bio::Annotation is where this should go - that > > system doesn't have anything about it that makes it explicitly > > sequence related. What we're trying to hammer out here on the > > Alignment side - which fits with your RNA example - is have > > features, basically SeqFeatures - associated with alignments so > > columns can be annotated to cover things like character sets and > > partitions for phylogenetic analyses. As for data which annotates > > non-contiguous things like RNAstems we may have to be more > > creative about that or model it with a splitLocation. > > > > So currently we've added code so that an Alignment is-a > > Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this > > end, with the goal of being able to capture more of the data that > > can be represented in a NEXUS file. > > > > It feels more like a hack than an elegant Meta-data solution, but I > > am totally sure whether the data you are thinking about doing at > > this point, perhaps I need to spend more time thinking about it. > > Or are you worried about the idea of whether the semantic mapping > > of the data into features or annotations is confusing users? > > Sorry in advance for the longish response here... 
>
> My original thought was to have a generic abstract class capable of
> positionally describing data in any other class, similar to
> Heikki's Bio::Seq::MetaI but not constrained to sequence data only.
> Implementing classes would be capable of having different data
> structures based on their use (simple string, array, AoA, AoH, AoO).
> One MetaCollection class to contain them all in a tag-like system, so
> you could have mixed data types describe the same object. The latter
> Collection class is so similar to AnnotationCollection that I agree
> Bio::Annotation would be the best place for this.
>
> The way I reconfigured Stockholm alignment parsing/writing is to use
> Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is
> capable of holding a sequence and several meta strings, stored as
> tags or 'names'. However, there is no Meta object for alignments
> (for RNA/protein structure consensus and other Rfam/Pfam markup); I
> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
> rather have a generic Meta object independent of the sequence cruft.
>
> So for this partial Pfam alignment,
>
> Q92SV1_RHIME/122-299             LAMALNLARGI...VDADVDF..REG
> #=GR Q92SV1_RHIME/122-299 pAS    .........................
> Q883D2_PSESM/110-290             LGLMLGLRRRL...FDGNGAV..KRS
> Q8ZXP5_PYRAE/91-262              LALLLAPYKRI...IQYGEKM..KRG
> #=GR Q8ZXP5_PYRAE/91-262 SS      HHHHHHHHTTH...HHHHHHX..HTT
> #=GR Q8ZXP5_PYRAE/91-262 SA      00000000000...120030X..474
> #=GC SS_cons                     HHHHHHHHTTH...HHHHHHH..HTT
> #=GC SA_cons                     03002200312...1312414..676
> #=GC seq_cons                    luhhLuhsRpl...hthppth..+pG
> //
>
> '#=GC' lines would be in generic meta string objects in the
> alignment, while '#=GR' tags would be in similar meta objects in the
> relevant sequences. As long as both aren't AnnotatableI this isn't
> an issue.
>
> Similarly, NEXUS files which contained any position-based values
> could hold a meta string/array object in a similar tag.
> > The basic scheme is: > |--String > > Annotation::Meta----|--Array > > |--HorriblyComplexDataStruct > > Then I started thinking about where this could be applied, and > whether a true Meta object needs to be constrained only to describing > position-based data. This somewhat relates to this bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=1825 > > which seems to need a simple but unconstrained hash-of-arrays-based > meta object. > > Then my head appropriately exploded... > > Hope everything is going well at the hackathon! Looks like some > interesting stuff coming out of it. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From fgarret at ub.edu Mon Dec 18 16:18:31 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 17:18:31 +0100 Subject: [Bioperl-l] PAML files Message-ID: <4586BF57.4090002@ub.edu> Hi all, does anyone knows how to get the name of the .ctl file created by the PAML module? Inside the tmp directory there are 2 files with random names (tree and ctl). Why do they have random names?? Wouldn't it be easier to assign them a fixed name?? For instance "codeml.ctl" and "tree.nwk"?? thanks in adv, FG From bix at sendu.me.uk Mon Dec 18 16:15:21 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 16:15:21 +0000 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? 
In-Reply-To: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
Message-ID: <4586BE99.7020308@sendu.me.uk>

Hilmar Lapp wrote:
>
> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
>
>> Lincoln Stein wrote:
>>> This is very embarrassing for me, particularly since I spent a lot
>>> of time validating that Bio::Graphics was working properly before
>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>> release? How about a 1.5.2.1 release?
>>
>> I'm happy to try a point release for critical bug fixes. Why don't
>> you commit the necessary fixes to branch-1-5-2 and let me know when
>> you're happy, and I'll do 1.5.2.1.
>
> Feel free to do that, but why not make a 1.5.3 off the main trunk?
> 1.5.2.1 may be adding more to the version confusion
> (developer/stable/point-release/etc) than it is worth,

My feeling is that 1.5.3 should be reserved for some significant changes
and new features, and not just a few bug fixes. I'd say this causes less
confusion amongst users - they can associate '1.5.2' with a certain API
and feature-set, and the specific name of the file they download and
install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
matter at all to them. I also won't have to make some major announcement
about it; it will simply be the most recent developer version of bioperl
available so new users trying to get 1.5.2 will end up getting 1.5.2.1,
whilst existing 1.5.2 users will only feel compelled to get it if they
suffer from the bugs fixed.

> and there is no shame in releasing new developer versions every few
> weeks.

I think doing frequent releases is inadvisable; such a quick release
won't have had much testing so we shouldn't encourage people to install
it: encouragement is implicit when a major new version comes out like
1.5.3.
People who want to live on the edge can and should be using a CVS checkout. From bix at sendu.me.uk Mon Dec 18 19:15:16 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 19:15:16 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> Message-ID: <4586E8C4.6030306@sendu.me.uk> Chris Fields wrote: > On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: > >> However, on a larger set of 190 species, which are all present in >> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect, >> something must be wrong with the merge_lineage method in the major >> rewrite of the taxonomy2tree script. Can someone please check this? >> I'm attaching the 190 species call to the script. Thanks, >> >> Gabriel > > I can confirm that. It is definitely dropping them in merge_lineage > (); if you add a call to get_leaf_nodes to check how many are > present after each merge_lineage() call, you can see it dropping > nodes along the trace. I confirm the 'dropped' nodes, but also claim that this is no bug. For example, the first 'drop' happens for the 101st species which is 'Leptospira interrogans serovar Copenhageni'. This is a variation (descendant) of species 24: 'Leptospira interrogans'. So when the variation is added it becomes a leaf and 'Leptospira interrogans' is no longer a leaf, so the overall number of leaves does not increase. The next drop is for species 103 'Prochlorococcus marinus subsp. pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'. Same deal. I didn't check any others, but suspect the same issue arises in all cases. 
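
To make the point about leaf counts concrete, here is a minimal sketch in plain Perl. It does not use Bio::Tree or the real merge_lineage(); the helper names (merge_toy_lineage, count_leaves) are hypothetical, and the lineages are abbreviated for illustration rather than taken from the NCBI taxonomy. It only shows that merging a descendant of an existing leaf replaces that leaf instead of adding one.

```perl
use strict;
use warnings;

# Toy lineages, root first. Abbreviated for illustration; these are
# assumptions, not real NCBI lineages.
my @lineages = (
    [ 'Bacteria', 'Leptospira', 'Leptospira interrogans' ],
    [ 'Bacteria', 'Prochlorococcus', 'Prochlorococcus marinus' ],
    [ 'Bacteria', 'Leptospira', 'Leptospira interrogans',
      'Leptospira interrogans serovar Copenhageni' ],
);

my %children;    # node name => { child name => 1 }
my %seen;        # every node merged so far

# Merge one root-to-tip lineage into the toy tree (hypothetical helper,
# not Bio::Tree::TreeFunctionsI::merge_lineage).
sub merge_toy_lineage {
    my @path = @{ $_[0] };
    $seen{$_} = 1 for @path;
    for my $i (0 .. $#path - 1) {
        $children{ $path[$i] }{ $path[ $i + 1 ] } = 1;
    }
}

# A leaf is any node that has no children.
sub count_leaves {
    return scalar grep { !$children{$_} } keys %seen;
}

my @leaf_counts;
for my $lineage (@lineages) {
    merge_toy_lineage($lineage);
    push @leaf_counts, count_leaves();
    printf "added %-45s leaves: %d\n", $lineage->[-1], $leaf_counts[-1];
}
```

In this toy run the third merge adds a new node, but the leaf count stays the same as after the second merge, because 'Leptospira interrogans' stops being a leaf once its serovar is attached beneath it. Counting all merged taxa, rather than leaves only, would be one way to see every input species.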
Gabriel, please confirm this isn't a bug, or suggest how you propose to see your taxa when they are not all leaves of the tree. PS. I changed the merge_lineage() algorithm to be 18x faster (from the absurd 3mins for making the 190 species tree to a more reasonable 10s), without changing the tree produced. From fgarret at ub.edu Mon Dec 18 20:01:38 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 21:01:38 +0100 Subject: [Bioperl-l] PAML files In-Reply-To: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> References: <4586BF57.4090002@ub.edu> <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> Message-ID: <4586F3A2.4010607@ub.edu> Hi Jason, This question is related with the one I made previously today. I need to run codeml with 3 tree topologies. I looked on codeml module but it only accepts one tree as input so I thought of using the codeml module to prepare all the files and then I would just have to run the codeml with the new tree file in batch. But for that I need to know which one is the ctl file. any idea? FG Jason Stajich wrote: > They are temporary names so they are deliberately random and there is no > intention of you needing them after a run since it to be cleaned up on > the fly. We use an internal method for generating tempfiles that takes > care of cleanup afterwards. I suppose since we do all the work within a > temp directory that is cleaned up, one could have a fixed name for the > tree, alignment, and ctl files but honestly we never expect people to be > reading these filenames as they are intended to be transient. > > What problem are you having that you need the filename? > > -jason > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > >> Hi all, >> >> does anyone knows how to get the name of the .ctl file created by the >> PAML module? Inside the tmp directory there are 2 files with random >> names (tree and ctl). Why do they have random names?? Wouldn't it be >> easier to assign them a fixed name?? 
For instance "codeml.ctl" and >> "tree.nwk"?? >> >> thanks in adv, >> FG >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > From fgarret at ub.edu Mon Dec 18 20:07:46 2006 From: fgarret at ub.edu (Filipe Garrett) Date: Mon, 18 Dec 2006 21:07:46 +0100 Subject: [Bioperl-l] codeml In-Reply-To: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> References: <45868466.508@ub.edu> <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> Message-ID: <4586F512.1030209@ub.edu> Right now it's impossible for me to write it. By February or March I should have more time but I'll let you know. FG Jason Stajich wrote: > This is shortcoming in the Run::Phylo::PAML::Codeml implementation - I > guess we'll need to allow the -tree option to accept and arrayref of trees. > Are you willing to try write this patch? It should be added as a > bug/feature request to bugzilla so it can be corrected in short order. > > -jason > On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote: > >> Hi all, >> >> I've been using bioperl's PAML module (specifically the codeml part) but >> with just one tree. >> >> Since the program accepts several trees as input (and runs the analysis >> for each tree outputting the difference in likelihoods for each one) I >> was wondering if there's some way to do it through bioperl? 
>> >> thanks in adv, >> FG >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > > From cjfields at uiuc.edu Mon Dec 18 20:55:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 14:55:55 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <4586E8C4.6030306@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> Message-ID: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote: >> >>> However, on a larger set of 190 species, which are all present in >>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect, >>> something must be wrong with the merge_lineage method in the major >>> rewrite of the taxonomy2tree script. Can someone please check this? >>> I'm attaching the 190 species call to the script. Thanks, >>> >>> Gabriel >> >> I can confirm that. It is definitely dropping them in merge_lineage >> (); if you add a call to get_leaf_nodes to check how many are >> present after each merge_lineage() call, you can see it dropping >> nodes along the trace. > > I confirm the 'dropped' nodes, but also claim that this is no bug. > > For example, the first 'drop' happens for the 101st species which is > 'Leptospira interrogans serovar Copenhageni'. This is a variation > (descendant) of species 24: 'Leptospira interrogans'. 
So when the > variation is added it becomes a leaf and 'Leptospira interrogans' > is no > longer a leaf, so the overall number of leaves does not increase. > > The next drop is for species 103 'Prochlorococcus marinus subsp. > pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'. > Same deal. I didn't check any others, but suspect the same issue > arises > in all cases. Makes sense now. I personally would consider this a bug since the results are unexpected (so the docs need to be modified in order to clarify). Some say tomato... I suppose this is one of the issues one might run into when using NCBI taxonomy to build trees. > Gabriel, please confirm this isn't a bug, or suggest how you > propose to > see your taxa when they are not all leaves of the tree. Having the nodes appear internally seems semantically correct to me. Is there any other way? > PS. I changed the merge_lineage() algorithm to be 18x faster (from the > absurd 3mins for making the 190 species tree to a more reasonable > 10s), > without changing the tree produced. Definitely an improvement! chris From jason at bioperl.org Mon Dec 18 19:33:32 2006 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Dec 2006 14:33:32 -0500 Subject: [Bioperl-l] PAML files In-Reply-To: <4586BF57.4090002@ub.edu> References: <4586BF57.4090002@ub.edu> Message-ID: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> They are temporary names so they are deliberately random and there is no intention of you needing them after a run since they are cleaned up on the fly. We use an internal method for generating tempfiles that takes care of cleanup afterwards. I suppose since we do all the work within a temp directory that is cleaned up, one could have a fixed name for the tree, alignment, and ctl files but honestly we never expect people to be reading these filenames as they are intended to be transient. What problem are you having that you need the filename?
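For anyone who does want predictable names, here is a minimal sketch of the fixed-names-inside-a-temp-directory idea, using only File::Temp and File::Spec from core Perl. This is not how the PAML wrapper is implemented; the names codeml.ctl and tree.nwk are simply the ones suggested in the question, and the file contents are placeholders:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# The directory name is random, but everything inside it can have a fixed
# name. CLEANUP => 1 removes the directory and its contents at program exit.
my $dir = tempdir(CLEANUP => 1);

my %file = map { $_ => File::Spec->catfile($dir, $_) } qw(codeml.ctl tree.nwk);

for my $name (sort keys %file) {
    open my $fh, '>', $file{$name} or die "Cannot write $file{$name}: $!";
    print {$fh} "placeholder for $name\n";    # real content would go here
    close $fh or die "Cannot close $file{$name}: $!";
}

print "$_ -> $file{$_}\n" for sort keys %file;
```

Because the files still live in a directory that is deleted on exit, anything that needs them afterwards (such as a batch run with other trees) would have to copy them out first.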
-jason On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > Hi all, > > does anyone knows how to get the name of the .ctl file created by the > PAML module? Inside the tmp directory there are 2 files with random > names (tree and ctl). Why do they have random names?? Wouldn't it be > easier to assign them a fixed name?? For instance "codeml.ctl" and > "tree.nwk"?? > > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjm at fruitfly.org Mon Dec 18 21:50:00 2006 From: cjm at fruitfly.org (Chris Mungall) Date: Mon, 18 Dec 2006 13:50:00 -0800 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> Message-ID: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> I agree with everything Heikki is saying, I just wanted to highlight one paragraph: > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? One solution is to give your annotation/metadata-model formal computational semantics and use ontologies to give additional semantics to your metadata tags. This provides both user information in the form of documentation, and a means of specifying to the computer exactly what should be done with the tags. This is probably overkill for bioperl; but if the use cases being proposed do lean in the direction of a new metadata system that is not necessarily backwards compatible with the existing one, then I'd recommend checking out what's already out there before re-inventing the wheel. Perl RDF libraries are getting a little better. 
If anyone is interested in pursuing this sort of thing (probably on a branch), let me know On Dec 18, 2006, at 5:51 AM, Heikki Lehvaslaiho wrote: > > Reading the discussion, I think it is time to draw some guidelines. > > 1. Base the Meta implementation to a real use cases. > > MSA is a good example. > > 2. Allow generalisations > > If you can see an other implementation of the same idea that can > be merged > with the first do it but do not hurt yourself if you can not. > > > The most difficult question is how to separate case-specific > attributes that > are best implemented by subclassing with additional methods from > truly widely > variable meta data that is best done as a parallel track meta > information > holding class. > > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? > > That leads to the the third guideline: > > 3. Use separate meta classes only when there are several different > ways of > encoding data that is present in large numbers *and* when you are > expecting > to be assessing the data computationally rather than just checking > if an > attribute is there. > > > -Heikki > > > > On Friday 15 December 2006 19:23, Chris Fields wrote: >> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote: >>> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote: >>>> On Dec 14, 2006, at 7:45 PM, David Messina wrote: >>>>> Hey Chris, >>>>> >>>>> My thoughts below. >>>>> >>>>>> [Chris] >>>>>> This could be used to annotate any >>>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have- >>>>>> you, >>>>>> maybe in a collection (similar to AnnotationCollection). 
I >>>>>> thought >>>>>> something like this may be of general use for any PrimarySeq >>>>>> (quality, structure), alignments like NEXUS and Stockholm, >>>>>> SeqFeatures where structure could be stored (tRNA or >>>>>> riboswitches), >>>>>> etc. >>>>>> >>>>>> However, this also seems to fall into the category of sequence >>>>>> annotation. So, would it be better to have a set of >>>>>> Bio::Annotation >>>>>> classes used for this purpose? >>>>> >>>>> To me, all meta data is equal. That is, your classic Genbank >>>>> feature >>>>> annotation and a user's arbitrary meta-tag like "Bob thinks this >>>>> is a >>>>> kinase domain" aren't different in kind even if they are >>>>> different in >>>>> content. >>>>> >>>>> As resequencing projects multiply, the ability to create arbitrary >>>>> meta tags, attach them to different types of objects, and use >>>>> those >>>>> tags to link them together will become desirable, if not >>>>> essential. >>>>> >>>>> Keeping a common interface to all of these meta data types >>>>> would be >>>>> advantageous, plus new users won't have to determine whether they >>>>> need to use Bio::Meta objects or Bio::Annotation objects. >>>>> >>>>> So I would argue for all of the meta data types to live "under one >>>>> roof". Which roof isn't as important. Bio::Annotation, since it >>>>> already exists for today's meta data, seems like a reasonable >>>>> choice. >>>>> (assuming Annotation objects are flexible enough to be extended as >>>>> you propose) >>>>> >>>>> There, and no flames or jibes even. :) >>>> >>>> I guess what I want to know is whether there should to be a >>>> distinction between 'normal' sequence annotation (comments, >>>> references, and so on) and annotation that could be best >>>> described as >>>> position-specific (like RNA or protein structural annotation). 
The >>>> current meta implementation is for sequence data only; I felt it >>>> would be nice to have a generic implementation that would be >>>> applicable to any object data. >>> >>> my stream-of-consciousness for right now: >>> >>> I was thinking Bio::Annotation is where this should go - that >>> system doesn't have anything about it that makes it explicitly >>> sequence related. What we're trying to hammer out here on the >>> Alignment side - which fits with your RNA example - is have >>> features, basically SeqFeatures - associated with alignments so >>> columns can be annotated to cover things like character sets and >>> partitions for phylogenetic analyses. As for data which annotates >>> non-contiguous things like RNAstems we may have to be more >>> creative about that or model it with a splitLocation. >>> >>> So currently we've added code so that an Alignment is-a >>> Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this >>> end, with the goal of being able to capture more of the data that >>> can be represented in a NEXUS file. >>> >>> It feels more like a hack than an elegant Meta-data solution, but I >>> am totally sure whether the data you are thinking about doing at >>> this point, perhaps I need to spend more time thinking about it. >>> Or are you worried about the idea of whether the semantic mapping >>> of the data into features or annotations is confusing users? >> >> Sorry in advance for the longish response here... >> >> My original thought was to have a generic abstract class capable of >> positionally describing data in any another class, similar to >> Heikki's Bio::Seq::MetaI but not constrained to sequence data only. >> Implementing classes would be capable of having different data >> structures based on their use (simple string, array, AoA, AoH, AoO). >> One MetaCollection class to contain them all in a tag-like system, so >> you could have mixed data types describe the same object. 
The latter >> Collection class is so similar to AnnotationCollection that I agree >> Bio::Annotation would be the best place for this. >> >> The way I reconfigured Stockholm alignment parsing/writing is to use >> Bio::Seq::Meta objects (which are LocatableSeq). Each Seq::Meta is >> capable of holding a sequence and several meta strings, stored as >> tags or 'names'. However, there is no Meta object for alignments >> (for RNA/protein structure consensus and other Rfam/Pfam markup); I >> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would >> rather have a generic Meta object independent of the sequence cruft. >> >> So for this partial Pfam alignment, >> >> Q92SV1_RHIME/122-299 LAMALNLARGI...VDADVDF..REG >> #=GR Q92SV1_RHIME/122-299 pAS ......................... >> Q883D2_PSESM/110-290 LGLMLGLRRRL...FDGNGAV..KRS >> Q8ZXP5_PYRAE/91-262 LALLLAPYKRI...IQYGEKM..KRG >> #=GR Q8ZXP5_PYRAE/91-262 SS HHHHHHHHTTH...HHHHHHX..HTT >> #=GR Q8ZXP5_PYRAE/91-262 SA 00000000000...120030X..474 >> #=GC SS_cons HHHHHHHHTTH...HHHHHHH..HTT >> #=GC SA_cons 03002200312...1312414..676 >> #=GC seq_cons luhhLuhsRpl...hthppth..+pG >> // >> >> '#=GC' lines would be in generic meta string objects in the >> alignment, while '#=GR' tags would be in similar meta objects in the >> relevant sequences. As long as both aren't AnnotatableI this isn't >> an issue. >> >> Similarly, NEXUS files which contained any position-based values >> could hold a meta string/array object in a similar tag. >> >> The basic scheme is: >> |--String >> >> Annotation::Meta----|--Array >> >> |--HorriblyComplexDataStruct >> >> Then I started thinking about where this could be applied, and >> whether a true Meta object needs to be constrained only to describing >> position-based data. This somewhat relates to this bug: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=1825 >> >> which seems to need a simple but unconstrained hash-of-arrays-based >> meta object. >> >> Then my head appropriately exploded... 
>> >> Hope everything is going well at the hackathon! Looks like some >> interesting stuff coming out of it. >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Dec 18 19:35:50 2006 From: jason at bioperl.org (Jason Stajich) Date: Mon, 18 Dec 2006 14:35:50 -0500 Subject: [Bioperl-l] codeml In-Reply-To: <45868466.508@ub.edu> References: <45868466.508@ub.edu> Message-ID: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org> This is a shortcoming in the Run::Phylo::PAML::Codeml implementation - I guess we'll need to allow the -tree option to accept an arrayref of trees. Are you willing to try writing this patch? It should be added as a bug/feature request to bugzilla so it can be corrected in short order. -jason On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote: > Hi all, > > I've been using bioperl's PAML module (specifically the codeml > part) but > with just one tree. > > Since the program accepts several trees as input (and runs the > analysis > for each tree outputting the difference in likelihoods for each one) I > was wondering if there's some way to do it through bioperl?
> > thanks in adv, > FG > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html From gowthaman.ramasamy at sbri.org Mon Dec 18 22:19:09 2006 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Mon, 18 Dec 2006 14:19:09 -0800 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence Message-ID: Hi List, Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) Many thanks in advance, gowtham From cjfields at uiuc.edu Mon Dec 18 22:33:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 16:33:34 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> Message-ID: On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote: > > Reading the discussion, I think it is time to draw some guidelines. > > 1. Base the Meta implementation to a real use cases. > > MSA is a good example. AlignIO::stockholm is where I'll initially test it out. > 2. Allow generalisations > > If you can see an other implementation of the same idea that can > be merged > with the first do it but do not hurt yourself if you can not. I agree. > The most difficult question is how to separate case-specific > attributes that > are best implemented by subclassing with additional methods from > truly widely > variable meta data that is best done as a parallel track meta > information > holding class. 
I would probably start with a general Bio::Annotation::MetaI abstract class, which supplements AnnotationI with general meta-specific methods (meta, meta_text, named_meta, etc)? Implement this in whatever way one wanted (RNA structure as strings, quality data as arrays, etc) under the constraints of the interface description. Multiple meta objects, potentially of mixed data types, could be added in an AnnotationCollection along with other Bio::Annotation data, or stored in a nested meta-specific AnnotationCollection object (I favor the former as it's simpler). So you could have an alignment, sequence, seqfeature (anything that is AnnotatableI) with a regular AnnotationCollection also containing possibly multiple meta objects, each meta object also containing possibly more than one set of meta data. The key issue I have is whether or not to constrain these to describing positional data, similar to Bio::Seq::Meta, by ensuring that the data is_flush(), etc. My current inclination is 'no', and to have a separate abstract class which describes these methods, implementing those separately. > The problem I see with undefined, totally open meta annotation, is > that if you > can put anything in there, it is also totally confusing to a user. > If you can > put anything in, how do you know what to get get out and know that > it is > there? > > That leads to the the third guideline: > > 3. Use separate meta classes only when there are several different > ways of > encoding data that is present in large numbers *and* when you are > expecting > to be assessing the data computationally rather than just checking > if an > attribute is there. > > > -Heikki The initial use case for this would be simple data strings for alignment data. I already have a partial implementation in place for stockholm using Bio::Seq::Meta (which led me to this proposal!). 
I like Chris M.'s idea of ensuring that meta implementations use some sort of formalized ontology, but I'll probably start out very simple and work up from there. chris From cjfields at uiuc.edu Mon Dec 18 22:38:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 16:38:14 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem? In-Reply-To: <4586BE99.7020308@sendu.me.uk> References: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com> <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com> <458404BD.8030908@sendu.me.uk> <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net> <4586BE99.7020308@sendu.me.uk> Message-ID: <6AD475AE-7F5E-4612-BC24-73B65AA47F30@uiuc.edu> On Dec 18, 2006, at 10:15 AM, Sendu Bala wrote: > Hilmar Lapp wrote: >> >> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote: >> >>> Lincoln Stein wrote: >>>> This is very embarassing for me, particularly since I spent a lot >>>> of time validating that Bio::Graphics was working properly before >>>> the 1.5.2 release went out. How long before there is a 1.5.3 >>>> release? How about a 1.5.2.1release? >>> >>> I'm happy to try a point release for critical bug fixes. Why don't >>> you commit the necessary fixes to branch-1-5-2 and let me know when >>> you're happy, and I'll do 1.5.2.1. >> >> Feel free to do that, but why not make a 1.5.3 off the main trunk? >> 1.5.2.1 may be adding more to the version confusion >> (developer/stable/point-release/etc) than it is worth, > > My feeling is that 1.5.3 should be reserved for some significant > changes > and new features, and not just a few bug fixes. I'd say this causes > less > confusion amongst users - they can associate '1.5.2' with a certain > API > and feature-set, and the specific name of the file they download and > install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't > matter at all to them. 
> > I also won't have to make some major announcement about it; it will > simply be the most recent developer version of bioperl available so > new > users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing > 1.5.2 users will only feel compelled to get it if they suffer from the > bugs fixed. > > >> and there is no shame in releasing new developer versions every few >> weeks. > > I think doing frequent releases are inadvisable; such a quick release > won't have had much testing so we shouldn't encourage people to > install > it: encouragement is implicit when a major new version comes out like > 1.5.3. People who want to live on the edge can and should be using a > CVS checkout. I thought that 1.5.2 was considered a point release for the 1.5 dev series, for bug fixes along with the potential for added/experimental features. Similarly, 1.6.x releases would be point releases for bug fixes only with all tests passing (no added features since it is a stable release series). I guess one could reason that 1.5.x releases have both bug fixes and new features, while 1.5.x.y releases are simply bug fixes for the 1.5.x branch (no new features). We probably should add something to the FAQ and maybe make a few changes to the 1.5.2 wiki page. I think having a 1.5.2.1 release is feasible as a quick one-off to get Lincoln's fixes in, since you would make them off the 1.5.2 branch anyway (so I guess it could be considered a bug release from that branch). It's probably not something we should make a habit of, but then again I'm not the Pumpkin! 
chris From bix at sendu.me.uk Mon Dec 18 22:50:11 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 22:50:11 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> Message-ID: <45871B23.8070103@sendu.me.uk> Chris Fields wrote: > > On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: > >> For example, the first 'drop' happens for the 101st species which is >> 'Leptospira interrogans serovar Copenhageni'. This is a variation >> (descendant) of species 24: 'Leptospira interrogans'. So when the >> variation is added it becomes a leaf and 'Leptospira interrogans' is no >> longer a leaf, so the overall number of leaves does not increase. > > Makes sense now. I personally would consider this a bug since the > results are unexpected (so the docs need to be modified in order to > clarify). Some say tomato... > > I suppose this is one of the issues one might run into when using NCBI > taxonomy to build trees. No, the tree produced is perfectly fine. The taxonomy2tree.pl script deliberately then does: # simple paths are contracted by removing degree one nodes $tree->contract_linear_paths; Because that is what Gabriel's script originally did. >> Gabriel, please confirm this isn't a bug, or suggest how you propose to >> see your taxa when they are not all leaves of the tree. > > Having the nodes appear internally seems semantically correct to me. Is > there any other way? I suppose if we want to see all the input species output again we have to make contract_linear_paths() aware of nodes we want to keep, even when they are degree one nodes. 
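A rough sketch of what such a 'keep' list could look like, written against a plain child => parent hash rather than a Bio::Tree object. contract_keeping() is a hypothetical helper, not part of Bio::Tree::TreeFunctionsI, and the tree is a toy with abbreviated lineages:

```perl
use strict;
use warnings;

# Remove every node that has exactly one child and a parent, unless it is
# listed in the keep set.
#   $parent_of: { child => parent } (the root maps to undef)
#   $keep:      { node name => 1 } for nodes that must survive
sub contract_keeping {
    my ($parent_of, $keep) = @_;
    my %parent = %$parent_of;    # work on a copy

    my %kids;
    for my $node (keys %parent) {
        my $p = $parent{$node};
        push @{ $kids{$p} }, $node if defined $p;
    }

    for my $node (sort keys %parent) {
        my $p = $parent{$node};
        next unless defined $p;              # leave the root alone
        my @k = @{ $kids{$node} || [] };
        next unless @k == 1;                 # only degree-one nodes
        next if $keep->{$node};              # protected input taxon

        # Splice the node out: its only child is adopted by its parent.
        $parent{ $k[0] } = $p;
        @{ $kids{$p} } = map { $_ eq $node ? $k[0] : $_ } @{ $kids{$p} };
        delete $parent{$node};
        delete $kids{$node};
    }
    return \%parent;
}

my %tree = (
    'Bacteria'                                   => undef,
    'Leptospira'                                 => 'Bacteria',
    'Leptospira interrogans'                     => 'Leptospira',
    'Leptospira interrogans serovar Copenhageni' => 'Leptospira interrogans',
    'Prochlorococcus'                            => 'Bacteria',
    'Prochlorococcus marinus'                    => 'Prochlorococcus',
);

# The taxa the user actually asked for.
my %keep = map { $_ => 1 } (
    'Leptospira interrogans',
    'Leptospira interrogans serovar Copenhageni',
    'Prochlorococcus marinus',
);

my $contracted = contract_keeping(\%tree, \%keep);
for my $node (sort keys %$contracted) {
    my $p = $contracted->{$node};
    printf "%s <- %s\n", $node, defined $p ? $p : '(root)';
}
```

With the keep set, the genus nodes are contracted away but 'Leptospira interrogans' stays as an internal node above its serovar; with an empty keep set it would be removed like any other degree-one node.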
Gabriel, is that what you want to see? From cjfields at uiuc.edu Mon Dec 18 23:14:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 17:14:23 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <45871B23.8070103@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> Message-ID: On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: >>> For example, the first 'drop' happens for the 101st species which is >>> 'Leptospira interrogans serovar Copenhageni'. This is a variation >>> (descendant) of species 24: 'Leptospira interrogans'. So when the >>> variation is added it becomes a leaf and 'Leptospira interrogans' >>> is no >>> longer a leaf, so the overall number of leaves does not increase. >> >> Makes sense now. I personally would consider this a bug since the >> results are unexpected (so the docs need to be modified in order >> to clarify). Some say tomato... >> I suppose this is one of the issues one might run into when using >> NCBI taxonomy to build trees. > > No, the tree produced is perfectly fine. The taxonomy2tree.pl > script deliberately then does: > > # simple paths are contracted by removing degree one nodes > $tree->contract_linear_paths; > > Because that is what Gabriel's script originally did. I think you misunderstood me. The tree is fine; the data used to make the tree (NCBI taxonomy) is the issue. 
One of the clear caveats that NCBI attaches to their taxonomy data is that it should not be the 'primary source for taxonomic or phylogenetic information': http://tinyurl.com/y3k624 I think it works as a good guide as long as one takes the above into consideration. That and the fact that not all taxids attached to sequence data will represent leaf nodes. chris From cjfields at uiuc.edu Mon Dec 18 23:15:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 17:15:56 -0600 Subject: [Bioperl-l] Proposal for Meta data In-Reply-To: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> References: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org> <200612181551.51277.heikki@sanbi.ac.za> <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org> Message-ID: <16D6DB51-C2CB-4E89-A597-4672FAA6681B@uiuc.edu> On Dec 18, 2006, at 3:50 PM, Chris Mungall wrote: > > I agree with everything Heikki is saying, I just wanted to highlight > one paragraph: > >> The problem I see with undefined, totally open meta annotation, is >> that if you >> can put anything in there, it is also totally confusing to a user. >> If you can >> put anything in, how do you know what to get get out and know that >> it is >> there? > > One solution is to give your annotation/metadata-model formal > computational semantics and use ontologies to give additional > semantics to your metadata tags. This provides both user information > in the form of documentation, and a means of specifying to the > computer exactly what should be done with the tags. > > This is probably overkill for bioperl; but if the use cases being > proposed do lean in the direction of a new metadata system that is > not necessarily backwards compatible with the existing one, then I'd > recommend checking out what's already out there before re-inventing > the wheel. Perl RDF libraries are getting a little better. > > If anyone is interested in pursuing this sort of thing (probably on a > branch), let me know ...
I like the idea of using ontologies (although that's one of my many weak points!). I'll likely start off with simple examples using meta data initially, then progress from there. It is a developer series, after all! Thanks everybody! I think I have an idea on how to at least get started. chris From bix at sendu.me.uk Mon Dec 18 23:27:15 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 23:27:15 +0000 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> Message-ID: <458723D3.4010908@sendu.me.uk> Chris Fields wrote: > > On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote: >>>> For example, the first 'drop' happens for the 101st species which is >>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation >>>> (descendant) of species 24: 'Leptospira interrogans'. So when the >>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no >>>> longer a leaf, so the overall number of leaves does not increase. >>> >>> Makes sense now. I personally would consider this a bug since the >>> results are unexpected (so the docs need to be modified in order to >>> clarify). Some say tomato... >>> I suppose this is one of the issues one might run into when using >>> NCBI taxonomy to build trees. >> >> No, the tree produced is perfectly fine. The taxonomy2tree.pl script >> deliberately then does: >> >> # simple paths are contracted by removing degree one nodes >> $tree->contract_linear_paths; >> >> Because that is what Gabriel's script originally did.
> > I think you misunderstood me. The tree is fine; the data used to make > the tree (NCBI taxonomy) is the issue. In what way is it the issue? The database is also fine as far as I can see, in so far as it is not causing any problems in this instance. Gabriel asked for a tree featuring a species and its subspecies. The NCBI taxonomy database provided Bioperl the correct data to build such a tree. Then Gabriel asked to remove the degree one nodes of his tree. His problem was that doing that happened to (correctly) remove the species node. If he wants to see both his species and his subspecies he must either not remove degree one nodes, or alter the method of doing so to keep desired nodes. There is no possible way for NCBI to improve matters here. From bix at sendu.me.uk Mon Dec 18 23:45:59 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 18 Dec 2006 23:45:59 +0000 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <45872837.6050403@sendu.me.uk> Gowthaman Ramasamy wrote: > Hi List, Is there any module in bioperl which can find out the primer > binding sites in a genomic sequence. I am interested in finding > locations with few mismatches along the primer...not just the exact > match (which is a very trivial task) There's no module dedicated to that task, but Bioperl may help you to answer the question. Probably the easiest/reliable/clear thing to do is to do a Blast with appropriate settings for short sequence with few mismatches. You can write a script to only consider hits for your forward primer that are a 'primable' distance from a hit to your reverse primer (and check their orientations are correct as well). Or use some e-pcr tool. 
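The filtering step described above (keep only forward-primer hits that sit a 'primable' distance from a reverse-primer hit in the right orientation) might look roughly like this. It is only a sketch under assumptions: the hits are taken to be already parsed into plain hashes, and the field names (start, end, strand), the helper name primable_pairs and the 2000 bp cutoff are made up for illustration, not part of any Bioperl API.

```perl
use strict;
use warnings;

# Pair primer hits that could give a PCR product.
# Each hit is a hash ref { start => ..., end => ..., strand => 1 or -1 }
# with start <= end in genome coordinates. These field names are
# assumptions for this sketch; fill them from your own BLAST parsing.
sub primable_pairs {
    my ($fwd_hits, $rev_hits, $max_product) = @_;
    my @pairs;
    for my $f (@$fwd_hits) {
        for my $r (@$rev_hits) {
            # The two primers must bind opposite strands.
            next if $f->{strand} == $r->{strand};
            # Work out which hit is on the plus strand.
            my ($plus, $minus) = $f->{strand} == 1 ? ($f, $r) : ($r, $f);
            # The plus-strand hit must lie wholly upstream of the other,
            # otherwise the primers point away from each other or overlap.
            next unless $plus->{end} < $minus->{start};
            my $size = $minus->{end} - $plus->{start} + 1;
            next if $size > $max_product;
            push @pairs, {
                fwd  => $f,
                rev  => $r,
                from => $plus->{start},
                to   => $minus->{end},
                size => $size,
            };
        }
    }
    return @pairs;
}

# Toy data: two forward-primer hits and one reverse-primer hit.
my @fwd = ({ start => 100,  end => 119,  strand => 1 },
           { start => 5000, end => 5019, strand => 1 });
my @rev = ({ start => 581,  end => 600,  strand => -1 });

for my $p (primable_pairs(\@fwd, \@rev, 2000)) {
    print "possible product of $p->{size} bp from $p->{from} to $p->{to}\n";
}
```

The nested loop is fine for a handful of hits per primer; with thousands of hits it would be better to sort the hits by position first.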
From Kevin.M.Brown at asu.edu Mon Dec 18 23:52:20 2006 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 18 Dec 2006 16:52:20 -0700 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence Message-ID: <1A4207F8295607498283FE9E93B775B40270F3BB@EX02.asurite.ad.asu.edu> A function I use to find the first landing site for a primer. Should be modifiable to handle multiple occurrences:

=head1 C<match>

Match searches for a near alignment between two strings and returns the
position at which the two strings align. Match is based on 80% conformation

  match($this, $in_that)

=cut

sub match
{
    my ($primer, $gene) = @_;
    my $start   = 0;
    my $pattern = "";

    # Grow the pattern one primer base at a time
    for (my $i = 0 ; $i < length($primer) ; $i++)
    {
        $pattern .= substr($primer, $i, 1);
        pos($gene) = 0;
        if ($gene =~ m/$pattern/gi)
        {
            # 1-based position at which the current pattern matches
            $start = pos($gene) - length($pattern) + 1;
        }
        else
        {
            # No match: turn the newest base into a wildcard
            $start = 0;
            chop($pattern);
            $pattern .= '.';
        }
    }

    # If the last base became a wildcard, $start was reset, so match again
    if ($pattern =~ /\.$/)
    {
        if ($gene =~ m/$pattern/gi)
        {
            $start = pos($gene) - length($pattern) + 1;
        }
    }

    # Accept only if more than 80% of the primer bases matched literally
    $pattern =~ s/\.//g;
    if ((length($pattern) / length($primer)) > .8)
    {
        #print $start . "\n";
        return $start;
    }
    return 0;
}

> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Monday, December 18, 2006 4:46 PM > To: Gowthaman Ramasamy > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] module to find out primer binding > sites in a genome sequence > > Gowthaman Ramasamy wrote: > > Hi List, Is there any module in bioperl which can find out > the primer > > binding sites in a genomic sequence. I am interested in finding > > locations with few mismatches along the primer...not just the exact > > match (which is a very trivial task) > > There's no module dedicated to that task, but Bioperl may help you to > answer the question. > > Probably the easiest/reliable/clear thing to do is to do a Blast with > appropriate settings for short sequence with few mismatches.
You can > write a script to only consider hits for your forward primer > that are a > 'primable' distance from a hit to your reverse primer (and check their > orientations are correct as well). > > Or use some e-pcr tool. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From torsten.seemann at infotech.monash.edu.au Mon Dec 18 23:52:58 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 19 Dec 2006 10:52:58 +1100 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <458729DA.9030909@infotech.monash.edu.au> Gowthaman Ramasamy wrote: > Hi List, > Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. > I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) This FAQ question may help: http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F This software may help: http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From sdavis2 at mail.nih.gov Tue Dec 19 02:16:19 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 18 Dec 2006 21:16:19 -0500 Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence In-Reply-To: References: Message-ID: <45874B73.7010600@mail.nih.gov> Gowthaman Ramasamy wrote: > Hi List, > Is there any module in bioperl which can find out the primer binding sites in a genomic sequence. 
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task) > See here: http://genome.ucsc.edu/cgi-bin/hgPcr?command=start It is designed for exactly this task, is very fast, is available as an executable or web-based (though watch the usage requirements), and the output can be parsed rather easily. Sean From cjfields at uiuc.edu Tue Dec 19 02:30:04 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 20:30:04 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <458723D3.4010908@sendu.me.uk> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> Message-ID: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> >> I think you misunderstood me. The tree is fine; the data used to >> make >> the tree (NCBI taxonomy) is the issue. > > In what way is it the issue? The database is also fine as far as I can > see, in so far as it is not causing any problems in this instance. I should maybe have clarified a bit more: what I said has nothing to do with the structure of the database itself. I was just pointing out that NCBI Taxonomy isn't the best source of data for building a phylogenetic tree, something NCBI also points out (the link in my last post). Not a big deal, really. > Gabriel asked for a tree featuring a species and its subspecies. The > NCBI taxonomy database provided Bioperl the correct data to build > such a > tree. Then Gabriel asked to remove the degree one nodes of his > tree. His > problem was that doing that happened to (correctly) remove the species > node. 
If he wants to see both his species and his subspecies he must > either not remove degree one nodes, or alter the method of doing so to > keep desired nodes. There is no possible way for NCBI to improve > matters > here. Actually, there isn't any way they could w/o digging through the literature in many cases. The problem is incomplete taxonomic information for nodes derived from older sequence data, where a genus and species was designated but nothing else (strain, etc) is known. Again, I merely was pointing out what I had mentioned above. I wasn't criticizing you, Gabriel, or the methodology here. Honest! chris From avilella at gmail.com Mon Dec 18 21:43:27 2006 From: avilella at gmail.com (Albert Vilella) Date: Mon, 18 Dec 2006 21:43:27 +0000 Subject: [Bioperl-l] PAML files In-Reply-To: <4586F3A2.4010607@ub.edu> References: <4586BF57.4090002@ub.edu> <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org> <4586F3A2.4010607@ub.edu> Message-ID: <358f4d650612181343o5bd51169w7b46cceb34a5c92b@mail.gmail.com> Filipe, if you need to create the ctl file but not run the job, you can use the "prepare" method in Codeml run. Also, there are tmpdir and save_tempfiles methods that will keep the files where you want. About the naming, you can add a ".tree" and ".aln" extension to the tempnames if you want, by altering the $temptreefile and $tempseqfile variables in bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm (cvs head version). If you want, you can also add a couple of getters/setters there: sub alnfilename{ my $self = shift; return $self->{'alnfilename'} = shift if @_; return $self->{'alnfilename'}; } and replace those $tempseqfile io calls with your $self->{'alnfilename'} io calls. $codeml->alnfilename("/path/name"); $codeml->prepare; ... $codeml->run; What I use to do is to have the aln and tree files in a different place. Codeml will create the tmp files for running somewhere, and then delete all the stuff when done. Cheers, Albert.
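The getter/setter suggested above can be tried on its own before touching Codeml.pm. A minimal sketch, using a made-up stand-in class: My::CodemlStub is hypothetical and not a Bioperl module, and alnfilename only exists in the real Codeml.pm if you add it yourself as described.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, used only to exercise the accessor pattern.
package My::CodemlStub;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Combined getter/setter: with an argument it stores and returns the value,
# without one it returns whatever was stored (undef if never set).
sub alnfilename {
    my $self = shift;
    return $self->{'alnfilename'} = shift if @_;
    return $self->{'alnfilename'};
}

package main;

my $codeml = My::CodemlStub->new;
$codeml->alnfilename('/path/name.aln');    # set
print $codeml->alnfilename, "\n";          # get
```

A treefilename accessor would follow the same pattern with a different hash key.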
On 12/18/06, Filipe Garrett wrote: > > Hi Jason, > > This question is related with the one I made previously today. > I need to run codeml with 3 tree topologies. I looked on codeml module > but it only accepts one tree as input so I thought of using the codeml > module to prepare all the files and then I would just have to run the > codeml with the new tree file in batch. But for that I need to know > which one is the ctl file. > > any idea? > FG > > Jason Stajich wrote: > > They are temporary names so they are deliberately random and there is no > > intention of you needing them after a run since it to be cleaned up on > > the fly. We use an internal method for generating tempfiles that takes > > care of cleanup afterwards. I suppose since we do all the work within a > > temp directory that is cleaned up, one could have a fixed name for the > > tree, alignment, and ctl files but honestly we never expect people to be > > reading these filenames as they are intended to be transient. > > > > What problem are you having that you need the filename? > > > > -jason > > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote: > > > >> Hi all, > >> > >> does anyone knows how to get the name of the .ctl file created by the > >> PAML module? Inside the tmp directory there are 2 files with random > >> names (tree and ctl). Why do they have random names?? Wouldn't it be > >> easier to assign them a fixed name?? For instance "codeml.ctl" and > >> "tree.nwk"?? 
> >> > >> thanks in adv, > >> FG > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > jason at bioperl.org > > http://jason.open-bio.org/ > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From valiente at lsi.upc.edu Tue Dec 19 04:18:20 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Tue, 19 Dec 2006 13:18:20 +0900 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> Message-ID: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> Thanks a lot for the prompt answer and follow-up discussion. I think this turned out not to be a bug in the merge_lineage() code but a direct consequence of building a phylogenetic tree instead of a taxonomic tree, aka with internal node labels. In order to reconstruct the NCBI taxonomy for the set of species present in a given phylogenetic tree, the only reasonable work-around seems to be a first step of merging lineages and contracting linear paths with the current implementation, followed by a second step of restricting the given phylogenetic tree to the set of species present in the obtained NCBI taxonomy. But this does not affect the code for merge_lineage(). Gabriel >>> I think you misunderstood me. 
The tree is fine; the data used to >>> make >>> the tree (NCBI taxonomy) is the issue. >> >> In what way is it the issue? The database is also fine as far as I >> can >> see, in so far as it is not causing any problems in this instance. > > I should maybe have clarified a bit more: what I said has nothing > to do with the structure of the database itself. I was just > pointing out that NCBI Taxonomy isn't the best source of data for > building a phylogenetic tree, something NCBI also points out (the > link in my last post). Not a big deal, really. > >> Gabriel asked for a tree featuring a species and its subspecies. The >> NCBI taxonomy database provided Bioperl the correct data to build >> such a >> tree. Then Gabriel asked to remove the degree one nodes of his >> tree. His >> problem was that doing that happened to (correctly) remove the >> species >> node. If he wants to see both his species and his subspecies he must >> either not remove degree one nodes, or alter the method of doing >> so to >> keep desired nodes. There is no possible way for NCBI to improve >> matters >> here. > > Actually, there isn't any way they could w/o digging through the > literature in many cases. The problem is incomplete taxonomic > information for nodes derived from older sequence data, where a > genus and species was designated but nothing else (strain, etc) is > known. > > Again, I merely was pointing out what I had mentioned above. I > wasn't criticizing you, Gabriel, or the methodology here. Honest! 
> > chris From cjfields at uiuc.edu Tue Dec 19 04:41:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 18 Dec 2006 22:41:16 -0600 Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species In-Reply-To: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> References: <68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu> <4577E4A2.5090303@sendu.me.uk> <4577EAAF.7030509@sendu.me.uk> <0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu> <4577EFD3.7090904@sendu.me.uk> <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu> <4586E8C4.6030306@sendu.me.uk> <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu> <45871B23.8070103@sendu.me.uk> <458723D3.4010908@sendu.me.uk> <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu> <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu> Message-ID: On Dec 18, 2006, at 10:18 PM, Gabriel Valiente wrote: > Thanks a lot for the prompt answer and follow-up discussion. I > think this turned out not to be a bug in the merge_lineage() code > but a direct consequence of building a phylogenetic tree instead of > a taxonomic tree, aka with internal node labels. > > In order to reconstruct the NCBI taxonomy for the set of species > present in a given phylogenetic tree, the only reasonable work- > around seems to be a first step of merging lineages and contracting > linear paths with the current implementation, followed by a second > step of restricting the given phylogenetic tree to the set of > species present in the obtained NCBI taxonomy. But this does not > affect the code for merge_lineage(). > > Gabriel I did notice one thing, though it's minor: if you use the option to retrieve the data from Entrez, a few species aren't found (even though they show up in a local taxonomy search). I think both were E. coli strains. chris From DGroskreutz at twt.com Tue Dec 19 07:00:40 2006 From: DGroskreutz at twt.com (DGroskreutz at twt.com) Date: Tue, 19 Dec 2006 01:00:40 -0600 Subject: [Bioperl-l] CN=Deb Groskreutz/OU=MSN/O=TWT is out of the office. 
Message-ID: I will be out of the office starting 12/18/2006 and will not return until 01/02/2007. From michael.watson at bbsrc.ac.uk Tue Dec 19 12:20:56 2006 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 19 Dec 2006 12:20:56 -0000 Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs? Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk> Hi I'm using bioperl-1.4. I did do a google search for this but couldn't find anything. If this is fixed in 1.5.2 then forgive me. I'm getting a warning: MSG: No whitespace allowed in FASTA ID [unknown id] When trying to convert from EMBL format to fasta. The offending sequence is CK234114: ID CK234114; SV 1; linear; mRNA; EST; VRT; 244 BP. XX AC CK234114; XX DT 03-MAR-2004 (Rel. 79, Created) DT 03-MAR-2004 (Rel. 79, Last updated, Version 1) XX DE SB010002000A01 JUWNL1 Normalized Zebra Finch Juvenile Telencephalon cDNA DE Library SB01 Taeniopygia guttata cDNA clone SB010002000A01 5', mRNA DE sequence. Etc Any advice? Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful.
The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From michael.watson at bbsrc.ac.uk Tue Dec 19 12:27:59 2006 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 19 Dec 2006 12:27:59 -0000 Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs? In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E682@iahce2ksrv1.iah.bbsrc.ac.uk> Sorry, problem solved. Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful.
The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From roest216 at student.otago.ac.nz Tue Dec 19 09:15:55 2006 From: roest216 at student.otago.ac.nz (Stephan Roessner) Date: Tue, 19 Dec 2006 22:15:55 +1300 Subject: [Bioperl-l] problems installing bioperl Message-ID: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> Dear support team, I installed bioperl 1.5.2_100 on a Fedora machine to be able to use gbrowse. The installation seems to work (except for the test failures) but the gbrowse installation tells me that Bio::Perl 1.0050021 is installed, but of course it requires 1.52. Is there a chance to find out what went wrong? thanks a lot, Stephan From bix at sendu.me.uk Tue Dec 19 15:12:39 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 15:12:39 +0000 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> Message-ID: <45880167.9010605@sendu.me.uk> Stephan Roessner wrote: > Dear support team, > > I installed bioperl 1.5.2_100 on a Fedora machine to be able to use > gbrowse. > The installation seems to work (except for the test failures) but the > gbrowse installation tells me that Bio::Perl 1.0050021 is installed, but > of course it requires 1.52. > > Is there a chance to find out what went wrong? Nothing went wrong with the Bioperl installation (well, except there shouldn't have been any test failures - can you post those please?). gbrowse simply defined its Bioperl requirement incorrectly.
If you tell me exactly where you downloaded gbrowse from and how you went about installing it, and provide the exact, complete error message you got from it, I might be able to help the authors fix the problem. Or I'm pretty sure they can figure it out for themselves :) From cjfields at uiuc.edu Tue Dec 19 16:05:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Dec 2006 10:05:01 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] problems installing bioperl In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> Message-ID: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> On Dec 19, 2006, at 9:31 AM, Scott Cain wrote: > I really don't think the BioPerl version detection is wrong. I > actually > don't check Bio::Root::Version::VERSION in Makefile.PL, I check > Bio::Graphics::Panel->api_version. When it doesn't find the correct > api_version, it gives a warning that BioPerl 1.5.2 is not installed. I > have seen this happen when more than one BioPerl instance is installed > and `perl Makefile.PL` finds the wrong one first. My suggestion is to > try reinstalling BioPerl and providing the --uninst 1 argument to > remove > older versions of BioPerl: > > sudo ./Build install --uninst 1 > > Scott Could having two Bioperl instances explain the test failures? I'm not sure (maybe Sendu can answer this), but I would assume Module::Build uses the current working directory for test runs.
chris From bix at sendu.me.uk Tue Dec 19 17:02:34 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 17:02:34 +0000 Subject: [Bioperl-l] [Gmod-gbrowse] problems installing bioperl In-Reply-To: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu> Message-ID: <45881B2A.8060907@sendu.me.uk> Chris Fields wrote: > > On Dec 19, 2006, at 9:31 AM, Scott Cain wrote: > >> I really don't think the BioPerl version detection is wrong. I actually >> don't check Bio::Root::Version::VERSION in Makefile.PL, I check >> Bio::Graphics::Panel->api_version. When it doesn't find the correct >> api_version, it gives a warning the BioPerl 1.5.2 is not installed. I >> have seen this happen when more than one BioPerl instance is installed >> and `perl Makefile.PL` finds the wrong one first. My suggestion is to >> try reinstalling BioPerl and providing the --uninst 1 argument to remove >> older versions of BioPerl: >> >> sudo ./Build install --uninst 1 >> >> Scott > > Could having two Bioperl instances explain the test failures? I'm not > sure (maybe Sendu can answer this), but I would assume Module::Build > uses the current working directory for test runs. It does, so that shouldn't be an issue for the test failures. From ferraria at gmail.com Tue Dec 19 16:40:05 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Tue, 19 Dec 2006 17:40:05 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy Message-ID: Hi all, I've just installed BioPerl 1.5.2 (devel) on a linux mandrake machine with the cpan shell. I want to use the Bio::DB::EUtilities to retrieve data (id's) from NCBI 'gene' database (first step of my pipeline). But the installation of this package doesn't seem to be correct : The simple example given on the documentation doesn't work. 
(this one : http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) Here is the error message I got : "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." In the UserAgent package, line 779 is in the private "_need_proxy" subroutine and corresponds to the code : ...if (@{ $self->{'no_proxy'} }) ... If I comment this line in the UserAgent package and the corresponding "}", the example works. But obviously, I prefer to solve the problem in a regular way :) Indeed, my computer accesses the internet via a http proxy and I didn't tell this to BioPerl at any moment. As I read on the BioPerl Wiki site, I tried to configure an $http_proxy environment variable but it still doesn't work. One last maybe important information is that I saw during the installation that the tests 't/EUtilities' were skipped because of an unrevealed reason. So finally I got two questions : 1. Is there somebody who can figure out what is my problem ? 2. At the moment, is the Bio::DB::EUtilities package really efficient or using directly the NCBI eutilities with the LWP::Simple package could be an good alternative ? Many thanks in advance, Best Regards, Anthony Ferrari From bix at sendu.me.uk Tue Dec 19 17:06:03 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 19 Dec 2006 17:06:03 +0000 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> Message-ID: <45881BFB.7020008@sendu.me.uk> Scott Cain wrote: > I really don't think the BioPerl version detection is wrong. I actually > don't check Bio::Root::Version::VERSION in Makefile.PL, I check > Bio::Graphics::Panel->api_version. When it doesn't find the correct > api_version, it gives a warning the BioPerl 1.5.2 is not installed. 
I > have seen this happen when more than one BioPerl instance is installed > and `perl Makefile.PL` finds the wrong one first. Yes, I saw that, which is why I thought I must be looking at something different to what the OP had installed. > My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove > older versions of BioPerl: > > sudo ./Build install --uninst 1 My confusion is that he has definitely installed 1.5.2 and this version is being polled for its version number (by something!) and returning the correct '1.0050021', whilst the something expects '1.52'. Anyway, this can only be resolved if Stephan provides the real error message and its context. From cjfields at uiuc.edu Tue Dec 19 17:27:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 19 Dec 2006 11:27:24 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: Message-ID: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote: > Hi all, > > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake > machine with > the cpan shell. > I want to use the Bio::DB::EUtilities to retrieve data (id's) from > NCBI > 'gene' database (first step of my pipeline). > > But the installation of this package doesn't seem to be correct : > The simple example given on the documentation doesn't work. (this > one : > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) > > Here is the error message I got : > "Can't use an undefined value as an ARRAY reference at > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > In the UserAgent package, line 779 is in the private "_need_proxy" > subroutine and corresponds to the code : ...if (@{ $self-> > {'no_proxy'} }) > ... > > If I comment this line in the UserAgent package and the > corresponding "}", > the example works. 
But obviously, I prefer to solve the problem in > a regular > way :) > > Indeed, my computer accesses the internet via a http proxy and I > didn't tell > this to BioPerl at any moment. > As I read on the BioPerl Wiki site, I tried to configure an > $http_proxy > environment variable but it still doesn't work. > > One last maybe important information is that I saw during the > installation > that the tests 't/EUtilities' were skipped because of an unrevealed > reason. > > > So finally I got two questions : > 1. Is there somebody who can figure out what is my problem ? > 2. At the moment, is the Bio::DB::EUtilities package really > efficient or > using directly the NCBI eutilities with the LWP::Simple package > could be an > good alternative ? > > Many thanks in advance, > Best Regards, > Anthony Ferrari First things first: at the moment the BioPerl EUtilities interface is very experimental (as specifically outlined in the POD), so I can't really recommend it for production use until the API is cleaned up. However, I do appreciate any feedback or comments re:EUtilities (the reason it's out there in the 1.5.2 release). You might check out this bug report, which relates directly to your issue: http://bugzilla.open-bio.org/show_bug.cgi?id=2109 After I worked out the proxy issue Torsten got it working. Let me know if this doesn't help or fix the problem. chris From cain at cshl.edu Tue Dec 19 15:31:50 2006 From: cain at cshl.edu (Scott Cain) Date: Tue, 19 Dec 2006 10:31:50 -0500 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <45880167.9010605@sendu.me.uk> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> Message-ID: <1166542310.6981.119.camel@localhost.localdomain> I really don't think the BioPerl version detection is wrong. I actually don't check Bio::Root::Version::VERSION in Makefile.PL, I check Bio::Graphics::Panel->api_version. 
When it doesn't find the correct api_version, it gives a warning that BioPerl 1.5.2 is not installed. I have seen this happen when more than one BioPerl instance is installed and `perl Makefile.PL` finds the wrong one first. My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove older versions of BioPerl: sudo ./Build install --uninst 1 Scott On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote: > Stephan Roessner wrote: > > Dear support team, > > > > I installed bioperl 1.5.2_100 on a Fedora machine to be able to use > > gbrowse. > > The installation seems to work (except for the test failures) but the > > gbrowse installation tells me that Bio::Perl 1.0050021 is installed, but > > of course it requires 1.52. > > > > Is there a chance to find out what went wrong? > > Nothing went wrong with the Bioperl installation (well, except there > shouldn't have been any test failures - can you post those please?). > gbrowse simply defined its Bioperl requirement incorrectly. If you tell > me exactly where you downloaded gbrowse from and how you went about > installing it, and provide the exact, complete error message you got > from it, I might be able to help the authors fix the problem. > > Or I'm pretty sure they can figure it out for themselves :) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From ferraria at gmail.com Tue Dec 19 17:06:31 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Tue, 19 Dec 2006 18:06:31 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: Message-ID: Hi all, I've just installed BioPerl 1.5.2 (devel) on a linux mandrake machine with the cpan shell. I want to use the Bio::DB::EUtilities to retrieve data (id's) from NCBI 'gene' database (first step of my pipeline). But the installation of this package doesn't seem to be correct : The simple example given on the documentation doesn't work. (this one : http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) Here is the error message I got : "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." In the UserAgent package, line 779 is in the private "_need_proxy" subroutine and corresponds to the code : ...if (@{ $self->{'no_proxy'} }) ... If I comment this line in the UserAgent package and the corresponding "}", the example works. But obviously, I prefer to solve the problem in a regular way :) Indeed, my computer accesses the internet via a http proxy and I didn't tell this to BioPerl at any moment. As I read on the BioPerl Wiki site, I tried to configure an $http_proxy environment variable but it still doesn't work. One last maybe important information is that I saw during the installation that the tests 't/EUtilities' were skipped because of an unrevealed reason. So finally I got two questions : 1. Is there somebody who can figure out what is my problem ? 2. At the moment, is the Bio::DB::EUtilities package really efficient or using directly the NCBI eutilities with the LWP::Simple package could be an good alternative ? 
Many thanks in advance, Best Regards, Anthony Ferrari From stewarta at nmrc.navy.mil Tue Dec 19 18:49:57 2006 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Tue, 19 Dec 2006 13:49:57 -0500 Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3 Message-ID: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil> I see that Bio::Tools::Glimmer documentation clearly states that this module is intended for use with GlimmerM (eukaryotic version) only. I am wondering if anyone can recall any talk about adopting Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)? I've searched the list history with little luck other than someone else asking a similar question. If not, does anyone have any thoughts on how difficult it might be to implement support for glimmer2/3 result parsing? Perhaps just a matter of editing the _parse_predictions method? -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From rvosa at sfu.ca Tue Dec 19 18:53:47 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 19 Dec 2006 10:53:47 -0800 Subject: [Bioperl-l] problems installing bioperl Message-ID: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca> An embedded and charset-unspecified text was scrubbed... 
Name: not available
URL: 

From cjfields at uiuc.edu  Tue Dec 19 19:31:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 13:31:17 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
In-Reply-To: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
References: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
Message-ID: <71E04575-DFD2-4F5A-B268-493D3246CBFA@uiuc.edu>

On Dec 19, 2006, at 12:49 PM, Andrew Stewart wrote:

> I see that Bio::Tools::Glimmer documentation clearly states that this
> module is intended for use with GlimmerM (eukaryotic version) only.
> I am wondering if anyone can recall any talk about adopting
> Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?
> I've searched the list history with little luck other than someone
> else asking a similar question.

There is a thread here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12546/focus=12546

> If not, does anyone have any thoughts on how difficult it might be to
> implement support for glimmer2/3 result parsing? Perhaps just a
> matter of editing the _parse_predictions method?

It depends on how different the various Glimmer formats are; I'll
have to look at the ones Torsten added in CVS. You could always try
modifying Bio::Tools::Glimmer to parse Glimmer2/3 and GlimmerM
reports, but based on the mail list thread above it may not be so
straightforward.
chris From MEC at stowers-institute.org Tue Dec 19 19:57:48 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Tue, 19 Dec 2006 13:57:48 -0600 Subject: [Bioperl-l] bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation Message-ID: Lincoln and fellow Bio::DB::SeqFeature travelers, I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with ------------- EXCEPTION ------------- MSG: FBgn0017545 doesn't have a primary id STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo ad.PLS:76 Where FBgn0017545 is the ID of a gene previously loaded. I am unsure how to remedy my situation and welcome any advise on correct or improved approach to my problem. Here's some detail if it helps. I am developing a pipeline to design a microarray probes capable of distinguishing among splice variants in drosophila (using latest Flybase dmel_r5.1 annotation). So I 1) load a filtered selection of Flybase annotation using bp_seqfeature_load. (for testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached). 
This is done as follows: > bp_seqfeature_load.PLS --create FBgn0017545.gff 2) analyze all the genes in the database, and create GFF3 output each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets. Output of this analysis, FBgn0017545_matd.gff, is also attached) 3) load these analysis results into the same database, as follows: > bp_seqfeature_load.PLS FBgn0017545_matd.gff It is at this point that I get the above error. However, I don't get any error and the data loads fine if I load the two files together, as follows: > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff) So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS or else this use case has not yet arisen and must be provided for somehow. I am running against bioperl-live Thanks for your thoughts and assistance, Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri From Kevin.M.Brown at asu.edu Tue Dec 19 21:46:19 2006 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 19 Dec 2006 14:46:19 -0700 Subject: [Bioperl-l] Bio::SimpleAlign Message-ID: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu> I'm working on a script that plays around with alignments of sequences and one of the things I noticed is that the code for the match method does not seem to actually use the start/end information when creating the match between objects in the alignment. Maybe I'm misunderstanding what the alignment is supposed to hold in terms of sequence. The alignment objects I build up are based on the sequence of a gene and the sequences of the primers that amplify that gene. 
$alignments{$gene->id()}->add_seq(
    new Bio::LocatableSeq(
        -seq    => $seq[0]->seq(),
        -id     => $seq[0]->id(),
        -start  => $start,
        -end    => $start + $seq[0]->length() - 1,
        -strand => 1
    )
);
$alignments{$gene->id()}->add_seq(
    new Bio::LocatableSeq(
        -seq    => $seq[1]->seq(),
        -id     => $seq[1]->id(),
        -start  => $stop,
        -end    => $stop + $seq[1]->length() - 1,
        -strand => -1
    )
);

So, you can see I input a start and stop point for the primer, but when
I use the match function all it does is match the first character of the
gene sequence to the first char of the primer sequences, then the second
gene char to the second in each primer, etc... This doesn't seem to fit
with the documentation and seems odd that there would be holders for the
start/stop points and not use them when doing things like matching of
sequences in an alignment.

From bix at sendu.me.uk  Tue Dec 19 22:01:22 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 22:01:22 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
References: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
Message-ID: <45886132.7050505@sendu.me.uk>

Rutger Vos wrote:
> Aren't 1.5.2_100 and 1.0050021 supposed to be equivalent in this weird
> version-string-translation way that makes 5.5 and 5.005 equivalent also?

Yes, 1.5.2_100 and 1.0050021 are equivalent. The equivalent of 5.5 is
5.500 however.

From lstein at cshl.edu  Tue Dec 19 21:58:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 19 Dec 2006 16:58:24 -0500
Subject: [Bioperl-l] bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation
In-Reply-To: 
References: 
Message-ID: <6dce9a0b0612191358t4764bfe0g601cd22d09132e55@mail.gmail.com>

Hi Malcolm,

Your second guess was right. The use case of augmenting an existing gene
with additional splice forms isn't provided for.
You can get the functionality by making direct calls to
Bio::DB::SeqFeature::Store methods:

  my @genes = $db->get_features_by_name('FBgn0017545');
  @genes == 1 or die "Didn't get exactly one gene";

  my $parent = $genes[0];
  my $chr    = $parent->seq_id;
  my $start  = $parent->start;
  my $end    = $parent->end;
  my $strand = $parent->strand;

  # create the new splice form on the same reference sequence as the
  # parent gene and attach it as a subfeature
  my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                         -source      => 'added',
                                         -seq_id      => $chr,
                                         -strand      => $strand,
                                         -start       => $start+10,
                                         -end         => $end,
                                        );
  $parent->add_SeqFeature($new_splice_form);

  # add two exons to the new splice form
  for my $pos ([$start+10,$start+100],[$start+200,$end]) {
    my ($e_start,$e_end) = @$pos;
    my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                        -store       => $db,
                                        -seq_id      => $chr,
                                        -strand      => $strand,
                                        -start       => $e_start,
                                        -end         => $e_end);
    $new_splice_form->add_SeqFeature($exon);
  }

I found a bug in updating the seqfeature database when I wrote this
script, so you'll have to get the latest bioperl-live. I think you can
use this to write a splice form updating script.

In order to support the idea of adding new splice forms to an existing
gene using the GFF3 format, I will have to either modify the loader, or
write a separate script (probably better to do the latter). It shouldn't
be hard if you'd like to give it a try.
Lincoln On 12/19/06, Cook, Malcolm wrote: > > Lincoln and fellow Bio::DB::SeqFeature travelers, > > I find that using bp_seqfeature_load.PLS to load subfeatures of genes > already loaded using bp_seqfeature_load.PLS fails with > > ------------- EXCEPTION ------------- > MSG: FBgn0017545 doesn't have a primary id > STACK > Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 > STACK Bio::DB::SeqFeature::Store::GFF3Loader::load > /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 > STACK toplevel > /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_lo > ad.PLS:76 > > Where FBgn0017545 is the ID of a gene previously loaded. > > I am unsure how to remedy my situation and welcome any advise on correct > or improved approach to my problem. > > Here's some detail if it helps. I am developing a pipeline to design a > microarray probes capable of distinguishing among splice variants in > drosophila (using latest Flybase dmel_r5.1 annotation). So I > > 1) load a filtered selection of Flybase annotation using > bp_seqfeature_load. (for testing purposes, I am using a single gene's > worth of annotation, FBgn0017545.gff, attached). This is done as > follows: > > > bp_seqfeature_load.PLS --create FBgn0017545.gff > > 2) analyze all the genes in the database, and create GFF3 output each > feature of which has a 'Parent' that is a previously loaded gene (i.e. > FBgn0017545). (These features represent the unique introns, splice > sites, and exonic design targets. 
Output of this analysis, > FBgn0017545_matd.gff, is also attached) > > 3) load these analysis results into the same database, as follows: > > > bp_seqfeature_load.PLS FBgn0017545_matd.gff > > It is at this point that I get the above error. > > However, I don't get any error and the data loads fine if I load the two > files together, as follows: > > > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff > FBgn0017545_matd.gff) > > So, I suspect that either I am misunderstanding when/how to use > bp_seqfeature_load.PLS or else this use case has not yet arisen and must > be provided for somehow. > > I am running against bioperl-live > > Thanks for your thoughts and assistance, > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From rvosa at sfu.ca Wed Dec 20 04:23:20 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 19 Dec 2006 20:23:20 -0800 Subject: [Bioperl-l] suggestions for suitable 'taxon' object Message-ID: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From cjfields at uiuc.edu Wed Dec 20 06:16:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 00:16:47 -0600 Subject: [Bioperl-l] suggestions for suitable 'taxon' object In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> Message-ID: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote: > Hi all, > > I am looking for a bioperl object that can be abused to function as a > suitable 'taxon' object, where I mean 'taxon' as understood by the > NEXUS > file format (i.e. 
not strictly an entity from a taxonomy, but more > loosely > an OTU). > > The object would primarily function as a way to relate nodes in > trees to > sequences in an alignment (a foreign key that both nodes and > sequences refer > to), and secondarily as the keeper of the canonical name of the > OTU, such > that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node > named 'Homo > sapiens (constrained monophyly)' can still be understood to refer > to the > same thing - the OTU 'Homo sapiens sapiens' (for example). Alignment (SimpleAlign) objects contain Bio::LocatableSeq sequence objects; at the moment LocatableSeqs don't store their own annotation but they could easily be made or subclassed to be AnnotatableI (i.e. they can store annotation collections). I recently made SimpleAlign Annotatable; Jason has also made SimpleAlign implement FeatureHolderI, so alignments can store SeqFeatures as well; he may have his own designs here. There may be a wide variety of ways to go about this. I would probably do the following (bear in mind I'm a microbiologist, not a computer scientist). If one could add trees as annotation to the alignment (i.e. if trees could be Annotation objects and kept in the SimpleAlign's annotation collection), and each sequence in the alignment contained reference to a node object of the tree (i.e. if Bio::Taxon/Bio::Species objects could also be Annotation objects, but kept in a LocatableSeq annotation collection), both could refer to the same node object. This may not be exactly what you want, but maybe it's close? SimpleAlign->AnnoColln->Tree->OTU(Nodes) \----->LocSeqs-->AnnoColln-----/ I suppose this could also be done with Seqfeatures... > I was thinking that a (possibly expanded) Bio::Species might work > if there > was some sensible way of appending references to node and sequence > objects > to it (or otherwise associate them with each other), but I am > writing *to > solicit any and all suggestions*. 
I am looking for something > similar to > Bio::Phylo::Taxa::Taxon. > > Any and all comments and suggestions greatly appreciated! > > Best wishes, > > Rutger Vos Sendu would be the best one to speak about Bio::Taxon and Bio::Species and may have some ideas on the above. The current plan was to deprecate Bio::Species, but who knows? chris From heikki at sanbi.ac.za Wed Dec 20 10:25:08 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 20 Dec 2006 12:25:08 +0200 Subject: [Bioperl-l] Bio::SimpleAlign In-Reply-To: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu> Message-ID: <200612201225.08862.heikki@sanbi.ac.za> Kevin, Sequences that are added to the alignment are supposed to be *aligned*. SimpleAlign does not do it for you. It seems to me that you are adding sequences like this: nnnnnnnnnnnnnnnnnnnn 1 - 20, "a short gene" nnnnnn 21 - 26 "a short primer after the gene" when you should be doing this nnnnnnnnnnnnnnnnnnnn 1 - 20, "a short gene" --------------------nnnnnn 21 - 26 "a short primer after the gene" Note that the default way of displaying names in SimpleAlign is "name/start-end". The name usually are expected to refer to the sequence from which this subsequence is derived from. The displayname does not change if you add gaps. Yours, -Heikki On Tuesday 19 December 2006 23:46, Kevin Brown wrote: > I'm working on a script that plays around with alignments of sequences > and one of the things I noticed is that the code for the match method > does not seem to actually use the start/end information when creating > the match between objects in the alignment. Maybe I'm misunderstanding > what the alignment is supposed to hold in terms of sequence. The > alignment objects I build up are based on the sequence of a gene and the > sequences of the primers that amplify that gene. 
>
> $alignments{$gene->id()}->add_seq(
>     new Bio::LocatableSeq(
>         -seq    => $seq[0]->seq(),
>         -id     => $seq[0]->id(),
>         -start  => $start,
>         -end    => $start + $seq[0]->length() - 1,
>         -strand => 1
>     )
> );

If your sequence does not contain gaps and the numbering starts from one,
you can let the object handle start/stop:

    my $a = new Bio::LocatableSeq(
        -seq    => 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
        -id     => 'A00001',
        -strand => 1
    );

> $alignments{$gene->id()}->add_seq(
>     new Bio::LocatableSeq(
>         -seq    => $seq[1]->seq(),
>         -id     => $seq[1]->id(),
>         -start  => $stop,
>         -end    => $stop + $seq[1]->length() - 1,
>         -strand => -1
>     )
> );
>
> So, you can see I input a start and stop point for the primer, but when
> I use the match function all it does is match the first character of the
> gene sequence to the first char of the primer sequences, then the second
> gene char to the second in each primer, etc... This doesn't seem to fit
> with the documentation and seems odd that there would be holders for the
> start/stop points and not use them when doing things like matching of
> sequences in an alignment.
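
The gap padding described in the reply above can be sketched in plain
Perl. This is only an illustration: pad_to_alignment is a hypothetical
helper (it is not part of BioPerl), and the gene and primer strings are
made-up data. The point is that every row handed to the alignment has the
same length, with '-' filling the columns a short sequence does not cover.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: place an ungapped sequence at
# a 1-based alignment column by padding both sides with gap characters.
sub pad_to_alignment {
    my ($seq, $start_col, $aln_len) = @_;
    my $left  = $start_col - 1;
    my $right = $aln_len - $left - length $seq;
    die "sequence does not fit in $aln_len columns\n"
        if $left < 0 || $right < 0;
    return ('-' x $left) . $seq . ('-' x $right);
}

# Made-up example data: a 20 nt "gene" and a primer matching columns 3-8.
my $gene   = 'ACGTACGTACGTACGTACGT';
my $primer = 'GTACGT';

my $gene_row   = pad_to_alignment($gene,   1, length $gene);
my $primer_row = pad_to_alignment($primer, 3, length $gene);

# Both rows now have the same number of columns.
print "$gene_row\n$primer_row\n";
```

Each padded string is what would then be given as -seq to
Bio::LocatableSeq, with -start and -end still describing the ungapped
coordinates.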
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From ferraria at gmail.com Wed Dec 20 11:04:16 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 12:04:16 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> Message-ID: On 19/12/06, Chris Fields wrote: > > > On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote: > > > Hi all, > > > > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake > > machine with > > the cpan shell. > > I want to use the Bio::DB::EUtilities to retrieve data (id's) from > > NCBI > > 'gene' database (first step of my pipeline). > > > > But the installation of this package doesn't seem to be correct : > > The simple example given on the documentation doesn't work. (this > > one : > > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS) > > > > Here is the error message I got : > > "Can't use an undefined value as an ARRAY reference at > > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > In the UserAgent package, line 779 is in the private "_need_proxy" > > subroutine and corresponds to the code : ...if (@{ $self-> > > {'no_proxy'} }) > > ... > > > > If I comment this line in the UserAgent package and the > > corresponding "}", > > the example works. 
But obviously, I prefer to solve the problem in > > a regular > > way :) > > > > Indeed, my computer accesses the internet via a http proxy and I > > didn't tell > > this to BioPerl at any moment. > > As I read on the BioPerl Wiki site, I tried to configure an > > $http_proxy > > environment variable but it still doesn't work. > > > > One last maybe important information is that I saw during the > > installation > > that the tests 't/EUtilities' were skipped because of an unrevealed > > reason. > > > > > > So finally I got two questions : > > 1. Is there somebody who can figure out what is my problem ? > > 2. At the moment, is the Bio::DB::EUtilities package really > > efficient or > > using directly the NCBI eutilities with the LWP::Simple package > > could be an > > good alternative ? > > > > Many thanks in advance, > > Best Regards, > > Anthony Ferrari > > First things first: at the moment the BioPerl EUtilities interface is > very experimental (as specifically outlined in the POD), so I can't > really recommend it for production use until the API is cleaned up. > However, I do appreciate any feedback or comments re:EUtilities (the > reason it's out there in the 1.5.2 release). > > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. > > chris > I carefully read this bug but that doesn't help because this has already been modified in the now given GenericWebDBI.pm So my problem does not come from a deep recursion loop. As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/EUtilities.t " to see what's really happening. And actually, all tests are skipped because of the same message error -> "Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." 
*** I tried the same command with the modified LWP::UserAgent package (which means I comment the line 779 and the corresponding '}') and all 453 tests passed. But not always. I made the tests several times and it often failed. And always on a test called "eXXX->cookie->cookie() query key" (ending with query key). In those cases, I got back a html message indicating that the error was thrown by the internal sever of NCBI. So I guess that sometimes it is just NCBI server fault (internal problem), and BioPerl is not implied.. But once more, I comment a line from a basic package so it is a bit hazardous. *** tony - a little bit lost. From smane at vbi.vt.edu Tue Dec 19 19:46:56 2006 From: smane at vbi.vt.edu (Shrinivasrao P. Mane) Date: Tue, 19 Dec 2006 14:46:56 -0500 Subject: [Bioperl-l] Using Muscle parameter within bioperl Message-ID: Hi, I need to run muscle using bioperl. This is how I do it in command line. muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet I used the following in perl script my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format => 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log'); The program runs and produces the result file but it doesn't create a log file nor does it stop sending output to STDOUT (-quiet). Could anybody help me with this? Thanks Mane From cjfields at uiuc.edu Wed Dec 20 14:09:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 08:09:56 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> Message-ID: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. 
> > chris > > > I carefully read this bug but that doesn't help because this has > already been modified in the now given GenericWebDBI.pm > So my problem does not come from a deep recursion loop. > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > EUtilities.t " to see what's really happening. > And actually, all tests are skipped because of the same message error > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > *** > I tried the same command with the modified LWP::UserAgent package > (which means I comment the line 779 and the corresponding '}') and > all 453 tests passed. > But not always. I made the tests several times and it often > failed. And always on a test called "eXXX->cookie->cookie() query > key" (ending with query key). In those cases, I got back a html > message indicating that the error was thrown by the internal sever > of NCBI. So I guess that sometimes it is just NCBI server fault > (internal problem), and BioPerl is not implied.. > But once more, I comment a line from a basic package so it is a bit > hazardous. > *** > > tony - a little bit lost. I'm cc'ing Torsten as he has a bit more experience with proxies. EUtilities is set up to check for an env. proxy and also take a set proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy to say this was a bug in LWP, but I think the problem is that something is undefined (i.e. an env. variable), or username/password. From the bug report, Torsten set his proxy variables using the following: -------------------------------------- "Note: I am behind an _authenticating_ proxy. My $http_proxy and $HTTP_PROXY are both set to http://USER:PASS at proxy.monash.edu.au:80/" -------------------------------------- Note the lowercase for $http_proxy, which can make a difference. After the recursion fix, I'm assuming he made no changes to the env. 
settings, and according to the bug everything was fine (is that
correct Torsten?).

Also LWP::UserAgent has this:

--------------------------------------
"Load proxy settings from *_proxy environment variables. You might
specify proxies like this (sh-syntax):

  gopher_proxy=http://proxy.my.place/
  wais_proxy=http://proxy.my.place/
  no_proxy="localhost,my.domain"
  export gopher_proxy wais_proxy no_proxy

csh or tcsh users should use the setenv command to define these
environment variables.

On systems with case insensitive environment variables there exists a
name clash between the CGI environment variables and the HTTP_PROXY
environment variable normally picked up by env_proxy(). Because of
this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY
environment variable can be used instead."
--------------------------------------

chris

From bix at sendu.me.uk  Wed Dec 20 14:08:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:08:16 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: 
References: 
Message-ID: <458943D0.10400@sendu.me.uk>

Shrinivasrao P. Mane wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
>
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
>
> I used the following in perl script
>
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>
> 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log');
>
> The program runs and produces the result file but it doesn't create a
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?

The Muscle arguments don't take dashed args. To make switches active you
need to set them to some true value. So (-verbose => 1, quiet => 1, log
=> 'inv.log'). Verbose may not do what you want since it is both a
Bioperl option (-verbose) and a Muscle option (verbose); if you want the
latter try using verbose => 1.
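
Put together, the advice above might look like the following sketch. The
undashed quiet and log options are taken from the reply itself, and the
file names come from the original command line. It assumes bioperl-run is
installed and a muscle executable can be found, and it has not been
checked against any particular release, so treat it as a starting point
rather than a verified recipe.

```perl
use strict;
use warnings;
use Bio::Tools::Run::Alignment::Muscle;   # from the bioperl-run package
use Bio::AlignIO;

# Muscle's own options are given WITHOUT a leading dash. Switches such as
# 'quiet' need a true value; 'log' takes the name of the log file.
my $factory = Bio::Tools::Run::Alignment::Muscle->new(
    quiet => 1,
    log   => 'inv.log',
);

# align() takes the name of a FASTA file and returns a Bio::SimpleAlign.
my $aln = $factory->align('inv.fasta');

# Write the alignment in ClustalW format, as "-out inv.aln" would on the
# command line.
my $out = Bio::AlignIO->new(-file => '>inv.aln', -format => 'clustalw');
$out->write_aln($aln);
```

Adding verbose => 1 to the constructor would request Muscle's own
verbose mode, whereas -verbose => 1 only changes BioPerl's verbosity.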
From bix at sendu.me.uk Wed Dec 20 14:51:33 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 14:51:33 +0000 Subject: [Bioperl-l] suggestions for suitable 'taxon' object In-Reply-To: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu> Message-ID: <45894DF5.1060503@sendu.me.uk> Chris Fields wrote: > On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote: > >> Hi all, >> >> I am looking for a bioperl object that can be abused to function as >> a suitable 'taxon' object, where I mean 'taxon' as understood by >> the NEXUS file format (i.e. not strictly an entity from a taxonomy, >> but more loosely an OTU). >> >> The object would primarily function as a way to relate nodes in >> trees to sequences in an alignment (a foreign key that both nodes >> and sequences refer to), and secondarily as the keeper of the >> canonical name of the OTU, such that a sequence named >> 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo sapiens >> (constrained monophyly)' can still be understood to refer to the >> same thing - the OTU 'Homo sapiens sapiens' (for example). I haven't had time to give your suggestions consideration, but I can say that I'm having to do the same thing for a bioperl-run module and my work-around is simply to set a custom name on my Bio::Taxon objects. To explain, I have the benefit that my tree is made up of Bio::Taxon objects, so I call $taxon->name('seq_id', $seq->id). Then when I want to know which of my sequences corresponds to a particular taxon, I work out which of them has the id given by shift @{$taxon->name('seq_id')}. Hardly ideal, but it works for now. >> I was thinking that a (possibly expanded) Bio::Species might work >> if there was some sensible way of appending references to node and >> sequence objects to it (or otherwise associate them with each >> other), but I am writing *to solicit any and all suggestions*. 
I am >> looking for something similar to Bio::Phylo::Taxa::Taxon. > > Sendu would be the best one to speak about Bio::Taxon and > Bio::Species and may have some ideas on the above. The current plan > was to deprecate Bio::Species, but who knows? Given that we do plan to deprecate Bio::Species, I'd resist the temptation to expand on it. Use Bio::Taxon as a base if it has stuff you need, or base straight from Bio::Tree::Node if not. From ferraria at gmail.com Wed Dec 20 15:40:34 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 16:40:34 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> References: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu> <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu> Message-ID: Defining a "no_proxy" environment variable in my '.bashrc' file solved my problem. I set it to "localhost". It indeed corresponds to the line... [ ...if (@{ $self->{'no_proxy'} }) ... ] (I guess!) I really don't know why we are compelled to do this, but let's say that's the way it is. It works now ! Thanks a lot. Tony On 20/12/06, Chris Fields wrote: > > > On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > > > You might check out this bug report, which relates directly to your > > issue: > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > > > After I worked out the proxy issue Torsten got it working. Let me > > know if this doesn't help or fix the problem. > > > > chris > > > > > > I carefully read this bug but that doesn't help because this has > > already been modified in the now given GenericWebDBI.pm > > So my problem does not come from a deep recursion loop. > > > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > > EUtilities.t " to see what's really happening. 
> > And actually, all tests are skipped because of the same message error > > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > *** > > I tried the same command with the modified LWP::UserAgent package > > (which means I comment the line 779 and the corresponding '}') and > > all 453 tests passed. > > But not always. I made the tests several times and it often > > failed. And always on a test called "eXXX->cookie->cookie() query > > key" (ending with query key). In those cases, I got back a html > > message indicating that the error was thrown by the internal sever > > of NCBI. So I guess that sometimes it is just NCBI server fault > > (internal problem), and BioPerl is not implied.. > > But once more, I comment a line from a basic package so it is a bit > > hazardous. > > *** > > > > tony - a little bit lost. > > I'm cc'ing Torsten as he has a bit more experience with proxies. > > EUtilities is set up to check for an env. proxy and also take a set > proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy > to say this was a bug in LWP, but I think the problem is that > something is undefined (i.e. an env. variable), or username/password. > > From the bug report, Torsten set his proxy variables using the > following: > > -------------------------------------- > "Note: I am behind an _authenticating_ proxy. > My $http_proxy and $HTTP_PROXY are both set to > http://USER:PASS at proxy.monash.edu.au:80/" > -------------------------------------- > > Note the lowercase for $http_proxy, which can make a difference. > After the recursion fix, I'm assuming he made no changes to the env. > settings, and according to the bug everything was fine (is that > correct Tortsen?). > > Also LWP::UserAgent has this: > > -------------------------------------- > "Load proxy settings from *_proxy environment variables. 
You might > specify proxies like this (sh-syntax): > > gopher_proxy=http://proxy.my.place/ > wais_proxy=http://proxy.my.place/ > no_proxy="localhost,my.domain" > export gopher_proxy wais_proxy no_proxy > > csh or tcsh users should use the setenv command to define these > environment variables. > > On systems with case insensitive environment variables there exists a > name clash between the CGI environment variables and the HTTP_PROXY > environment variable normally picked up by env_proxy(). Because of > this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY > environment variable can be used instead." > -------------------------------------- > > chris > From cjfields at uiuc.edu Wed Dec 20 16:10:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 10:10:48 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: Message-ID: <007901c72451$6ad540a0$15327e82@pyrimidine> Just to clarify: does it work it you don't have any proxy env. settings? chris _____ From: Anthony Ferrari [mailto:ferraria at gmail.com] Sent: Wednesday, December 20, 2006 9:41 AM To: Chris Fields Cc: bioperl-l List; Torsten Seemann Subject: Re: [Bioperl-l] Problem with : EUtilities - Proxy Defining a "no_proxy" environment variable in my '.bashrc' file solved my problem. I set it to "localhost". It indeed corresponds to the line... [ ...if (@{ $self->{'no_proxy'} }) ... ] (I guess!) I really don't know why we are compelled to do this, but let's say that's the way it is. It works now ! Thanks a lot. Tony On 20/12/06, Chris Fields wrote: On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > You might check out this bug report, which relates directly to your > issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > After I worked out the proxy issue Torsten got it working. Let me > know if this doesn't help or fix the problem. 
> > chris > > > I carefully read this bug but that doesn't help because this has > already been modified in the now given GenericWebDBI.pm > So my problem does not come from a deep recursion loop. > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > EUtilities.t " to see what's really happening. > And actually, all tests are skipped because of the same message error > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > *** > I tried the same command with the modified LWP::UserAgent package > (which means I comment the line 779 and the corresponding '}') and > all 453 tests passed. > But not always. I made the tests several times and it often > failed. And always on a test called "eXXX->cookie->cookie() query > key" (ending with query key). In those cases, I got back a html > message indicating that the error was thrown by the internal sever > of NCBI. So I guess that sometimes it is just NCBI server fault > (internal problem), and BioPerl is not implied.. > But once more, I comment a line from a basic package so it is a bit > hazardous. > *** > > tony - a little bit lost. I'm cc'ing Torsten as he has a bit more experience with proxies. EUtilities is set up to check for an env. proxy and also take a set proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy to say this was a bug in LWP, but I think the problem is that something is undefined ( i.e. an env. variable), or username/password. >From the bug report, Torsten set his proxy variables using the following: -------------------------------------- "Note: I am behind an _authenticating_ proxy. My $http_proxy and $HTTP_PROXY are both set to http://USER:PASS at proxy.monash.edu.au:80/" -------------------------------------- Note the lowercase for $http_proxy, which can make a difference. After the recursion fix, I'm assuming he made no changes to the env. 
settings, and according to the bug everything was fine (is that correct Tortsen?). Also LWP::UserAgent has this: -------------------------------------- "Load proxy settings from *_proxy environment variables. You might specify proxies like this (sh-syntax): gopher_proxy=http://proxy.my.place/ wais_proxy= http://proxy.my.place/ no_proxy="localhost,my.domain" export gopher_proxy wais_proxy no_proxy csh or tcsh users should use the setenv command to define these environment variables. On systems with case insensitive environment variables there exists a name clash between the CGI environment variables and the HTTP_PROXY environment variable normally picked up by env_proxy(). Because of this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY environment variable can be used instead." -------------------------------------- chris From ferraria at gmail.com Wed Dec 20 16:59:49 2006 From: ferraria at gmail.com (Anthony Ferrari) Date: Wed, 20 Dec 2006 17:59:49 +0100 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <007901c72451$6ad540a0$15327e82@pyrimidine> References: <007901c72451$6ad540a0$15327e82@pyrimidine> Message-ID: First, I got a $http_proxy env. variable automatically defined by the BioPerl installation (I don't define and export it in my .bash_profile). So when I'm logging in, $http_proxy=http://ip_adress:port/ Next step : two solutions : 1) defining an $no_proxy env.variable in my .bash_profile. It can be set to 'whatever'. 2) If I do not define '$no_proxy'; to make it work, I must call the no_proxy() method on each Bio::DB::EUtilities object I create before I can call the get_response() method on it. (The bug is in the 'get_response' call) And finally without 1) or 2) it doesn't work. Tony On 20/12/06, Chris Fields wrote: > > Just to clarify: does it work it you don't have any proxy env. settings? > One thing I didn't point out previously is that Bio::DB::GenericWebDBI > inherits LWP::UserAgent. 
You should be able to use $eutil->no_proxy() > instead of setting it in your env. > chris > > ------------------------------ > *From:* Anthony Ferrari [mailto:ferraria at gmail.com] > *Sent:* Wednesday, December 20, 2006 9:41 AM > *To:* Chris Fields > *Cc:* bioperl-l List; Torsten Seemann > *Subject:* Re: [Bioperl-l] Problem with : EUtilities - Proxy > > Defining a "no_proxy" environment variable in my '.bashrc' file solved my > problem. I set it to "localhost". > > It indeed corresponds to the line... [ ...if (@{ > $self->{'no_proxy'} }) ... ] (I guess!) > > > I really don't know why we are compelled to do this, but let's say that's > the way it is. > > It works now ! > > Thanks a lot. > > Tony > > > > > On 20/12/06, Chris Fields wrote: > > > > > > On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote: > > > > > You might check out this bug report, which relates directly to your > > > issue: > > > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109 > > > > > > After I worked out the proxy issue Torsten got it working. Let me > > > know if this doesn't help or fix the problem. > > > > > > chris > > > > > > > > > I carefully read this bug but that doesn't help because this has > > > already been modified in the now given GenericWebDBI.pm > > > So my problem does not come from a deep recursion loop. > > > > > > As Torsten did, I tried the command " BIOPERLDEBUG=1 perl -I. -w t/ > > > EUtilities.t " to see what's really happening. > > > And actually, all tests are skipped because of the same message error > > > -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ > > > perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779." > > > > > > *** > > > I tried the same command with the modified LWP::UserAgent package > > > (which means I comment the line 779 and the corresponding '}') and > > > all 453 tests passed. > > > But not always. I made the tests several times and it often > > > failed. 
And always on a test called "eXXX->cookie->cookie() query > > > key" (ending with query key). In those cases, I got back a html > > > message indicating that the error was thrown by the internal sever > > > of NCBI. So I guess that sometimes it is just NCBI server fault > > > (internal problem), and BioPerl is not implied.. > > > But once more, I comment a line from a basic package so it is a bit > > > hazardous. > > > *** > > > > > > tony - a little bit lost. > > > > I'm cc'ing Torsten as he has a bit more experience with proxies. > > > > EUtilities is set up to check for an env. proxy and also take a set > > proxy with $agent->proxy() (see GenericWebDBI POD). It would be easy > > to say this was a bug in LWP, but I think the problem is that > > something is undefined ( i.e. an env. variable), or username/password. > > > > From the bug report, Torsten set his proxy variables using the > > following: > > > > -------------------------------------- > > "Note: I am behind an _authenticating_ proxy. > > My $http_proxy and $HTTP_PROXY are both set to > > http://USER:PASS at proxy.monash.edu.au:80/" > > -------------------------------------- > > > > Note the lowercase for $http_proxy, which can make a difference. > > After the recursion fix, I'm assuming he made no changes to the env. > > settings, and according to the bug everything was fine (is that > > correct Tortsen?). > > > > Also LWP::UserAgent has this: > > > > -------------------------------------- > > "Load proxy settings from *_proxy environment variables. You might > > specify proxies like this (sh-syntax): > > > > gopher_proxy=http://proxy.my.place/ > > wais_proxy= http://proxy.my.place/ > > no_proxy="localhost,my.domain" > > export gopher_proxy wais_proxy no_proxy > > > > csh or tcsh users should use the setenv command to define these > > environment variables. 
> >
> > On systems with case insensitive environment variables there exists a
> > name clash between the CGI environment variables and the HTTP_PROXY
> > environment variable normally picked up by env_proxy(). Because of
> > this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY
> > environment variable can be used instead."
> > --------------------------------------
> >
> > chris
> >
> >

From cjfields at uiuc.edu Wed Dec 20 18:28:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 12:28:09 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To:
Message-ID: <000301c72464$9a12a070$15327e82@pyrimidine>

> First, I got a $http_proxy env. variable automatically
> defined by the BioPerl installation (I don't define and
> export it in my .bash_profile).
> So when I'm logging in, $http_proxy=http://ip_adress:port/

BioPerl can't permanently set any env. variables out of the box since that would mean modifying your local .bash_profile or the system profile. If you're a user on a system where you're not the sysadmin, then it's more likely the sysadmin has set up user accounts with an already-defined $http_proxy variable in the system .bash_profile (which is passed on to all users).

The problem I can see (going by what you have above) is there is no username/password defined, only the address (IP:Port). I am assuming LWP is expecting some form of authentication when a proxy is env. defined w/o username/password included. If so, you'll have to supply those yourself: either by redefining $http_proxy to include them in your local .bash_profile,

export http_proxy='http://USER:PASS at proxy.monash.edu.au:80/'

by using $agent->proxy() to include all the proxy information, or by using $agent->authentication() so that a proxy can authorize any outgoing/incoming requests. The first may be preferable if you are able to do so since you wouldn't have to authenticate every agent.
Note that this would also explain why you had an LWP problem with an undefined array ref: the LWP agent is likely expecting some form of authentication, probably in the form [username, password], if a proxy env. variable is found. > Next step : two solutions : > 1) defining an $no_proxy env.variable in my .bash_profile. > It can be set to 'whatever'. > > 2) If I do not define '$no_proxy'; to make it work, I must call the > no_proxy() method on each Bio::DB::EUtilities object I create > before I can call the get_response() method on it. > > (The bug is in the 'get_response' call) If you mean when the request is calling proxy_authorization_basic(), that's not a bug. If we didn't authorize then it likely wouldn't work for properly set up proxies (Torsten's worked). Note that it's supposed to be passing a username/password from $self->authentication(). The fact that you can set $no_proxy to anything suggests there is no proxy in place. > And finally without 1) or 2) it doesn't work. > > Tony We can't guarantee that defining no_proxy will always work on your system, either. It's possible a proxy was added systemwide but a firewall hasn't been put in place yet; once it goes up and all requests need to be authorized, then you'll run into problems again. Conversely, maybe this was defined at some point systemwide in the .bash_profile but wasn't removed. The only one who would know is the sysadmin. If you aren't the sysadmin, you can contact them to find out about how to properly set up your proxy, or whether it is even necessary (maybe they neglected to remove the proxy definition from the system .bash_profile). Who knows? 
chris From bix at sendu.me.uk Wed Dec 20 21:03:03 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 20 Dec 2006 21:03:03 +0000 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine> References: <000301c72464$9a12a070$15327e82@pyrimidine> Message-ID: <4589A507.60106@sendu.me.uk> Chris Fields wrote: >> First, I got a $http_proxy env. variable automatically >> defined by the BioPerl installation (I don't define and >> export it in my .bash_profile). >> So when I'm logging in, $http_proxy=http://ip_adress:port/ > > BioPerl can't permanently set any env. variables out of the box since True, and it doesn't try to set one temporarily either. To clarify some of the other points Chris made, the proxy variable certainly doesn't need username and password to be defined (from LWPs point of view), since not all proxies authenticate. Of course accesses won't work if authentication is actually required and these aren't set. There's no reason that no_proxy should have to be set. It is used to say what domains shouldn't be proxied. Either this is a real LWP bug, or somehow EUtilities or one of its bases is doing something wrong. It should be investigated... It would be very informative if Anthony could log in when he hasn't done anything to his environment variables (and so where the original problem manifests) and give us the results of: perl -e 'while (($key, $val) = each %ENV) { print "$key => $val\n" }' From avilella at gmail.com Wed Dec 20 14:07:17 2006 From: avilella at gmail.com (Albert Vilella) Date: Wed, 20 Dec 2006 14:07:17 +0000 Subject: [Bioperl-l] Using Muscle parameter within bioperl In-Reply-To: References: Message-ID: <358f4d650612200607m4324b8f1r91d2d917cd4951bd@mail.gmail.com> Try something like: my @params =('verbose'=>0, 'quiet'=>1, 'log'=>'/tmp/inv.log'); my $factory = Bio::Tools::Run::Alignment::Muscle->new(@params); it works for me with muscle 3.6. 
The log only gives me a start, commandstring and end. I dunno if that is what muscle is supposed to spit out. Albert. On 12/19/06, Shrinivasrao P. Mane wrote: > Hi, > I need to run muscle using bioperl. This is how I do it in command line. > > muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet > > I used the following in perl script > > my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format => > 'clustalw', -verbose=>'', -quiet=>'', -log='inv.log'); > > The program runs and produces the result file but it doesn't create a > log file nor does it stop sending output to STDOUT (-quiet). > Could anybody help me with this? > Thanks > Mane > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Dec 20 22:46:35 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 16:46:35 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <4589A507.60106@sendu.me.uk> Message-ID: <000c01c72488$b6a690b0$15327e82@pyrimidine> > Chris Fields wrote: > >> First, I got a $http_proxy env. variable automatically > defined by the > >> BioPerl installation (I don't define and export it in my > >> .bash_profile). > >> So when I'm logging in, > $http_proxy=http://ip_adress:port/ > > > > BioPerl can't permanently set any env. variables out of the > box since > > True, and it doesn't try to set one temporarily either. > > To clarify some of the other points Chris made, the proxy > variable certainly doesn't need username and password to be > defined (from LWPs point of view), since not all proxies > authenticate. Of course accesses won't work if authentication > is actually required and these aren't set. > > There's no reason that no_proxy should have to be set. It is > used to say what domains shouldn't be proxied. 
Either this is > a real LWP bug, or somehow EUtilities or one of its bases is > doing something wrong. It should be investigated... Actually, after some investigation I repeated the error and committed a fix. If I set (on WinXP) HTTP_PROXY to a dummy variable I get the same error: Can't use an undefined value as an ARRAY reference at C:/Perl/lib/LWP/UserAgent.pm line 787. It's EUtilities-specific as other WebAgents that have proxy settings do not have the same problem, though I haven't checked any WebAgent-based classes. I think this may also partly be an LWP bug as setting env_proxy to TRUE/FALSE doesn't seem to have an effect, but instantiating with it (env_proxy => 1) in the constructor fixes the problem. Anthony, I have committed a fix to CVS to GenericWebDBI and EUtilities. Could you try it out? -chris From cjfields at uiuc.edu Wed Dec 20 23:19:59 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 20 Dec 2006 17:19:59 -0600 Subject: [Bioperl-l] Problem with : EUtilities - Proxy In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine> Message-ID: <000001c7248d$5e7df450$15327e82@pyrimidine> > > First, I got a $http_proxy env. variable automatically > defined by the > > BioPerl installation (I don't define and export it in my > > .bash_profile). > > So when I'm logging in, > $http_proxy=http://ip_adress:port/ Anthony, Sorry about the prior long-winded response. I managed to reproduce the error about five minutes after I responded and managed to trace the problem back to GenericWebDBI. The issue seems to be with the LWP::UserAgent env_proxy method not setting correctly post-instantiation; setting to 0 or 1 doesn't seem to do anything. If I add it to the list of args for chained instantiation in the constructor: my $self = $class->SUPER::new(@args, env_proxy => 1); it suddenly works like a charm. Hard to know why it's being fussy... I'm going to try reproducing this on a few platforms and check to see if it has been reported as an LWP bug. 
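
The constructor-time setting described above can be sketched directly against LWP::UserAgent, independent of GenericWebDBI. This is only an illustration: the proxy address is a made-up placeholder, and the snippet shows the documented env_proxy constructor option rather than the actual fix committed to CVS.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical proxy settings, for illustration only.
$ENV{http_proxy} = 'http://proxy.example.org:8080/';
$ENV{no_proxy}   = 'localhost';

# Ask LWP to read the *_proxy environment variables at construction time,
# instead of calling env_proxy() on an existing agent afterwards.
my $ua = LWP::UserAgent->new(env_proxy => 1);

# With a single argument, proxy() returns the current setting for a scheme.
my $http_proxy = $ua->proxy('http');
print 'http proxy: ', (defined $http_proxy ? $http_proxy : '(none)'), "\n";
```

An agent built this way picks up no_proxy from the environment as well, so there should be no need to call no_proxy() on each object by hand.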
I have also committed a fix to CVS if you want to test it out. Chris From jnewcomer at jhu.edu Thu Dec 21 01:56:10 2006 From: jnewcomer at jhu.edu (Joe Newcomer) Date: Wed, 20 Dec 2006 20:56:10 -0500 Subject: [Bioperl-l] a stupid question Message-ID: <002101c724a3$2ff80100$bd59dc80@aap.jhu.edu> Hello Paul Leo, I am with Johns Hopkins University Advanced Academic Programs. I am trying to contact a student named Paul Leo who has registered for Protein Bioinformatics. If this is you please email me. I would like to send you information about the spring course. Respectfully, Joe Newcomer (410) 516-5047 Online Education From anhthu.tieu at gsf.de Thu Dec 21 10:10:47 2006 From: anhthu.tieu at gsf.de (Anh-Thu Tieu) Date: Thu, 21 Dec 2006 11:10:47 +0100 Subject: [Bioperl-l] imagemaps with heterogeneous_segments Message-ID: <458A5DA7.1010802@gsf.de> Hi, I use bioperl 1.5.2 and have been wondering whether it is possible to apply the image_and_map function with the glyph option "heterogenous_segments". Up to now I can successfully create an underlying imagemap for the entire track. However, what I want is to create an imagemap for each single segment on my track/glyph. Does anyone know who to realise this? Any help is appreciated. Thanks a lot. Anh Thu
From somil.sharma1 at gmail.com Thu Dec 21 06:22:24 2006 From: somil.sharma1 at gmail.com (Somil Sharma) Date: Thu, 21 Dec 2006 14:22:24 +0800 Subject: [Bioperl-l] problem Message-ID: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com> hello *i run this program* *#!/use/bin/perl* *use Bio::DB::GenBank;* *$gb = new Bio::DB::GenBank; $seq1 = $gb->get_Seq_by_id('MUSIGHBA1'); print $seq1; * *and got this error on cmd line--* ---------- *EXCEPTION ------------- MSG: WebDBSeqI Request Error: 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error) Content-Type: text/plain Client-Date: Thu, 21 Dec 2006 06:28:33 GMT Client-Warning: Internal response* *500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)* *STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685 STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491 STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27 STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145 STACK toplevel C:\Perl\a2.pl:5* plz see if u can help me out. my ppm is also not able to install Bioperl so i did that also manually. waiting for ur reply From granjeau at tagc.univ-mrs.fr Thu Dec 21 11:14:25 2006 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Thu, 21 Dec 2006 12:14:25 +0100 Subject: [Bioperl-l] BioFetch: Adding databases Message-ID: <458A6C91.7090000@tagc.univ-mrs.fr> Hello! I needed to query the Unisave database at EBI. Up to date, the only way to access it is the dbfetch web service (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined in the BioFetch package (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote these few lines to make it work, but I don't think it fits a good programming practice. May be it makes sense to defined a method to add databases to FORMATMAP, in order to follow the dbfetch service evolutions.

Cheers,
--Samuel

use Bio::DB::BioFetch;
$Bio::DB::BioFetch::FORMATMAP{unisave} = {
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
    namespace => 'unisave',
};
my $bf = new Bio::DB::BioFetch(-db=>'unisave');
my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');

print $seq->display_id();
print $seq->seq();

From cain at cshl.edu Thu Dec 21 13:56:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 08:56:21 -0500
Subject: [Bioperl-l] problem
In-Reply-To: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
References: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
Message-ID: <1166709381.3739.47.camel@localhost.localdomain>

Hello,

It looks to me like you have a networking problem that doesn't have anything to do with BioPerl. When I run your script, I get:

Bio::Seq::RichSeq=HASH(0x97013e0)

Fairly quickly, too. Can you get to http://eutils.ncbi.nlm.nih.gov/ in a browser without proxy settings?

As an aside, you probably don't really want the HASH stuff above, so I modified your script to look like this, with warnings and strict to make future debugging easier:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;

my $gb = new Bio::DB::GenBank;
my $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1->seq;

Scott

On Thu, 2006-12-21 at 14:22 +0800, Somil Sharma wrote:
> hello
>
> *i run this program*
>
> *#!/use/bin/perl*
>
> *use Bio::DB::GenBank;*
>
> *$gb = new Bio::DB::GenBank;
> $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
> print $seq1;
> *
>
> *and got this error on cmd line--*
>
> ---------- *EXCEPTION -------------
> MSG: WebDBSeqI Request Error:
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
> Content-Type: text/plain
> Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
> Client-Warning: Internal response*
>
> *500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)*
>
> *STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
> STACK Bio::DB::WebDBSeqI::get_seq_stream
C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491 > STACK Bio::DB::WebDBSeqI::get_Stream_by_id > C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27 > STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145 > STACK toplevel C:\Perl\a2.pl:5* > > plz see if u can help me out. > > my ppm is also not able to install Bioperl so i did that also manually. > > waiting for ur reply > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Thu Dec 21 14:28:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 21 Dec 2006 08:28:07 -0600 Subject: [Bioperl-l] BioFetch: Adding databases In-Reply-To: <458A6C91.7090000@tagc.univ-mrs.fr> References: <458A6C91.7090000@tagc.univ-mrs.fr> Message-ID: <193C6D1C-6374-4A86-9FBD-7FA994D5FDDF@uiuc.edu> I've added this to the BioFetch FORMATMAP as 'unisave' and committed to CVS. Thanks! chris On Dec 21, 2006, at 5:14 AM, Samuel GRANJEAUD - IR/IFR137 wrote: > Hello! > > I needed to query the Unisave database at EBI. Up to date, the only > way > to access it is the dbfetch web service > (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet > defined > in the BioFetch package > (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote > these few lines to make it work, but I don't think it fits a good > programming practice. May be it makes sense to defined a method to add > databases to FORMATMAP, in order to follow the dbfetch service > evolutions. 
> > Cheers, > --Samuel > > use Bio::DB::BioFetch; > $Bio::DB::BioFetch::FORMATMAP{unisave} = { > default => 'swiss', > swissprot => 'swiss', > fasta => 'fasta', > namespace => 'unisave', > }; > my $bf = new Bio::DB::BioFetch(-db=>'unisave'); > my $seq = $bf->get_Seq_by_id('LAM1_MOUSE'); > > print $seq->display_id(); > print $seq->seq(); > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From anhthu.tieu at gsf.de Thu Dec 21 14:31:45 2006 From: anhthu.tieu at gsf.de (Anh-Thu Tieu) Date: Thu, 21 Dec 2006 15:31:45 +0100 Subject: [Bioperl-l] multiple glyph elements in one track Message-ID: <458A9AD1.50907@gsf.de> Hello, I use bioperl 1.5.2. Does anyone know how I could create two seperate glyph elements on the same track with the Bio::Graphics::Panel module? My aim is to have two (e.g. two different) clickable imagemap elements on the same track. Until now I can merely create two glyph elements (transcript2 or generic options) per track with only one imagemap element (e.g. the same imagemap element is used for the entire track as the entire (=both elements) glyph's coordinates are returned to the image_and_map function as one set of coordinate). Thank you for your help. Best regards, Anh Thu From cain at cshl.edu Thu Dec 21 14:47:32 2006 From: cain at cshl.edu (Scott Cain) Date: Thu, 21 Dec 2006 09:47:32 -0500 Subject: [Bioperl-l] multiple glyph elements in one track In-Reply-To: <458A9AD1.50907@gsf.de> References: <458A9AD1.50907@gsf.de> Message-ID: <1166712453.3739.53.camel@localhost.localdomain> Hello Anh Thu, You can provide a callback for the glyph argument that returns different glyphs depending on what you want to do (ie, how you've coded your callback). 
This example is from the perldoc for Bio::Graphics::Panel:

  $panel->add_track(\@exons,
                    -glyph => sub {
                        my $feature = shift;
                        $feature->source_tag eq 'curated' ? 'ellipse' : 'generic';
                    }
                   );

Scott

On Thu, 2006-12-21 at 15:31 +0100, Anh-Thu Tieu wrote: > Hello, > > I use bioperl 1.5.2. Does anyone know how I could create two seperate > glyph elements on the same track with the Bio::Graphics::Panel module? > My aim is to have two (e.g. two different) clickable imagemap elements > on the same track. Until now I can merely create two glyph elements > (transcript2 or generic options) per track with only one imagemap > element (e.g. the same imagemap element is used for the entire track as > the entire (=both elements) glyph's coordinates are returned to the > image_and_map function as one set of coordinate). > > Thank you for your help. > > Best regards, > > Anh Thu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cain.cshl at gmail.com Thu Dec 21 20:03:48 2006 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 21 Dec 2006 15:03:48 -0500 Subject: [Bioperl-l] problems installing bioperl In-Reply-To: <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz> References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz> <45880167.9010605@sendu.me.uk> <1166542310.6981.119.camel@localhost.localdomain> <1166604008.4588f6e87cccc@www.studentmail.otago.ac.nz> <1166621113.3739.11.camel@localhost.localdomain> <1166642653.45898dddbd8cf@www.studentmail.otago.ac.nz> <1166643051.3739.28.camel@localhost.localdomain> <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz> Message-ID: <1166731428.3739.71.camel@localhost.localdomain> Hi Stephan, About your bioperl mail: did you cancel it, or did it just disappear? If the latter, I might have accidentally deleted it, sorry :-/ So 'GBrowse is running' means that you can see the sample yeast chr1 database, browse around, etc, right? I still don't know what is up with the warning but my guess is that everything is working there. As for your question about writing a callback, the reason it's not working is that the attributes method returns a list (typically but not always with only one element), so what you are really doing in your test is this "number of elements in the list > 1200", which is almost always going to be false. You should change it to this: my $feature = shift; my ($score) = $feature->attributes('score'); if ($score > 1200) { ...etc... 
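
A minimal sketch of that difference, in case it helps to see it run outside GBrowse. MockFeature and the two colour subs are made up for illustration and are not BioPerl classes; the only thing taken from the explanation above is that attributes() returns a list, so comparing it directly with a number compares the element count instead of the score:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature object: attributes() returns a LIST,
# which is the behaviour described above for the real method.
package MockFeature;

sub new {
    my ($class, %attr) = @_;
    return bless { attr => { %attr } }, $class;
}

sub attributes {
    my ($self, $tag) = @_;
    return @{ $self->{attr}{$tag} || [] };
}

package main;

# Scalar context: the comparison sees the number of elements, not the value.
sub colour_scalar {
    my $feature = shift;
    return $feature->attributes('score') > 1200 ? 'blue' : 'pink';
}

# List assignment: the first element of the list is the actual score.
sub colour_list {
    my $feature = shift;
    my ($score) = $feature->attributes('score');
    return (defined $score && $score > 1200) ? 'blue' : 'pink';
}

my $feature = MockFeature->new(score => [1500]);
print colour_scalar($feature), "\n";   # pink: the test was 1 > 1200
print colour_list($feature),   "\n";   # blue: the test was 1500 > 1200
```

The defined check in colour_list is only there so that a feature without a score falls through to 'pink' without a warning.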
Finally, if you really want to test that you are using the correct bioperl, you can put this simple cgi in your cgi-bin directory as test_biographics.pl, set it as world executable and go to http://localhost/cgi-bin/test_biographics.pl (and, yes, I use strict and warnings even when the script is only 10 lines long :-) :

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Bio::Graphics::Panel;
  use CGI qw/:standard/;

  print header(),
        start_html,
        p("Bio::Graphics::Panel api_version is ".Bio::Graphics::Panel->api_version),
        p("It should be 1.654 for BioPerl 1.5.2"),
        end_html;

Scott

On Fri, 2006-12-22 at 08:27 +1300, Stephan Roessner wrote: > Hi Scott, > > responded to group but did get through. > So I reply back to you. > > I installed Class-Base-0.03 using CPAN. > > Reinstalling GBrowse gives me still a warning like: > Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021. > Writing Makefile for Bio::Graphocs::Browser::CAlign > Writing Makefile for Generic-Genome-Browser. > > GBrowse is running but I cannot access attributes and/or the score column > of .gff files. Is this related to the warning? > > To get an attribute I use > > my $feature = shift; > if ($feature->attributes('score') > 1200) { > return 'blue'; > } else { > return 'pink'; > } > But I retrieve not data using $feature-> > > Can I somehaow verify what bioperl version GBrowse is using? > > Stephan, > > > > Quoting Scott Cain : > > > Stephan, > > > > Yes, it is in cpan: > > > > http://search.cpan.org/~abw/Class-Base-0.03/lib/Class/Base.pm > > > > The cpan shell should be able to install it. > > > > Whether or not that works, please respond to the mailing list so that > > the rest of the conversation can be archived. > > > > Scott > > > > > > On Thu, 2006-12-21 at 08:24 +1300, Stephan Roessner wrote: > > > Hi Scott, > > > > > > No I didn't. > > > I had a look and couldn't find it. > > > It is not part of CPAN?
> > > > > > Stephan > > > > > > > > > Quoting Scott Cain : > > > > > > > Stephan, > > > > > > > > Did you install Class::Base? It was inadvertantly left out the > > > > install > > > > document, but is required. > > > > > > > > Scott > > > > > > > > > > > > On Wed, 2006-12-20 at 21:40 +1300, Stephan Roessner wrote: > > > > > Hi all, > > > > > > > > > > I did sudo ./Build install --uninst 1 and got the error > > > > > * ERROR: Confiduration was initially created with MOdule::Build > > > > version > > > > > '0.2805', but we are now using '0.2806'. ... > > > > > > > > > > So I ran perl Build.PL and got the message > > > > > Creating new 'Buid' script for 'bioperl' verion '1.0050021'. > > > > > > > > > > I did run sudo ./Build install --uninst 1 again. > > > > > Seems to be fine with no error messages. > > > > > > > > > > When I run perl Makefile.PL for GBrowse 1.66-RC2 it results in > > > > > > > > > > Warning: prerequisite Bio::Perl 1.52 not found. We have > > 1.0050021. > > > > > Warning: prerequisite Class::Base 0 not found. > > > > > Writing Makefile for Bio::Graphocs::Browser::CAlign > > > > > Writing Makefile for Generic-Genome-Browser > > > > > > > > > > GBrowse is running but I have really troubles with aggregators > > trying > > > > to > > > > > use xyplot. It does not plot anything. So I thought the bioperl > > could > > > > be > > > > > the problem. > > > > > > > > > > Stephan > > > > > > > > > > > > > > > > > > > > Quoting Scott Cain : > > > > > > > > > > > I really don't think the BioPerl version detection is wrong. > > I > > > > > > actually > > > > > > don't check Bio::Root::Version::VERSION in Makefile.PL, I > > check > > > > > > Bio::Graphics::Panel->api_version. When it doesn't find the > > > > correct > > > > > > api_version, it gives a warning the BioPerl 1.5.2 is not > > installed. > > > > I > > > > > > have seen this happen when more than one BioPerl instance is > > > > installed > > > > > > and `perl Makefile.PL` finds the wrong one first. 
My > > suggestion is > > > > to > > > > > > try reinstalling BioPerl and providing the --uninst 1 argument > > to > > > > > > remove > > > > > > older versions of BioPerl: > > > > > > > > > > > > sudo ./Build install --uninst 1 > > > > > > > > > > > > Scott > > > > > > > > > > > > > > > > > > On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote: > > > > > > > Stephan Roessner wrote: > > > > > > > > Dear support team, > > > > > > > > > > > > > > > > I installed bioperl 1.5.2_100 on a ferdora machine to be > > able > > > > to > > > > > > use > > > > > > > > gbrowse. > > > > > > > > The installation seems to work (except of the test > > failures) > > > > but > > > > > > the > > > > > > > > gbrowse installation tells me that BIO::pERL 1.0050021 is > > > > > > installed, but > > > > > > > > of course it requires 1.52. > > > > > > > > > > > > > > > > Is there a chance to find out what went wrong? > > > > > > > > > > > > > > Nothing went wrong with the Bioperl installation (well, > > expect > > > > there > > > > > > > shouldn't have been any test failures - can you post those > > > > please?). > > > > > > > gbrowse simply defined its Bioperl requirement incorrectly. > > If > > > > you > > > > > > tell > > > > > > > me exactly where you downloaded gbrowse from and how you > > went > > > > about > > > > > > > installing it, and provide the exact, complete error message > > you > > > > got > > > > > > > from it, I might be able help the authors fix the problem. > > > > > > > > > > > > > > Or I'm pretty sure they can figure it our for themselves :) > > > > > > > _______________________________________________ > > > > > > > Bioperl-l mailing list > > > > > > > Bioperl-l at lists.open-bio.org > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > Scott Cain, Ph. D. 
> > > > > > cain at cshl.edu > > > > > > GMOD Coordinator (http://www.gmod.org/) > > > > > > 216-392-3087 > > > > > > Cold Spring Harbor Laboratory > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > ------------------------------------------------------------------------ > > > > Scott Cain, Ph. D. > > > > cain.cshl at gmail.com > > > > GMOD Coordinator (http://www.gmod.org/) > > > > 216-392-3087 > > > > Cold Spring Harbor Laboratory > > > > > > > > > > > > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. > > cain.cshl at gmail.com > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rvosa at sfu.ca Sat Dec 23 22:17:37 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Sat, 23 Dec 2006 14:17:37 -0800 Subject: [Bioperl-l] [Summary] Re: suggestions for suitable 'taxon' object In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca> Message-ID: <458DAB01.6080200@sfu.ca> The replies I've received so far indicate I should look into Bio::Taxon. I will probably come back with further questions/discussions as to how to link and cross reference taxa, sequences and nodes, but for now I should first look at the Bio::Taxon api (and unpack my other Christmas gifts). Thank you for all comments and suggestions. Happy holidays! 
Rutger Rutger Vos wrote: > Hi all, > > I am looking for a bioperl object that can be abused to function as a > suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS > file format (i.e. not strictly an entity from a taxonomy, but more loosely > an OTU). > > The object would primarily function as a way to relate nodes in trees to > sequences in an alignment (a foreign key that both nodes and sequences refer > to), and secondarily as the keeper of the canonical name of the OTU, such > that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo > sapiens (constrained monophyly)' can still be understood to refer to the > same thing - the OTU 'Homo sapiens sapiens' (for example). > > I was thinking that a (possibly expanded) Bio::Species might work if there > was some sensible way of appending references to node and sequence objects > to it (or otherwise associate them with each other), but I am writing *to > solicit any and all suggestions*. I am looking for something similar to > Bio::Phylo::Taxa::Taxon. > > Any and all comments and suggestions greatly appreciated! > > Best wishes, > > Rutger Vos > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger A. Vos Postdoctoral research fellow University of British Columbia Personal site: http://www.sfu.ca/~rvosa CIPRES: http://www.phylo.org Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From paul.boutros at utoronto.ca Sun Dec 24 03:36:59 2006 From: paul.boutros at utoronto.ca (Paul Boutros) Date: Sat, 23 Dec 2006 22:36:59 -0500 Subject: [Bioperl-l] Bio::Graphics::Glyph::dna Message-ID: <20061223223659.7sgfofa44mw4okks@webmail.utoronto.ca> Hi, I've been trying to get the dna glyph working and have had some problems. 
This is ActiveState perl 5.8.8 (build 819) and BioPerl 1.5.2 on WinXP. I'm starting with a FASTA file, so I've tried:

  $panel->add_track(
      $seq,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

where $seq is a Bio::Seq object, and I've tried it using a GFF $segment:

  my $db = Bio::DB::GFF->new(
      -adaptor => 'berkeleydb',
      -create  => 1,
      -dsn     => 'temp'
  );

  $db->load_sequence_string(
      $seq->primary_id(),
      $seq->seq()
  );

  my $segment = Bio::DB::GFF::Segment->new(
      $db,
      $seq->primary_id(),
      $seq->primary_id(),
      1,
      $seq->length()
  );

  $panel->add_track(
      $segment,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

From paul.boutros at utoronto.ca Sun Dec 24 03:46:27 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:46:27 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
Message-ID: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>

Hello,

I'm trying to get the dna glyph of Bio::Graphics to work and am having some problems. I'm starting with a fasta file, and I am running perl 5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2.

If I try simply using a Bio::Seq object like this:

  $panel->add_track(
      $segment,
      -glyph     => 'dna',
      -do_gc     => 'true',
      -gc_window => 'auto'
  );

I get the error:

  Can't locate object method "start" via package "Bio::Seq" at C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164.
And if I try creating a Bio::DB::GFFSegment object like this: my $db = Bio::DB::GFF->new( -adaptor => 'berkeleydb', -create => 1, -dsn => '/usr/local/share/gff/dmel' ); $db->initialize(1); $db->load_sequence_string( $seq->primary_id(), $seq->seq() ); my $segment = Bio::DB::GFF::Segment->new( $db, $seq->primary_id(), $seq->primary_id(), 1, $seq->length() ); $panel->add_track( $segment, -glyph => 'dna', -do_gc => 'true', -gc_window => 'auto' ); I get the error: ------------- EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented b y package Bio::DB::GFF::Segment. This is not your fault - author of Bio::DB::GFF::Segment should be blamed! STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522 STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101 STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186 STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167 STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56 STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81 STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388 STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360 STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288 STACK: create_figure.pl:147 ---------------------------------------------------------------- I'm really unsure what to try next, any suggestions much appreciated! 
Paul From lstein at cshl.edu Sun Dec 24 17:23:18 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Sun, 24 Dec 2006 12:23:18 -0500 Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna? In-Reply-To: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca> References: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca> Message-ID: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com> Hi, You need to use either a Bio::SeqFeature::Generic object (with an attached Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to create Bio::DB::GFF::Segment objects directly. e.g. my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000); my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800); $feature->attach_seq($dna); Best, Lincoln On 12/23/06, Paul Boutros wrote: > > Hello, > > I'm trying to get the dna glyph of Bio::Graphics to work and am having > some problems. I'm starting with a fasta file, and I am running perl > 5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2 > > If I try simply using a Bio::Seq object like this: > $panel->add_track( > $segment, > -glyph => 'dna', > -do_gc => 'true', > -gc_window => 'auto' > ); > > I get the error: > Can't locate object method "start" via package "Bio::Seq" at > C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164. 
> > > And if I try creating a Bio::DB::GFFSegment object like this: > my $db = Bio::DB::GFF->new( > -adaptor => 'berkeleydb', > -create => 1, > -dsn => '/usr/local/share/gff/dmel' > ); > > $db->initialize(1); > > $db->load_sequence_string( > $seq->primary_id(), > $seq->seq() > ); > > my $segment = Bio::DB::GFF::Segment->new( > $db, > $seq->primary_id(), > $seq->primary_id(), > 1, > $seq->length() > ); > > $panel->add_track( > $segment, > -glyph => 'dna', > -do_gc => 'true', > -gc_window => 'auto' > ); > > I get the error: > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not > implemented b > y package Bio::DB::GFF::Segment. > This is not your fault - author of Bio::DB::GFF::Segment should be blamed! > > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Root::RootI::throw_not_implemented > C:/Perl/site/lib/Bio/Root/RootI.pm:522 > STACK: Bio::FeatureHolderI::get_SeqFeatures > C:/Perl/site/lib/Bio/FeatureHolderI.pm:101 > STACK: Bio::Graphics::Glyph::_subfeat > C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186 > STACK: Bio::Graphics::Glyph::subfeat > C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167 > STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56 > STACK: Bio::Graphics::Glyph::Factory::make_glyph > C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 > STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81 > STACK: Bio::Graphics::Glyph::Factory::make_glyph > C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 > STACK: Bio::Graphics::Panel::_add_track > C:/Perl/site/lib/Bio/Graphics/Panel.pm:388 > STACK: Bio::Graphics::Panel::_do_add_track > C:/Perl/site/lib/Bio/Graphics/Panel.pm:360 > STACK: Bio::Graphics::Panel::add_track > C:/Perl/site/lib/Bio/Graphics/Panel.pm:288 > STACK: create_figure.pl:147 > ---------------------------------------------------------------- > > I'm really unsure what to try next, 
any suggestions much appreciated! > Paul > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From tgenahmet at gmail.com Wed Dec 27 21:38:43 2006 From: tgenahmet at gmail.com (Ahmet Kurdoglu) Date: Wed, 27 Dec 2006 14:38:43 -0700 Subject: [Bioperl-l] get mRNA details for a gene Message-ID: <9d8d0e2a0612271338t7cb15a63v5a08f624888b3f7b@mail.gmail.com> Hi, This is my first message to the list. I hope I get it right. Here is what I'm trying to accomplish: Get the mRNA details for a given gene (ex. DNASE2B) from its GenBank file. Using the web-interface I can search with this query: DNASE2B [sym] AND homo sapiens [ORGN] (returns only one result if you search 'gene' database) and get the GenBank file by clicking on NC_000001.9 and I can see the details for its two mRNAs. (I eventually need to get exon locations for both of its transcripts) However trying to do this in Perl has proved to be very difficult for me. I've tried various methods, including get_Seq_by_id, get_Seq_by_gi, and get_Stream_by_query. Before I explain in detail what I did I'd like to hear your ideas on how to accomplish this. Thank you. From sdavis2 at mail.nih.gov Thu Dec 28 21:57:03 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 28 Dec 2006 16:57:03 -0500 Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers In-Reply-To: References: Message-ID: <45943DAF.70100@mail.nih.gov> Michael Muratet US-Huntsville wrote: > Sean > > Thanks. I did consider the bioconductor package and downloaded your > write-up after it was recommended by the GEO folks. I've looked at R a > few times, but I never got proficient at it. 
I'm a lot better with perl. > > I've been looking at MINiML, too. It looked like it might be easier to > parse the SOFT file since the data is in-line with the attributes and > I'd have to use a SAX parser (not enough memory for DOM) for MINiML. > > NCBI must have parsers to get the data into their databases. Do you know > what they use? > Michael, You might want to look more specifically at the MINiML format specs. The data tables are stored as separate tab-delimited files with an external reference in the XML, so DOM parsing is possible with just a few kB of memory. Of course, to read in all of the data into memory at once will take a large amount of memory for some datasets. If you are going to load into a database, I would suggest reading the MINiML using DOM and then stepping through the data files one at a time, loading as you go. As for their parsers, I'm not sure what language they use, but writing a parser for either SOFT or MINiML isn't at all difficult. GEO uses a very simplified MAGE schema. As for R vs. perl, if you are planning on doing analyses of microarray data, I would highly suggest looking again at the R/bioconductor project. It will save you reinventing many wheels, such as getting annotation like gene ontology and pathways, doing stats, plotting, maintaining MIAME-compliant data structures, converting from multiple microarray formats, etc. Sean From allenday at ucla.edu Thu Dec 28 23:21:07 2006 From: allenday at ucla.edu (Allen Day) Date: Thu, 28 Dec 2006 15:21:07 -0800 Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers In-Reply-To: <45943DAF.70100@mail.nih.gov> References: <45943DAF.70100@mail.nih.gov> Message-ID: <5c24dcc30612281521o58b9f256sfa36c403f4c30bfa@mail.gmail.com> > As for R vs. perl, if you are planning on doing analyses of microarray > data, I would highly suggest looking again at the R/bioconductor > project. 
It will save you reinventing many wheels, such as getting > annotation like gene ontology and pathways, doing stats, plotting, > maintaining MIAME-compliant data structures, converting from multiple > microarray formats, etc. I'll second this statement WRT the data analysis. I'm doing all my analysis in R, Perl is just not good at dealing with large matrices or plotting. OTOH, I have also found that R is particularly weak when it comes to pipelining data and system interfacing. If your goal is to do ETL to a local database you're better off using Perl. I've found they're both about equally clunky for dealing with the experimental metadata, with a slight preference for Perl. That's more a reflection of the baroque MAGE model though than the programming languages themselves. -Allen > > Sean > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >

From Paul.Boutros at utoronto.ca Sat Dec 30 07:43:32 2006
From: Paul.Boutros at utoronto.ca (Paul Boutros)
Date: Sat, 30 Dec 2006 02:43:32 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>
Message-ID: <000c01c72be6$34d07e60$ec02a8c0@main>

Hi Lincoln,

Thanks, that worked like a charm! Can I suggest adding the example/explanation you gave me to the pod for Bio::Graphics::Glyph::dna? Here's a patch against the 1.5.2 version of dna.pm to do that.

Paul

266c266,274
< in response to the dna() method.
---
> in response to the dna() method. For example, you can use a
> Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq
> like this:
>   my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 );
>   my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 );
>   $feature->attach_seq($dna);
>   $panel->add_track( $feature, -glyph => 'dna' );
>
> A Bio::Graphics::Feature object may also be used.
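
For anyone wondering what the -do_gc and -gc_window options used earlier in this thread ask the glyph to plot, here is a rough, pure-Perl sketch of a windowed GC-content calculation. gc_windows() is a hypothetical helper written for this note; it is not part of Bio::Graphics, and the glyph's own windowing and binning may well differ:

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from Bio::Graphics: returns the GC fraction
# of each window of $window bases, moving $step bases at a time.
# $step defaults to $window, i.e. non-overlapping windows.
sub gc_windows {
    my ($dna, $window, $step) = @_;
    $step ||= $window;

    my @fractions;
    for (my $start = 0; $start + $window <= length $dna; $start += $step) {
        my $chunk = uc substr($dna, $start, $window);
        my $gc    = ($chunk =~ tr/GC//);    # tr/// here only counts G and C
        push @fractions, $gc / $window;
    }
    return @fractions;
}

# Three non-overlapping windows of four bases: GGCC, AATT, GCAT
my @gc = gc_windows('GGCCAATTGCAT', 4);
print join(' ', map { sprintf '%.2f', $_ } @gc), "\n";   # 1.00 0.00 0.50
```

A trailing stretch shorter than the window is simply ignored in this sketch.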
_____ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Sunday, December 24, 2006 12:23 PM To: Paul.Boutros at utoronto.ca Cc: BioPerl Mailing List Subject: Re: [Bioperl-l] How to use Bio::Graphics::Glyph::dna? Hi, You need to use either a Bio::SeqFeature::Generic object (with an attached Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to create Bio::DB::GFF::Segment objects directly. e.g. my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000); my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800); $feature->attach_seq($dna); Best, Lincoln On 12/23/06, Paul Boutros wrote: Hello, I'm trying to get the dna glyph of Bio::Graphics to work and am having some problems. I'm starting with a fasta file, and I am running perl 5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2 If I try simply using a Bio::Seq object like this: $panel->add_track( $segment, -glyph => 'dna', -do_gc => 'true', -gc_window => 'auto' ); I get the error: Can't locate object method "start" via package "Bio::Seq" at C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164. And if I try creating a Bio::DB::GFFSegment object like this: my $db = Bio::DB::GFF->new( -adaptor => 'berkeleydb', -create => 1, -dsn => '/usr/local/share/gff/dmel' ); $db->initialize(1); $db->load_sequence_string( $seq->primary_id(), $seq->seq() ); my $segment = Bio::DB::GFF::Segment->new( $db, $seq->primary_id(), $seq->primary_id(), 1, $seq->length() ); $panel->add_track( $segment, -glyph => 'dna', -do_gc => 'true', -gc_window => 'auto' ); I get the error: ------------- EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented b y package Bio::DB::GFF::Segment. This is not your fault - author of Bio::DB::GFF::Segment should be blamed! 
STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522 STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101 STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186 STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167 STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56 STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81 STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316 STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388 STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360 STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288 STACK: create_figure.pl:147 ---------------------------------------------------------------- I'm really unsure what to try next, any suggestions much appreciated! Paul _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From er at xs4all.nl Sun Dec 31 00:05:16 2006 From: er at xs4all.nl (Erik) Date: Sun, 31 Dec 2006 01:05:16 +0100 (CET) Subject: [Bioperl-l] acquiring a local refseq + index Message-ID: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> Hi all, I downloaded the refseq files (.gbff) and want to index the lot with Bio::DB::Flat. 
It turns out that there are many cases where the SOURCE and ORGANISM lines are messed up, sometimes to a degree where the indexing fails on a Bio::SeqIO::genbank error.

I'd like to change Bio::SeqIO::genbank to let this parsing go at least so far as to make the indexing of the refseq files possible, and hopefully to improve the taxonomic output ($seq->species->binomial is often mutilated at the moment).

Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank? Is anyone already working on a rewrite? Because if this is the case I may be better off writing my own indexing scheme.

Below is (an outline of) my indexing program, which uses Bio::DB::Flat::BDB. If anyone knows of a better way to get a locally searchable refseq flat file index, I would be very interested.

Thanks for your help,

Erikjan

-------------
  use Bio::DB::Flat;

  my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
  my $db = Bio::DB::Flat->new(
      -directory  => $refseq_dir,
      -dbname     => 'refseq',
      -format     => 'genbank',
      -index      => 'bdb',
      -write_flag => 1,
  );
  # getfiles() is not shown in this outline
  my @files = getfiles($refseq_dir);
  for my $f (@files) {
      $db->build_index($f);
  }

From hlapp at gmx.net Sun Dec 31 01:48:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 30 Dec 2006 20:48:33 -0500
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
Message-ID:

Can you send examples and the resulting error messages? Also, I'm assuming you are running the 1.5.2 release of Bioperl; if not, that's what I would try first.

-hilmar

On Dec 30, 2006, at 7:05 PM, Erik wrote: > Hi all, > > I downloaded the refseq files (.gbff) and want to index the lot with > Bio::DB::Flat. > > It turns out that there are many cases where the SOURCE and > ORGANISM lines > are messed up, sometimes to a degree where the indexing fails on a > Bio::SeqIO::genbank error.
> > I'd like to change Bio::SeqIO::genbank to let this parsing go at > least so > far as to make the indexing of the refseq files possible, and > hopefully > improving the taxonomic output ($seq->species->binomial is often > mutilated > at the moment). > > Is it still worthwhile to change parsing modules like > Bio::SeqIO::genbank? > Is anyone already working on a rewrite? Because if this is the > case I may > be better off writing my own indexing scheme? > > Below is (outline of) my indexing program, which uses > Bio::DB::Flat::DBD. > If anyone knows of a better way to get a locally searchable refseq > flat > file index, I would be very interested. > > Thanks for your help, > > Erikjan > > > ------------- > use Bio::DB::Flat; > > my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete'; > my $db=Bio::DB::Flat->new( > -directory => $refseq_dir, > -dbname => 'refseq', > -format => 'genbank', > -index => 'bdb', > -write_flag => 1, > ); > my @files = getfiles($refseq_dir); > for my $f (@files) { > db->build_index($f); > } > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sun Dec 31 02:33:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 30 Dec 2006 20:33:23 -0600 Subject: [Bioperl-l] acquiring a local refseq + index In-Reply-To: References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> Message-ID: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu> Agree with Hilmar, in that we need examples. If you are referring to your submitted bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2167 we could add this in as long as it passes (I'll try giving it a workout with my local bacterial seqs tonight or tomorrow). 
However, in the not-too-distant future your patch would likely be
rendered obsolete, as any parsing in Bio::SeqIO modules pertaining to
Bio::Species-related matters will be deprecated in favor of simple
parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
optional db lookups using NCBI Taxonomy). Bio::Species and anything
related to it are considered marked for deprecation. Fair warning...

chris

On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:

> Can you send examples and the resulting error messages? Also, I'm
> assuming you are running the 1.5.2 release of Bioperl; if not, that's
> what I would try first.
>
>     -hilmar
>
> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>
>> Hi all,
>>
>> I downloaded the refseq files (.gbff) and want to index the lot with
>> Bio::DB::Flat.
>>
>> It turns out that there are many cases where the SOURCE and ORGANISM
>> lines are messed up, sometimes to a degree where the indexing fails
>> on a Bio::SeqIO::genbank error.
>>
>> I'd like to change Bio::SeqIO::genbank to let this parsing go at
>> least so far as to make the indexing of the refseq files possible,
>> and hopefully to improve the taxonomic output
>> ($seq->species->binomial is often mutilated at the moment).
>>
>> Is it still worthwhile to change parsing modules like
>> Bio::SeqIO::genbank? Is anyone already working on a rewrite? If so,
>> I may be better off writing my own indexing scheme.
>>
>> Below is (an outline of) my indexing program, which uses
>> Bio::DB::Flat::BDB. If anyone knows of a better way to get a locally
>> searchable refseq flat file index, I would be very interested.
>>
>> Thanks for your help,
>>
>> Erikjan
>>
>>
>> -------------
>> use Bio::DB::Flat;
>>
>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>> my $db=Bio::DB::Flat->new(
>>     -directory  => $refseq_dir,
>>     -dbname     => 'refseq',
>>     -format     => 'genbank',
>>     -index      => 'bdb',
>>     -write_flag => 1,
>> );
>> my @files = getfiles($refseq_dir);
>> for my $f (@files) {
>>     $db->build_index($f);
>> }
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Sun Dec 31 19:36:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 31 Dec 2006 13:36:47 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl> <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
Message-ID: <37FB5BDF-25A9-44F0-9E82-964684A73A58@uiuc.edu>

As a followup, I have committed the fix Erik had in Bugzilla. I don't
know if this helps with the issue Erik describes below (they sound
unrelated).

chris

On Dec 30, 2006, at 8:33 PM, Chris Fields wrote:

> Agree with Hilmar, in that we need examples. If you are referring to
> your submitted bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2167
>
> we could add this in as long as it passes (I'll try giving it a
> workout with my local bacterial seqs tonight or tomorrow).
> However, in the not-too-distant future your patch would likely be
> rendered obsolete, as any parsing in Bio::SeqIO modules pertaining to
> Bio::Species-related matters will be deprecated in favor of simple
> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
> optional db lookups using NCBI Taxonomy). Bio::Species and anything
> related to it are considered marked for deprecation. Fair warning...
>
> chris
>
> On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:
>
>> Can you send examples and the resulting error messages? Also, I'm
>> assuming you are running the 1.5.2 release of Bioperl; if not, that's
>> what I would try first.
>>
>>     -hilmar
>>
>> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>>
>>> Hi all,
>>>
>>> I downloaded the refseq files (.gbff) and want to index the lot with
>>> Bio::DB::Flat.
>>>
>>> It turns out that there are many cases where the SOURCE and ORGANISM
>>> lines are messed up, sometimes to a degree where the indexing fails
>>> on a Bio::SeqIO::genbank error.
>>>
>>> I'd like to change Bio::SeqIO::genbank to let this parsing go at
>>> least so far as to make the indexing of the refseq files possible,
>>> and hopefully to improve the taxonomic output
>>> ($seq->species->binomial is often mutilated at the moment).
>>>
>>> Is it still worthwhile to change parsing modules like
>>> Bio::SeqIO::genbank? Is anyone already working on a rewrite? If so,
>>> I may be better off writing my own indexing scheme.
>>>
>>> Below is (an outline of) my indexing program, which uses
>>> Bio::DB::Flat::BDB. If anyone knows of a better way to get a locally
>>> searchable refseq flat file index, I would be very interested.
>>>
>>> Thanks for your help,
>>>
>>> Erikjan
>>>
>>>
>>> -------------
>>> use Bio::DB::Flat;
>>>
>>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>>> my $db=Bio::DB::Flat->new(
>>>     -directory  => $refseq_dir,
>>>     -dbname     => 'refseq',
>>>     -format     => 'genbank',
>>>     -index      => 'bdb',
>>>     -write_flag => 1,
>>> );
>>> my @files = getfiles($refseq_dir);
>>> for my $f (@files) {
>>>     $db->build_index($f);
>>> }
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
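
Two loose ends in this thread can be sketched in plain Perl: Erik's outline
calls getfiles() without showing it, and Hilmar asks which files fail and
with what error. The following is only a minimal sketch, not code posted by
anyone in the thread. getfiles() here is a hypothetical stand-in for Erik's
helper, index_each() is an invented name, and it assumes that the files end
in .gbff and that a parser failure surfaces as a die (a BioPerl throw).

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical stand-in for the getfiles() helper in Erik's outline:
# collect every regular file ending in .gbff below $dir, sorted by path.
sub getfiles {
    my ($dir) = @_;
    my @found;
    find(sub { push @found, $File::Find::name if -f $_ && /\.gbff$/ }, $dir);
    my @sorted = sort @found;
    return @sorted;
}

# Invented helper: call $indexer on each file and trap exceptions, so one
# bad record does not abort the run. Returns a hash of file => error text.
sub index_each {
    my ($indexer, @files) = @_;
    my %failed;
    for my $f (@files) {
        # The eval block yields 1 on success; after a die it yields undef
        # and the message is left in $@.
        eval { $indexer->($f); 1 } or $failed{$f} = $@ || 'unknown error';
    }
    return %failed;
}

# Assumed usage with the $db object from the outline:
#   my %failed = index_each(sub { $db->build_index($_[0]) },
#                           getfiles($refseq_dir));
#   print "$_: $failed{$_}\n" for sort keys %failed;
```

Passing the indexing step in as a code reference keeps the sketch free of
any BioPerl dependency; the per-file error text it collects is the kind of
example and error message asked for above.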