From alan.bridge at isb-sib.ch Sun Dec 2 13:29:48 2007 From: alan.bridge at isb-sib.ch (Alan Bridge) Date: Sun, 02 Dec 2007 19:29:48 +0100 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast Message-ID: <4752F99C.9050504@isb-sib.ch> Hello, I was just wondering if, when performing a RemoteBlast, it would be possible to specify the entire UniProt database (i.e. Swiss-Prot + TrEMBL), or even just TrEMBL. It seems that currently, you can only specify Swiss-Prot (the annotated portion of UniProt, which is much smaller than its automatically annotated counterpart, TrEMBL). Any hints on how to expand the search space to include TrEMBL would be really appreciated. Regards, Alan Bridge
my $prog  = 'blastp';
my $db    = 'swissprot';   # use TrEMBL ?
my $e_val = '1e-10';
my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO' );
-- Alan Bridge PhD Swiss-Prot annotator Swiss Institute of Bioinformatics (SIB) 1, rue Michel Servet CH-1211 Geneva 4 Switzerland Tel: (+41 22) 379 58 90 Fax: (+41 22) 379 58 58 http://www.expasy.org/ From avilella at gmail.com Mon Dec 3 06:39:59 2007 From: avilella at gmail.com (Albert Vilella) Date: Mon, 3 Dec 2007 11:39:59 +0000 Subject: [Bioperl-l] Query about SLAC.pm module In-Reply-To: References: Message-ID: <358f4d650712030339w2f3de057ge5614e60a3f6658c@mail.gmail.com> [CCing to the bioperl ml] Sorry, there were some bits left in the POD header referring to PAML objects that aren't quite true. I've now updated the PODs. The Hyphy executions return hashes: if you run the SLAC test in t/Hyphy.t, you will see that $results looks something like:
  DB<3> x 2 $results
0  HASH(0x8df3110)
   'E[NS Sites]' => ARRAY(0x8e6cff4)
   'E[S Sites]' => ARRAY(0x8e6ceb0)
   'Observed NS Changes' => ARRAY(0x8e7b380)
   'Observed S Changes' => ARRAY(0x8e7b344)
   'Observed S. Prop.' => ARRAY(0x8e6d018)
   'P{S geq. observed}' => ARRAY(0x8e6d360)
   'P{S leq. observed}' => ARRAY(0x8e6d33c)
   'P{S}' => ARRAY(0x8e6d03c)
   'Scaled dN-dS' => ARRAY(0x8e6d384)
   'dN' => ARRAY(0x8e6d084)
   'dN-dS' => ARRAY(0x8e6d0a8)
   'dS' => ARRAY(0x8e6d060)
  DB<4> x $rc
which corresponds to the CSV file that Hyphy produces. Cheers, Albert. On Dec 3, 2007 10:04 AM, Johan Nilsson wrote: > > Dear Dr. Vilella, > > Please allow me to introduce myself. My name is Johan Nilsson and I am a > postdoctoral researcher in bioinformatics. > > I was planning to perform a large-scale analysis of positively selected > protein-coding genes using any appropriate method from the Hyphy package, > and I thought your BioPerl wrappers 'SLAC.pm', 'FEL.pm' etc. would be very > useful for this. > > If I interpreted the documentation of, e.g., the SLAC module correctly, running > $slac->run($aln,$tree) will return a > Bio::Tools::Phylo::PAML object. However, when I try to retrieve any results > from the obtained hashref (running my script on the test files provided > with BioPerl ...t/hyphy1.tree and ...t/hyphy1.fasta), the script complains > that it is not blessed (e.g. 'Can't call method "get_seqs" on unblessed > reference'). > > I am fairly new to BioPerl, so please excuse me if this is a > stupid question :) > > Thanks in advance! > > Yours Sincerely > /Johan > > -- > Johan Nilsson, Ph.D. > School of Life Sciences > Södertörns University College > S-141 89 Huddinge, Sweden > E-mail: johan.nilsson at sh.se > Phone: +46 8 608 47 05, +46 70 456 10 51 > > From cjfields at uiuc.edu Mon Dec 3 09:04:06 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 3 Dec 2007 08:04:06 -0600 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast In-Reply-To: <4752F99C.9050504@isb-sib.ch> References: <4752F99C.9050504@isb-sib.ch> Message-ID: You are limited to the databases hosted on the NCBI server, so it's really up to them; RemoteBlast is an interface to NCBI's WebBlast using URLAPI.
A list of current databases can be found here: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html chris From bioperl at boekhoff.info Mon Dec 3 14:14:24 2007 From: bioperl at boekhoff.info (Sven Boekhoff) Date: Mon, 03 Dec 2007 20:14:24 +0100 Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST reload Message-ID: <47545590.1000703@boekhoff.info> Hi! I just started working with Perl and BioPerl. I'm quite impressed by what can easily be done with this module. Today I found that my second CPU is not used, while the first one runs at 100%. I tried to include the "-a" parameter, but I was not successful:
my @params = (
    -database => 'my_db',
    -a        => '2',
    -outfile  => 'blast1.out'
);
How do I use it? Second question: in my Perl script I start BLAST searches in a loop. Every time BLAST has finished its search, the memory is cleared and BLAST is started again. I think most of the time is spent reloading the database. Is it somehow possible to keep the database loaded (e.g.
by starting a second search) or is BLAST reloaded anyway? Thanks for your help! Sven www.boekhoff.info From bix at sendu.me.uk Mon Dec 3 19:05:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 04 Dec 2007 00:05:23 +0000 Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST reload In-Reply-To: <47545590.1000703@boekhoff.info> References: <47545590.1000703@boekhoff.info> Message-ID: <475499C3.20801@sendu.me.uk> Sven Boekhoff wrote: > HI! > I just started working with Perl and BioPerl. I'm quite impressed what > can be easily done with this module. Today I found that my second CPU > ist not used, but the first one run's at 100%. I tried to include the > "-a"-parameter, but I was not successful: > > my @params = ( > -database => 'my_db', > -a => '2', > -outfile => 'blast1.out' > ); > > How do I have to use it? This should work in the CVS version of StandAloneBlast. In other versions, perhaps try using $object->a(2); > Second question: In my perlscript I start BLAST-searches in a loop. > Everytime BLAST has finished its search, the memory is cleared and BLAST > is started again. I think most of the time is used to reload the > database. Is it somehow possible to keep the database loaded (e.g. by > starting a second search) or is BLAST reloaded anyway? I hope someone will correct me if I'm wrong, but I think you'd have to do that with a 2-way pipe. StandAloneBlast only writes output to a file and then reads input from that file, with the executable finishing in between. I've thought about improving it with a 2-way pipe, but never got around to it, being apprehensive about stability on all platforms. The more obvious solution, which may be possible depending on exactly what you're doing, is to avoid the loop and just supply Blast all your input in one go.
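A minimal sketch of the "supply Blast all your input in one go" suggestion, assuming the legacy NCBI blastall executable (whose -a option sets the number of processors to use) and a hypothetical hash of query IDs to sequences. It writes every query into one multi-FASTA file and builds a single command line, so BLAST starts and opens the database once instead of once per loop iteration. The helper names write_query_file and blastall_command are illustrative only, not part of BioPerl.

```perl
use strict;
use warnings;

# Write all queries into ONE multi-FASTA file, so that BLAST is started
# (and the database loaded) once instead of once per loop iteration.
# $queries is a hash ref of (hypothetical) query ID => sequence string.
sub write_query_file {
    my ($queries, $path) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    for my $id (sort keys %$queries) {
        print {$fh} ">$id\n";
        my $seq = $queries->{$id};
        # wrap the sequence at 60 residues per line
        print {$fh} "$1\n" while $seq =~ /(.{1,60})/g;
    }
    close $fh or die "Cannot close $path: $!\n";
    return $path;
}

# Build the argument list for a single run of the legacy NCBI blastall
# executable; -a is its "number of processors to use" option.
sub blastall_command {
    my %opt  = @_;
    my $cpus = defined $opt{cpus} ? $opt{cpus} : 1;
    return ('blastall',
            '-p', $opt{program},
            '-d', $opt{database},
            '-i', $opt{query_file},
            '-o', $opt{outfile},
            '-a', $cpus);
}
```

For example, after write_query_file(\%queries, 'all_queries.fa'), one call to system(blastall_command(program => 'blastp', database => 'my_db', query_file => 'all_queries.fa', outfile => 'blast1.out', cpus => 2)) replaces the loop; system returns 0 on success. With StandAloneBlast, the equivalent would be to pass that single file name to one blastall call after setting the processor count as Sendu describes.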
From Russell.Smithies at agresearch.co.nz Mon Dec 3 19:49:21 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 4 Dec 2007 13:49:21 +1300 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: <475499C3.20801@sendu.me.uk> References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> Message-ID: Hi all, I'm trying to read .ace files but keep getting an error whose cause I don't know. Really basic example code:
#!/usr/local/bin/perl -w
use lib "/data/home/smithiesr/bioperl-live";
use Bio::Assembly::IO;
use Data::Dumper;
$ace = "CLP0001001240-cE15_20030319.ace";
$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
$assembly = $io->next_assembly;
foreach $contig ($assembly->all_contigs) {
    print Dumper $contig;
}
This gives the following error:
[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
Can't call method "get_consensus_sequence" on an undefined value at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170, line 42.
which relates to this bit in ace.pm:
# Loading contig qualities... (Base Quality field)
/^BQ/ && do {
my $consensus = $contigOBJ->get_consensus_sequence()->seq();
Is this caused by a dud ace file or a problem with Bio::Assembly::IO::ace, or is the Contig object not getting created? Any ideas? Thanx, Russell Smithies Bioinformatics Software Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at uiuc.edu Mon Dec 3 21:15:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 3 Dec 2007 20:15:58 -0600 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> Message-ID: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> This seems similar to the 'too many open filehandles' issue documented here: http://bugzilla.open-bio.org/show_bug.cgi?id=2320 Unfortunately, it is due to having an open DB_File for every contig, and is a problem with the Bio::Assembly implementation that isn't easily fixed. Changing the open filehandle limit using ulimit is the only known fix:
ulimit -n 10000
chris Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From florent.angly at gmail.com Mon Dec 3 21:25:24 2007 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 03 Dec 2007 18:25:24 -0800 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> Message-ID: <4754BA94.7090600@gmail.com> Would this issue cause excessive memory usage?
I ask because I was getting high memory usage when parsing some TIGR Assembler files, and was wondering whether the TIGR parser or the parent assembly IO module was responsible. I'd definitely be interested in a fix of the Bio::Assembly implementation if it's the assembly IO module's fault. Florent From Russell.Smithies at agresearch.co.nz Mon Dec 3 21:32:43 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 4 Dec 2007 15:32:43 +1300 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> Message-ID: Thanx Chris, I'm only writing a simple .ace viewer to display assembled contigs in a Bio::Graphics::Panel, so I'll parse the coords from the .ace files "manually", unless anyone else has a better idea? (And some example code ;-) Russell From cjfields at uiuc.edu Tue Dec 4 00:10:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 3 Dec 2007 23:10:57 -0600 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: <4754BA94.7090600@gmail.com> References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> <4754BA94.7090600@gmail.com> Message-ID: <4F867A88-C0DC-4DF7-9F47-C38712920183@uiuc.edu> Yes, it's possible this would cause memory issues, as each Bio::Assembly::Contig instance would have a Bio::SeqFeature::Collection attached (each Collection having a tied DB hash, which would be an open filehandle). So if you had over 1000 contigs open at any one time (in a parsed scaffold, for instance), you would have 1000 open filehandles. Not very efficient.
My thought was to have each Bio::Assembly::Scaffold instance carry a single Bio::SeqFeature::CollectionI (it could be a Bio::SeqFeature::Collection, Bio::DB::SeqFeature::Store, or any other CollectionI, whichever is easiest). Each Contig would be passed (and would store) a reference to the Scaffold's SF::Collection and pull features from there; I just haven't had time to mess with it. I don't think anyone's tackling it, so feel free to code away! chris From cjfields at uiuc.edu Tue Dec 4 00:20:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 3 Dec 2007 23:20:07 -0600 Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files In-Reply-To: References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk> <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu> Message-ID: The ulimit fix usually works, but if this is for GBrowse it probably isn't prudent. It would be nice to get Bio::Assembly working as a Bio::AlignI; it would be easier to manipulate for display. Here's a script I wrote up as an example: http://www.bioperl.org/wiki/HOWTO_Discussion:Graphics chris From avilella at gmail.com Tue Dec 4 06:51:05 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 4 Dec 2007 11:51:05 +0000 Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program Message-ID: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com> Hi all, There is a new wrapper in bioperl-run for SLR: http://www.bioperl.org/wiki/SLR Right now, output parsing is very simple, and I have only tested it on my Linux machine. Can someone with a Mac give it a try? Update your bioperl-run to CVS HEAD, then:
# try the installer, SLR is option 6
perl scripts/bioperl_application_installer.PLS
# then try to run the tests (should take about a minute)
perl t/SLR.t
Any comments on the code would be appreciated. Thanks in advance, Cheers, Albert. From captainrave at hotmail.com Tue Dec 4 06:04:57 2007 From: captainrave at hotmail.com (Captainrave) Date: Tue, 4 Dec 2007 03:04:57 -0800 (PST) Subject: [Bioperl-l] extracting CDS location from Genbank Message-ID: <14148723.post@talk.nabble.com> Help. I'm very new to Perl and BioPerl. Basically, I need to extract the location of each CDS in a GenBank entry (e.g. 103..120) and export them to an output file as a list. How would I do this? Your help would be much appreciated! -- View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14148723 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
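A minimal sketch of one way to do this with BioPerl, assuming a GenBank flat file as input. It walks every record, keeps only features whose primary tag is CDS, and writes the record ID and the GenBank-style location string (for example 103..120, or complement(200..400) on the minus strand) to a tab-separated output file. The helper names cds_locations and write_cds_list are hypothetical, not part of BioPerl; the calls it relies on are ordinary Bio::SeqIO, Bio::SeqFeatureI and Bio::LocationI methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Return [record ID, location string] for every CDS feature in a GenBank file.
sub cds_locations {
    my ($infile) = @_;
    my $in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
    my @rows;
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            # to_FTstring renders the location as in the GenBank feature
            # table, e.g. "103..120" or "join(1..50,60..90)"
            push @rows, [ $seq->display_id, $feat->location->to_FTstring ];
        }
    }
    return @rows;
}

# Write "ID<TAB>location" lines to $outfile; returns the number written.
sub write_cds_list {
    my ($infile, $outfile) = @_;
    my @rows = cds_locations($infile);
    open my $out, '>', $outfile or die "Cannot write $outfile: $!\n";
    print {$out} join("\t", @$_), "\n" for @rows;
    close $out or die "Cannot close $outfile: $!\n";
    return scalar @rows;
}

# Usage: perl cds_locations.pl input.gb cds_list.txt
if (@ARGV == 2) {
    my $n = write_cds_list(@ARGV);
    print "Wrote $n CDS locations to $ARGV[1]\n";
}
```

Run it as perl cds_locations.pl input.gb cds_list.txt (both file names are placeholders). Without exactly two arguments it only defines the two subroutines, which keeps cds_locations reusable from other scripts.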
From michael.watson at bbsrc.ac.uk Tue Dec 4 09:48:27 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 4 Dec 2007 14:48:27 -0000 Subject: [Bioperl-l] extracting CDS location from Genbank In-Reply-To: <14148723.post@talk.nabble.com> References: <14148723.post@talk.nabble.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> From the SeqIO HOWTO:
#!/bin/perl
use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;
From the Feature HOWTO:
for my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    for my $tag ($feat_object->get_all_tags) {
        print " tag: ", $tag, "\n";
        for my $value ($feat_object->get_tag_values($tag)) {
            print " value: ", $value, "\n";
        }
    }
}
Surely you could have found that yourself? ;0 From captainrave at hotmail.com Tue Dec 4 10:07:19 2007 From: captainrave at hotmail.com (Captainrave) Date: Tue, 4 Dec 2007 07:07:19 -0800 (PST) Subject: [Bioperl-l] extracting CDS location from Genbank In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <14152264.post@talk.nabble.com> Yes, but actually implementing it is another story. I get an error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: file argument provided, but with an undefined value
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
STACK: test3.pl:7
-----------------------------------------------------------
Basically because I don't understand the code well enough. For example, how do I tell it which input file to read? I know this might sound stupid, but I don't understand the Biowiki very well!
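The exception quoted above is what Bio::SeqIO reports when its -file argument is undefined. In the HOWTO code, my $file = shift; sits outside any subroutine, where shift removes the first element of @ARGV, i.e. the first command-line argument. Started as plain perl test3.pl, the script therefore most likely passes undef to Bio::SeqIO->new; naming the file on the command line (perl test3.pl your_file.gb, where your_file.gb is a placeholder) should avoid it. A minimal, more defensive sketch of the argument check, using a hypothetical helper name:

```perl
use strict;
use warnings;

# Return the input file named on the command line, or die with a clear
# message instead of letting Bio::SeqIO->new(-file => undef) throw.
# The argument list is passed in explicitly so the check is easy to test.
sub input_file_from_args {
    my @args = @_;
    my $file = shift @args;
    die "Usage: perl $0 <genbank-file>\n" unless defined $file;
    die "Cannot find input file '$file'\n" unless -e $file;
    return $file;
}

# In the real script (requires BioPerl):
#   my $file = input_file_from_args(@ARGV);
#   my $seqio_object = Bio::SeqIO->new(-file => $file, -format => 'genbank');
```

Passing -format => 'genbank' explicitly is optional when the file has a .gb or .gbk extension, but it avoids relying on format guessing.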
From michael.watson at bbsrc.ac.uk Tue Dec 4 10:21:34 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 4 Dec 2007 15:21:34 -0000 Subject: [Bioperl-l] extracting CDS location from Genbank In-Reply-To: <14152264.post@talk.nabble.com> References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk> Post the script that produces that error, and your file's location

From bix at sendu.me.uk Tue Dec 4 10:39:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 15:39:31 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com>
Message-ID: <475574B3.8050700@sendu.me.uk>

Captainrave wrote:
> Yes but actually implementing it is another story.
>
> I get an error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: file argument provided, but with an undefined value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
> STACK: test3.pl:7
> -----------------------------------------------------------

The best way to get help is to give us your script and the error message, and the command you used to run your script. The less you know, the more you should give us (i.e. don't edit anything out).

From captainrave at hotmail.com Tue Dec 4 10:41:37 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:41:37 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152907.post@talk.nabble.com>


#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    for my $tag ($feat_object->get_all_tags) {
        print " tag: ", $tag, "\n";
        for my $value ($feat_object->get_tag_values($tag)) {
            print " value: ", $value, "\n";
        }
    }
}

exit;

The file is in the same folder. But how do I tell it to use this file?

From michael.watson at bbsrc.ac.uk Tue Dec 4 10:53:22 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:53:22 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk><14152264.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk> <14152907.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F77A@iahce2ksrv1.iah.bbsrc.ac.uk>

Same script as yours, but try:

my $file = 'C:\path\to\my\filename.gbk';

From cjfields at uiuc.edu Tue Dec 4 11:20:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Dec 2007 10:20:34 -0600
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk> <14152907.post@talk.nabble.com>
Message-ID:

The 'my $file = shift;' is a perl idiom. The built-in 'shift', used implicitly in this way (outside a subroutine, with no argument), operates on @ARGV, i.e. the command-line arguments; the file is then passed as the first argument when running the script:

get_features.pl myfile.gb

This should work for any OS.
Personally, I use something like the following to indicate how the script is used in case a file is never entered:

my $USAGE = <<END_USE;

Perl script to grab features from a GenBank file and print to a table

END_USE

my $file = shift || die $USAGE;

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Tue Dec 4 11:22:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 16:22:12 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk> <14152907.post@talk.nabble.com>
Message-ID: <47557EB4.10003@sendu.me.uk>

Captainrave wrote:
> #!/bin/perl
> my $file = shift; # get the file name, somehow
>
> The file is on the same folder. But how do I tell it to use this file?

http://stein.cshl.org/genome_informatics/perl_intro/command_line.html

Basically, when you run your script, add the name of the file to the command line:

me% perl myscript.pl myfile

By saying 'my $file = shift' inside myscript.pl, the variable $file now contains the filename 'myfile'.
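Putting the replies above together, a minimal sketch of the complete script might look like the following. It assumes the input is a GenBank file given as the first command-line argument (for example: perl get_features.pl myfile.gb); the helper name dump_features is illustrative only, not part of BioPerl. With no argument, the bare shift returns undef, which is what produced the "file argument provided, but with an undefined value" exception, so the sketch checks @ARGV before calling Bio::SeqIO.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $USAGE = "Usage: perl get_features.pl <GenBank file>\n";

# Hypothetical helper: one line of text per feature, tag and value,
# for any Bio::SeqI object.
sub dump_features {
    my ($seq) = @_;
    my @lines;
    for my $feat ($seq->get_SeqFeatures) {
        push @lines, "primary tag: " . $feat->primary_tag;
        for my $tag ($feat->get_all_tags) {
            push @lines, "  tag: $tag";
            push @lines, "    value: $_" for $feat->get_tag_values($tag);
        }
    }
    return @lines;
}

# At file scope a bare 'shift' removes and returns the first element of
# @ARGV (the first command-line argument); with no arguments it is undef.
if (@ARGV) {
    my $file = shift;
    die "Cannot read '$file'\n" unless -r $file;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    while (my $seq = $in->next_seq) {
        print "$_\n" for dump_features($seq);
    }
}
else {
    # No file name given: show usage instead of letting Bio::SeqIO throw.
    print STDERR $USAGE;
}
```

Passing -format explicitly avoids relying on the file extension to guess the format.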
You could also have hardcoded the filename:

my $file = 'myfile';

Anyway, you're going to run into lots of these issues, and they're beyond the scope of this mailing list. For basic perl problems, seek help via www.perl.org. When you have a BioPerl-specific question, don't hesitate to post here.

From jason at bioperl.org Tue Dec 4 12:16:30 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 09:16:30 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program
In-Reply-To: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
Message-ID: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>

Excellent - thanks for this! I'm giving it a whirl on linux and the SLR.t test is currently taking more than 30 minutes to run -- is it possible to cook up an example that is going to finish in a more reasonable amount of time?

Also - I would prefer if the default exe could be 'Slr' rather than Slr_Linux_static - it seems like it is possible for users to install it this way. Similarly, whether Slr_osx or Slr is the default name, is it too big of a deal to expect the user to rename it?

I'll give it a whirl on OSX later, but it might be easier if the test runs shorter.

Thanks!
-jason

On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:

> Hi all,
>
> There is a new wrapper in bioperl-run for SLR:
>
> http://www.bioperl.org/wiki/SLR
>
> Right now, output parsing is very simple, and I have only tested it on
> my linux machine.
> Can someone with a Mac give it a try?
>
> update your bioperl-run to cvs head, then:
>
> # try the installer, SLR is option 6
> perl scripts/bioperl_application_installer.PLS
> # then try to run the tests (should take about a minute)
> perl t/SLR.t
>
> Any comments on the code would be appreciated,
>
> Thanks in advance,
>
> Cheers,
>
> Albert.

From florent.angly at gmail.com Tue Dec 4 13:17:08 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 04 Dec 2007 10:17:08 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::TigrAssembler
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com> <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <475599A4.1040500@gmail.com>

Hi all,

I pushed a new module into bioperl-run CVS a few days ago. It's called Bio::Tools::Run::TigrAssembler. It is a wrapper for TIGR Assembler, an open-source program that assembles DNA sequences. Input is a list of sequence objects and output is assembly objects... easy enough...

Let me know if you experience problems with it.

Florent

From jason at bioperl.org Tue Dec 4 13:51:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 10:51:34 -0800
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST reload
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <8273f6c20712041051k2bfe36efgb2ae40550df9341@mail.gmail.com>

You can pass in an array reference of sequences instead of a single sequence object, and the module will write them all to one multi-FASTA query file. You can also pass in a filename instead of a sequence object, and that file can be an already-built multi-FASTA file.
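A minimal sketch of building that multi-FASTA file ahead of time with Bio::SeqIO, so that a single BLAST run handles every query instead of BLAST being restarted once per sequence. The helper write_multifasta, the file name queries.fa and the two example protein sequences are hypothetical.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

# Hypothetical helper: write every Bio::Seq object to one FASTA file
# and return the number of records written.
sub write_multifasta {
    my ($file, @seqs) = @_;
    my $out = Bio::SeqIO->new(-file => ">$file", -format => 'fasta');
    $out->write_seq($_) for @seqs;
    $out->close;
    return scalar @seqs;
}

# Hypothetical example queries.
my @queries = (
    Bio::Seq->new(-display_id => 'query1', -seq => 'MKTAYIAKQRQISFVKSHFSRQ'),
    Bio::Seq->new(-display_id => 'query2', -seq => 'MSDNGPQNQRNAPRITFGGPSD'),
);
my $n = write_multifasta('queries.fa', @queries);
print "wrote $n sequences to queries.fa\n";
```

The resulting queries.fa can then be given to StandAloneBlast as a file name, or to blastall with -i.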
This is described in the documentation:
http://search.cpan.org/~birney/bioperl-1.4/Bio/Tools/Run/StandAloneBlast.pm#blastall

You can also just run BLAST without StandAloneBlast, as I do, and build your multi-FASTA file ahead of time with SeqIO, then do:

# wublast
my $cmd = "blastp -i MULTIFASTA -d DATABASE --cpus 2 |";
# or NCBI blast
# my $cmd = "blastall -a 2 -i MULTIFASTA -p blastp -d DATABASE |";
my $fh;
open($fh, $cmd) or die "Cannot run '$cmd': $!";
my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

The advantage of StandAloneBlast, in theory, is that it takes care of temporary file creation (sequence files) and cleanup. Personally, I find I want more direct access to programs that are simple command-line tools like this. You can do similar things with SSEARCH or FASTA searching too.

-jason

On Dec 3, 2007 4:05 PM, Sendu Bala wrote:
> Sven Boekhoff wrote:
> > HI!
> > I just started working with Perl and BioPerl. I'm quite impressed what
> > can be easily done with this module. Today I found that my second CPU
> > is not used, but the first one runs at 100%. I tried to include the
> > "-a" parameter, but I was not successful:
> >
> > my @params = (
> >     -database => 'my_db',
> >     -a        => '2',
> >     -outfile  => 'blast1.out'
> > );
> >
> > How do I have to use it?
>
> This should work in the CVS version of StandAloneBlast. In other
> versions, perhaps try using $object->a(2);
>
> > Second question: In my perlscript I start BLAST searches in a loop.
> > Every time BLAST has finished its search, the memory is cleared and BLAST
> > is started again. I think most of the time is used to reload the
> > database. Is it somehow possible to keep the database loaded (e.g. by
> > starting a second search) or is BLAST reloaded anyway?
>
> I hope someone will correct me if I'm wrong, but I think you'd have
> to do that with a 2-way pipe. StandAloneBlast only uses output to a file
> and input from that file, finishing with the executable in between. I've
> thought about improving it with a 2-way pipe, but never got around to
> it, being apprehensive about stability on all platforms.
>
> The more obvious solution, which may be possible depending on exactly
> what you're doing, is to avoid the loop and just supply BLAST all your
> input in one go.

--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From stefan.kirov at bms.com Tue Dec 4 14:25:21 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 14:25:21 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To:
References:
Message-ID: <4755A9A1.2040608@bms.com>

Jason Stajich wrote:
> PAML4 breaks our PAML parser right now because the order of things in
> the result file has changed. Now sequences precede the information
> about the version or the program run. This means that $result->get_seqs()
> fails because we don't parse the sequences.
>
> We'll see what we can do, but as usual with supporting 3rd party
> programs it is brittle when file formats change. Th
>
> -jason
>
> --
> Jason Stajich
> jason at bioperl.org

Jason,
I saw a commit after this post on codeml, but not on PAML.pm - I assume this is not fixed, am I correct?
Thanks!
Stefan

From avilella at gmail.com Tue Dec 4 15:34:38 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:34:38 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com> <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>

hmmm, 30 minutes is quite a lot... it takes much less for me:

avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
1..7
ok 1 - use Bio::Root::IO;
ok 2 - use Bio::Tools::Run::Phylo::SLR;
ok 3 - use Bio::AlignIO;
ok 4 - use Bio::TreeIO;
ok 5
ok 6
ok 7

real    0m21.517s
user    0m20.717s
sys     0m0.100s

From avilella at gmail.com Tue Dec 4 15:39:26 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:39:26 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program
In-Reply-To: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com> <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org> <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
Message-ID: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>

oh, I forgot to mention: SLR uses the lapack and blas libraries if installed, which makes it a lot faster (according to the author)... maybe that's the reason...

From jason at bioperl.org Tue Dec 4 16:43:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:43:03 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the SLR program
In-Reply-To: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com> <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org> <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com> <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
Message-ID: <2CF76A38-5A9E-4A4E-8C9F-29EDD732BDDF@bioperl.org>

My own icc-compiled version seemed to have caused the problem. Whoops. Fixed that.
-jason

From stefan.kirov at bms.com Tue Dec 4 16:51:51 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 16:51:51 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To:
References: <4755A9A1.2040608@bms.com>
Message-ID: <4755CBF7.5010709@bms.com>

Jason Stajich wrote:
> should be fixed.
>
> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
> revision 1.56
> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14
> Parsing PAML4 and PAML3.15 should work now. Dealing with variable
> order for the sequences and summary results in
> the top of the MLC files
>
Yes, this is the version I have, and in some cases the sequences do not get parsed. I had missed this commit. I will try to assemble a test case and send it. I cannot promise when, but will try to do it tomorrow. My gut feeling so far is that the parser works whenever there are gaps in the alignment, and otherwise it does not. PAML surely has a very peculiar format.
Thanks again!
Stefan

From jason at bioperl.org Tue Dec 4 16:36:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:36:09 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4755A9A1.2040608@bms.com>
References: <4755A9A1.2040608@bms.com>
Message-ID:

should be fixed.

$ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
revision 1.56
date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14
Parsing PAML4 and PAML3.15 should work now. Dealing with variable order for the sequences and summary results in the top of the MLC files

On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:

> Jason,
> I saw a commit after this post on codeml, but not on PAML.pm - I assume
> this is not fixed, am I correct?
> Thanks!
> Stefan

From johan.nilsson at sh.se Wed Dec 5 06:35:58 2007
From: johan.nilsson at sh.se (Johan Nilsson)
Date: Wed, 5 Dec 2007 12:35:58 +0100
Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm"
Message-ID:

Hello,

I have a bunch of multiple sequence alignments of protein-coding genes, which I would like to analyse with the SLAC method of the HyPhy package. I tried using the SLAC.pm module in bioperl-run, but I could not get it to work properly.

Basically, for each MSA file, I create the Bio::Tree::Tree and Bio::SimpleAlign objects ($tree and $aln, respectively) required as arguments to SLAC, and call the method with "($rc,$result) = $slac->run($aln,$tree)" in a loop in my script.

When I choose not to save the tmp files (the default option in SLAC.pm), the program complains that it cannot find the file "$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA (which works fine). Apparently, it looks for the wrapper.bf file in the first tmp dir created, which is deleted at the end of the first SLAC call.

If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')), all calls to SLAC give return code 1, and no error message is received. However, when I look at the resulting $result hashref, it turns out that all results are for the FIRST alignment read. I've made sure there is nothing strange with my loop procedure, and I checked that the tree and alignment objects look OK for each MSA. Apparently, it does create a new "results.tsv" file in the tmp directory after each run, but the file is identical each time it's created. Also, it only creates ONE tmp directory, no matter how many times SLAC is executed (I would imagine it was supposed to save each result in a separate tmp dir?).

Thus, it seems to me like the errors occur because something goes wrong in the creation of temporary files. Have I done something wrong here, or have any of you experienced the same problem?

Best regards
/Johan

--
Johan Nilsson, Ph.D.
School of Life Sciences
Södertörns University College
S-141 89 Huddinge, Sweden
E-mail: johan.nilsson at sh.se
Phone: +46 8 608 47 05, +46 70 456 10 51

From bernd.web at gmail.com Wed Dec 5 08:10:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 5 Dec 2007 14:10:04 +0100
Subject: [Bioperl-l] SimpleAlign is_flush
Message-ID: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>

Hi,

SimpleAlign has an is_flush:
Function : Tells you whether the alignment is flush, i.e. all of the same length
Returns : 1 or 0

I noticed that a file with multiple fasta sequences with different lengths has an is_flush value of 1. Printing the "alignment" shows that sequences are appended with "-" so that the all are the same length. Does this mean that is_flush for alignments read in via AlignIO is indeed always true and thus as such a so useful ?

(using bioperl version: 1.005002102)

Regards,
Bernd

From cjfields at uiuc.edu Wed Dec 5 08:53:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 07:53:59 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
References: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
Message-ID: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>

Yes; it's a convenient way to make sure all seqs have the same length (including gaps). Nice for checking when adding new seqs to an alignment or building new parsers.

chris

On Dec 5, 2007, at 7:10 AM, Bernd Web wrote:

> Hi,
>
> SimpleAlign has an is_flush:
> Function : Tells you whether the alignment is flush, i.e. all of the same length
> Returns : 1 or 0
>
> I noticed that a file with multiple fasta sequences with different lengths has an is_flush value of 1. Printing the "alignment" shows that sequences are appended with "-" so that the all are the same length. Does this mean that is_flush for alignments read in via AlignIO is indeed always true and thus as such a so useful ?
>
> (using bioperl version: 1.005002102)
>
> Regards,
> Bernd

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From captainrave at hotmail.com Wed Dec 5 07:37:02 2007
From: captainrave at hotmail.com (Captainrave)
Date: Wed, 5 Dec 2007 04:37:02 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <475574B3.8050700@sendu.me.uk>
References: <14148723.post@talk.nabble.com> <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk> <14152264.post@talk.nabble.com> <475574B3.8050700@sendu.me.uk>
Message-ID: <14170499.post@talk.nabble.com>

Thanks, it works great now. Do any of you know if there is a tag to pull out the CDS location, i.e. the values such as 132..145 etc.? Those are all I need. Also, is there any way to stop it reporting tag and value and literally JUST output the value? Thanks!!!
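In BioPerl the coordinates asked for above normally come from the feature object rather than from a tag: `$feat->start` and `$feat->end` give the outer bounds of a CDS, and for a joined CDS `$feat->location->each_Location` returns one sub-location per segment, each with its own `start`/`end`. If all you have is the location string as printed in the GenBank flat file (or as returned by `$feat->location->to_FTstring`), the minimal pure-Perl sketch below reduces it to bare start/end pairs. `location_ranges` is a hypothetical helper for illustration only; it assumes plain `a..b` ranges or single positions, optionally fuzzy (`<`/`>`) and wrapped in `join(...)`, `order(...)` or `complement(...)`. Remote references such as `AB000001.1:10..20` and between-base `^` positions are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a GenBank location string into [start, end] pairs.
# Assumes simple ranges ("132..145"), single positions ("467"), optional fuzzy
# markers ("<200..>310") and join()/order()/complement() wrappers only.
sub location_ranges {
    my ($loc) = @_;
    my @ranges;
    # Each match captures a start and, when present, the end after "..".
    while ($loc =~ /<?(\d+)(?:\.\.>?(\d+))?/g) {
        push @ranges, [ $1, defined $2 ? $2 : $1 ];
    }
    return @ranges;
}

# Print only the numbers, tab-separated, one range per line.
for my $range (location_ranges('complement(join(132..145,<200..>310))')) {
    print join("\t", @$range), "\n";
}
```

Working from the feature object is preferable when you already have one, since BioPerl has parsed strand and fuzziness for you; the string route is only a fallback.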
From stefan.kirov at bms.com Wed Dec 5 09:24:20 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:24:20 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To:
References: <4755A9A1.2040608@bms.com>
Message-ID: <4756B494.7020100@bms.com>

Jason,
When there is a gapless alignment we have a differently formatted output from codeml:

kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc

seed used = 492211105
3 141

ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA

And parsing this fails...
The next one has gaps and works fine:

kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc

seed used = 492252697

Before deleting alignment gaps
2 162

ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC CTT GGT TCA GGA GGT CAG TTC CTG
ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT CCT GGT ACA GGA AAC AAG CTT CTG

I will send both whole files as an attachment with another mail (I do not know if these are going to pass through).
My guess is that the whole _parse_summary method has to be re-worked as there is no tag to look for before the sequences start. Ugly.
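One way around the missing tag is to treat the PHYLIP header itself (`<ntaxa> <nsites>`, e.g. `3 141` and `2 162` in the two heads above) as the marker, instead of relying on the "Before deleting alignment gaps" banner. The sketch below illustrates that idea only; `find_phylip_header` is a hypothetical helper, not part of Bio::Tools::Phylo::PAML, and it assumes the first line near the top of the mlc file consisting of exactly two integers is the header.

```perl
use strict;
use warnings;

# Hypothetical helper (not in Bio::Tools::Phylo::PAML): find the PHYLIP
# header "<ntaxa> <nsites>" near the top of a codeml mlc file. It keys on
# the header line itself, so it works with or without the
# "Before deleting alignment gaps" banner.
sub find_phylip_header {
    my @lines = @_;
    for my $i (0 .. $#lines) {
        # A line holding exactly two integers, e.g. "   3   141".
        if ($lines[$i] =~ /^\s*(\d+)\s+(\d+)\s*$/) {
            return ($1, $2, $i);    # taxa, sites, index of the header line
        }
    }
    return;                         # no header found
}

# Abbreviated versions of the two mlc heads shown above.
my @gapless = ('', 'seed used = 492211105', '   3   141', '',
               'ENSRNOE00000058637  GCG AGC AAG');
my @gapped  = ('', 'seed used = 492252697', '',
               'Before deleting alignment gaps', '   2   162', '',
               'ENSMUSE00000460297  AAT ATC GAT');

for my $head (\@gapless, \@gapped) {
    my ($taxa, $sites, $idx) = find_phylip_header(@$head);
    print "taxa=$taxa sites=$sites header_line=$idx\n";
}
```

A real parser would still have to cope with multi-data-set files and with any later line that happens to hold two integers; the point is only that the header line can replace a fixed line position as the trigger for sequence parsing.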
I am not sure what else could become broken if I try to fix it, so I will leave it to you. Stefan > should be fixed. > > $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm > revision 1.56 > date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14 > Parsing PAML4 and PAML3.15 should work now. Dealing with variable > order for the sequences and summary results in > the top of the MLC files > > On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote: > >> Jason Stajich wrote: >>> PAML4 breaks our PAML parser right now because the order of things in >>> the result file has changed. Now sequences precede the information >>> about the version or the program run. This means that $result- >>>> get_seqs() fails because we don't parse the sequences. >>> >>> We'll see what we can do, but as usual with supporting 3rd party >>> programs it is brittle when file formats change. Th >>> >>> -jason >>> >>> -- >>> Jason Stajich >>> jason at bioperl.org >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> Jason, >> I saw a commit after this post on codeml, but not on PAML.pm- I assume >> this is not fixed, am I correct? >> Thanks! >> Stefan > > From stefan.kirov at bms.com Wed Dec 5 09:35:23 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 05 Dec 2007 09:35:23 -0500 Subject: [Bioperl-l] PAML/Codeml parsing In-Reply-To: <4756B494.7020100@bms.com> References: <4755A9A1.2040608@bms.com> <4756B494.7020100@bms.com> Message-ID: <4756B72B.6000103@bms.com> Here are the files. 
Stefan Stefan Kirov wrote: > Jason, > When there is a gapless alignment we have a differently formatted output > from codeml: > kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc > > seed used = 492211105 > 3 141 > > ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC > CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG > CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA > ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC > CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG > CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA > ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC > CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG > CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA > > And parsing this fails... > The next one has gaps and works fine: > > kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc > > seed used = 492252697 > > Before deleting alignment gaps > 2 162 > > ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA > AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC > AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC > CTT GGT TCA GGA GGT CAG TTC CTG > ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA > AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT > AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT > CCT GGT ACA GGA AAC AAG CTT CTG > > I will send both whole files as an attachment with another mail (I do > not know if these are going to pass through). > My guess is that the whole _parse_summary method has to be re-worked as > there is no tag to look for before the sequences start. Ugly. > I am not sure what else could become broken if I try to fix it, so I > will leave it to you. > Stefan > >> should be fixed. 
>> >> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm >> revision 1.56 >> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14 >> Parsing PAML4 and PAML3.15 should work now. Dealing with variable >> order for the sequences and summary results in >> the top of the MLC files >> >> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote: >> >> >>> Jason Stajich wrote: >>> >>>> PAML4 breaks our PAML parser right now because the order of things in >>>> the result file has changed. Now sequences precede the information >>>> about the version or the program run. This means that $result- >>>> >>>>> get_seqs() fails because we don't parse the sequences. >>>>> >>>> We'll see what we can do, but as usual with supporting 3rd party >>>> programs it is brittle when file formats change. Th >>>> >>>> -jason >>>> >>>> -- >>>> Jason Stajich >>>> jason at bioperl.org >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> Jason, >>> I saw a commit after this post on codeml, but not on PAML.pm- I assume >>> this is not fixed, am I correct? >>> Thanks! >>> Stefan >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mlc.tar.gz Type: application/x-gzip Size: 3237 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071205/bd77cde1/attachment.gz From aaron.j.mackey at gsk.com Wed Dec 5 09:56:31 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Wed, 5 Dec 2007 09:56:31 -0500 Subject: [Bioperl-l] SimpleAlign is_flush In-Reply-To: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu> Message-ID: Well, if you use AlignIO::fasta to read in a multi-fasta file of *unaligned* sequences, AlignIO::fasta makes the assumption that all of your sequences are aligned, and pads the ends of shorter sequences with gap characters (essentially, enforcing a rather silly, yet valid alignment). The fact that is_flush() then returns 1 is secondary. If you just want to read in an array of unaligned sequences, use SeqIO::fasta instead. It doesn't really make much sense to use AlignIO for sequences that are not aligned ... conversely, if you *do* have aligned sequences in a multi-fasta file, then it does make sense to use AlignIO, and it also makes sense for AlignIO::fasta to end-pad sequences with gaps as necessary to get a fully valid, flush multiple sequence alignment matrix. -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 12/05/2007 08:53:59 AM: > Yes; it's a convenient way to make sure all seqs have the same length > (including gaps). Nice for checking when adding new seqs to an > alignment or building new parsers. > > chris > > On Dec 5, 2007, at 7:10 AM, Bernd Web wrote: > > > Hi, > > > > SimpleAlign has an is_flush: > > Function : Tells you whether the alignment is flush, i.e. all of the > > same length > > Returns : 1 or 0 > > > > I noticed that a file with multiple fasta sequences with different > > lengths has an is_flush value of 1. Printing the "alignment" shows > > that sequences are appended with "-" so that the all are the same > > length. 
Does this mean that is_flush for alignments read in via > > AlignIO is indeed always true and thus as such a so useful ? > > > > (using bioperl version: 1.005002102) > > > > > > Regards, > > Bernd > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Dec 5 11:22:01 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 5 Dec 2007 10:22:01 -0600 Subject: [Bioperl-l] SimpleAlign is_flush In-Reply-To: References: Message-ID: That's true. I assumed Bernd's seqs were aligned. chris On Dec 5, 2007, at 8:56 AM, aaron.j.mackey at gsk.com wrote: > Well, if you use AlignIO::fasta to read in a multi-fasta file of > *unaligned* sequences, AlignIO::fasta makes the assumption that all of > your sequences are aligned, and pads the ends of shorter sequences > with > gap characters (essentially, enforcing a rather silly, yet valid > alignment). The fact that is_flush() then returns 1 is secondary. > > If you just want to read in an array of unaligned sequences, use > SeqIO::fasta instead. It doesn't really make much sense to use > AlignIO > for sequences that are not aligned ... conversely, if you *do* have > aligned sequences in a multi-fasta file, then it does make sense to > use > AlignIO, and it also makes sense for AlignIO::fasta to end-pad > sequences > with gaps as necessary to get a fully valid, flush multiple sequence > alignment matrix. 
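Aaron's explanation can be illustrated without BioPerl. The sketch below is not SimpleAlign's code: `is_flush` and `end_pad` are hypothetical stand-ins that mimic the behaviour described in this thread. "Flush" is taken to mean that every row has the same length, and end-padding shorter rows with `-` (what AlignIO::fasta is described as doing) makes any set of sequences flush, which is why the check returns 1 for unaligned input read through AlignIO.

```perl
use strict;
use warnings;

# Stand-in for the flush test: true when every sequence has the same length.
sub is_flush {
    my @seqs = @_;
    my %len = map { length($_) => 1 } @seqs;
    return scalar(keys %len) <= 1 ? 1 : 0;
}

# Pad each sequence on the right with '-' up to the longest one.
sub end_pad {
    my @seqs = @_;
    my $max = 0;
    for (@seqs) { $max = length($_) if length($_) > $max }
    return map { $_ . ('-' x ($max - length($_))) } @seqs;
}

my @raw    = ('MKVLA', 'MKV', 'MKVLAGT');   # unaligned, different lengths
my @padded = end_pad(@raw);

print "raw flush:    ", is_flush(@raw), "\n";      # 0
print "padded flush: ", is_flush(@padded), "\n";   # 1
print "$_\n" for @padded;                           # MKVLA--, MKV----, MKVLAGT
```

For genuinely unaligned sequences the check is therefore uninformative after AlignIO has padded them; reading them with SeqIO and comparing lengths directly tells you what is_flush cannot.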
> > -Aaron > > bioperl-l-bounces at lists.open-bio.org wrote on 12/05/2007 08:53:59 AM: > >> Yes; it's a convenient way to make sure all seqs have the same length >> (including gaps). Nice for checking when adding new seqs to an >> alignment or building new parsers. >> >> chris >> >> On Dec 5, 2007, at 7:10 AM, Bernd Web wrote: >> >>> Hi, >>> >>> SimpleAlign has an is_flush: >>> Function : Tells you whether the alignment is flush, i.e. all of >>> the >>> same length >>> Returns : 1 or 0 >>> >>> I noticed that a file with multiple fasta sequences with different >>> lengths has an is_flush value of 1. Printing the "alignment" shows >>> that sequences are appended with "-" so that the all are the same >>> length. Does this mean that is_flush for alignments read in via >>> AlignIO is indeed always true and thus as such a so useful ? >>> >>> (using bioperl version: 1.005002102) >>> >>> >>> Regards, >>> Bernd >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Wed Dec 5 14:56:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 14:56:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <4755A9A1.2040608@bms.com> <4756B494.7020100@bms.com>
Message-ID: <4757027F.407@bms.com>

Here is a patch that seems to be working and does not break the existing tests:

--- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm 2007-12-05 10:16:53.120720000 -0500
+++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm 2007-12-05 14:46:31.436278000 -0500
@@ -419,7 +419,10 @@
     # CODONML (in paml 3.12 February 2002) <<-- what we want to see!
 
     my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | YN00 )x;
+    my $line;
+    $self->{'_already_parsed_seqs'}=$self->{'_already_parsed_seqs'}?1:0;
     while ($_ = $self->_readline) {
+        $line++;
         if ( m/^($SEQTYPES) \s+ # seqtype: CODONML, AAML, BASEML, CODON2AAML, YN00, etc
                (?: \(in \s+ ([^\)]+?) \s* \) \s* )? # version: "paml 3.12 February 2002"; not present < 3.1 or YN00
                (\S+) \s* # tree filename
@@ -436,8 +439,11 @@
         } elsif (m/^Data set \d$/) {
             $self->{'_summary'} = {};
             $self->{'_summary'}->{'multidata'}++;
-        } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
-            my ($phylip_header) = $self->_readline;
+        }
+        elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
+            my ($phylip_header) = $self->_readline;
+            $self->_parse_seqs;
+        } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) {#No gap
             $self->_parse_seqs;
         }
     }
@@ -681,7 +687,6 @@
 }
 
 sub _parse_seqs {
-
     # this should in fact be packed into a Bio::SimpleAlign object instead of
     # an array but we'll stay with this for now
     my ($self) = @_;

What this does is trigger sequence parsing if the /Before.../ pattern is not seen until line 4.
Since phylip_header seems to be doing nothing one could completely eliminate the first seq parse elsif (even though counting lines is not a good thing). Since I am not aware of all consequences of changing the sequence parsing and I have no idea how extensive the tests are, I am not committing anything, but feel free to use that if you wish. Stefan Stefan Kirov wrote: > Jason, > When there is a gapless alignment we have a differently formatted output > from codeml: > kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc > > seed used = 492211105 > 3 141 > > ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC > CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG > CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA > ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC > CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG > CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA > ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC > CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG > CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA > > And parsing this fails... > The next one has gaps and works fine: > > kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc > > seed used = 492252697 > > Before deleting alignment gaps > 2 162 > > ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA > AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC > AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC > CTT GGT TCA GGA GGT CAG TTC CTG > ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA > AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT > AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT > CCT GGT ACA GGA AAC AAG CTT CTG > > I will send both whole files as an attachment with another mail (I do > not know if these are going to pass through). 
> My guess is that the whole _parse_summary method has to be re-worked as > there is no tag to look for before the sequences start. Ugly. > I am not sure what else could become broken if I try to fix it, so I > will leave it to you. > Stefan > >> should be fixed. >> >> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm >> revision 1.56 >> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: +21 -14 >> Parsing PAML4 and PAML3.15 should work now. Dealing with variable >> order for the sequences and summary results in >> the top of the MLC files >> >> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote: >> >> >>> Jason Stajich wrote: >>> >>>> PAML4 breaks our PAML parser right now because the order of things in >>>> the result file has changed. Now sequences precede the information >>>> about the version or the program run. This means that $result- >>>> >>>>> get_seqs() fails because we don't parse the sequences. >>>>> >>>> We'll see what we can do, but as usual with supporting 3rd party >>>> programs it is brittle when file formats change. Th >>>> >>>> -jason >>>> >>>> -- >>>> Jason Stajich >>>> jason at bioperl.org >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> Jason, >>> I saw a commit after this post on codeml, but not on PAML.pm- I assume >>> this is not fixed, am I correct? >>> Thanks! 
>>> Stefan >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jason at bioperl.org Wed Dec 5 15:01:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 5 Dec 2007 12:01:29 -0800 Subject: [Bioperl-l] PAML/Codeml parsing In-Reply-To: <4757027F.407@bms.com> References: <4755A9A1.2040608@bms.com> <4756B494.7020100@bms.com> <4757027F.407@bms.com> Message-ID: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org> sounds good - can you - make it as a bug with the patch and sample files in bugzilla - commit changes and I'll test as well thanks, -j On Dec 5, 2007, at 11:56 AM, Stefan Kirov wrote: > Here is a patch that seems to be working and does not break the > existing > tests: > > --- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm 2007-12-05 > 10:16:53.120720000 -0500 > +++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm > 2007-12-05 14:46:31.436278000 -0500 > @@ -419,7 +419,10 @@ > # CODONML (in paml 3.12 February 2002) <<-- what we want to see! > > my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | > YN00 )x; > + my $line; > + $self->{'_already_parsed_seqs'}=$self-> > {'_already_parsed_seqs'}?1:0; > while ($_ = $self->_readline) { > + $line++; > if ( m/^($SEQTYPES) \s+ # seqtype: > CODONML, > AAML, BASEML, CODON2AAML, YN00, etc > (?: \(in \s+ ([^\)]+?) \s* \) \s* )? 
# version: "paml > 3.12 February 2002"; not present < 3.1 or YN00 > (\S+) \s* # tree filename > @@ -436,8 +439,11 @@ > } elsif (m/^Data set \d$/) { > $self->{'_summary'} = {}; > $self->{'_summary'}->{'multidata'}++; > - } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) { > - my ($phylip_header) = $self->_readline; > + } > + elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap > + my ($phylip_header) = $self->_readline; > + $self->_parse_seqs; > + } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) > {#No gap > $self->_parse_seqs; > } > } > @@ -681,7 +687,6 @@ > } > > sub _parse_seqs { > - > # this should in fact be packed into a Bio::SimpleAlign object > instead of > # an array but we'll stay with this for now > my ($self) = @_; > > > What this does is trigger sequence parsing if the /Before.../ > pattern is > not seen until line 4. Since phylip_header seems to be doing > nothing one > could completely eliminate the first seq parse elsif (even though > counting lines is not a good thing). > Since I am not aware of all consequences of changing the sequence > parsing and I have no idea how extensive the tests are, I am not > committing anything, but feel free to use that if you wish. 
> Stefan > > Stefan Kirov wrote: >> Jason, >> When there is a gapless alignment we have a differently formatted >> output >> from codeml: >> kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc >> >> seed used = 492211105 >> 3 141 >> >> ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC >> ACC CAC >> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC >> AGT CTG >> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC >> ACC CTC ATA >> ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC >> ACC CAC >> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC >> AGC CTG >> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC >> ACC CTC ATA >> ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC >> ACC CAC >> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC >> AGC ATG >> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC >> ACC CTC ATA >> >> And parsing this fails... >> The next one has gaps and works fine: >> >> kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc >> >> seed used = 492252697 >> >> Before deleting alignment gaps >> 2 162 >> >> ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG >> GCA GAA >> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA >> CCG AAC >> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA >> GAT CTC >> CTT GGT TCA GGA GGT CAG TTC CTG >> ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA >> GCA GAA >> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC >> CCA ACT >> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- >> --- ATT >> CCT GGT ACA GGA AAC AAG CTT CTG >> >> I will send both whole files as an attachment with another mail (I do >> not know if these are going to pass through). >> My guess is that the whole _parse_summary method has to be re- >> worked as >> there is no tag to look for before the sequences start. Ugly. >> I am not sure what else could become broken if I try to fix it, so I >> will leave it to you. 
>> Stefan >> >>> should be fixed. >>> >>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm >>> revision 1.56 >>> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: >>> +21 -14 >>> Parsing PAML4 and PAML3.15 should work now. Dealing with variable >>> order for the sequences and summary results in >>> the top of the MLC files >>> >>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote: >>> >>> >>>> Jason Stajich wrote: >>>> >>>>> PAML4 breaks our PAML parser right now because the order of >>>>> things in >>>>> the result file has changed. Now sequences precede the >>>>> information >>>>> about the version or the program run. This means that $result- >>>>> >>>>>> get_seqs() fails because we don't parse the sequences. >>>>>> >>>>> We'll see what we can do, but as usual with supporting 3rd party >>>>> programs it is brittle when file formats change. Th >>>>> >>>>> -jason >>>>> >>>>> -- >>>>> Jason Stajich >>>>> jason at bioperl.org >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> Jason, >>>> I saw a commit after this post on codeml, but not on PAML.pm- I >>>> assume >>>> this is not fixed, am I correct? >>>> Thanks! >>>> Stefan >>>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From stefan.kirov at bms.com Wed Dec 5 15:33:47 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 05 Dec 2007 15:33:47 -0500 Subject: [Bioperl-l] PAML/Codeml parsing In-Reply-To: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org> References: <4755A9A1.2040608@bms.com> <4756B494.7020100@bms.com> <4757027F.407@bms.com> <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org> Message-ID: <47570B2B.5090602@bms.com> Done. 
Jason Stajich wrote: > sounds good - can you > - make it as a bug with the patch and sample files in bugzilla > - commit changes and I'll test as well > > thanks, > -j > > On Dec 5, 2007, at 11:56 AM, Stefan Kirov wrote: > > >> Here is a patch that seems to be working and does not break the >> existing >> tests: >> >> --- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm 2007-12-05 >> 10:16:53.120720000 -0500 >> +++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm >> 2007-12-05 14:46:31.436278000 -0500 >> @@ -419,7 +419,10 @@ >> # CODONML (in paml 3.12 February 2002) <<-- what we want to see! >> >> my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | >> YN00 )x; >> + my $line; >> + $self->{'_already_parsed_seqs'}=$self-> >> {'_already_parsed_seqs'}?1:0; >> while ($_ = $self->_readline) { >> + $line++; >> if ( m/^($SEQTYPES) \s+ # seqtype: >> CODONML, >> AAML, BASEML, CODON2AAML, YN00, etc >> (?: \(in \s+ ([^\)]+?) \s* \) \s* )? # version: "paml >> 3.12 February 2002"; not present < 3.1 or YN00 >> (\S+) \s* # tree filename >> @@ -436,8 +439,11 @@ >> } elsif (m/^Data set \d$/) { >> $self->{'_summary'} = {}; >> $self->{'_summary'}->{'multidata'}++; >> - } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) { >> - my ($phylip_header) = $self->_readline; >> + } >> + elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap >> + my ($phylip_header) = $self->_readline; >> + $self->_parse_seqs; >> + } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) >> {#No gap >> $self->_parse_seqs; >> } >> } >> @@ -681,7 +687,6 @@ >> } >> >> sub _parse_seqs { >> - >> # this should in fact be packed into a Bio::SimpleAlign object >> instead of >> # an array but we'll stay with this for now >> my ($self) = @_; >> >> >> What this does is trigger sequence parsing if the /Before.../ >> pattern is >> not seen until line 4. 
Since phylip_header seems to be doing >> nothing one >> could completely eliminate the first seq parse elsif (even though >> counting lines is not a good thing). >> Since I am not aware of all consequences of changing the sequence >> parsing and I have no idea how extensive the tests are, I am not >> committing anything, but feel free to use that if you wish. >> Stefan >> >> Stefan Kirov wrote: >> >>> Jason, >>> When there is a gapless alignment we have a differently formatted >>> output >>> from codeml: >>> kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc >>> >>> seed used = 492211105 >>> 3 141 >>> >>> ENSRNOE00000058637 GCG AGC AAG TGT GAC AGC CAT GGC >>> ACC CAC >>> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC >>> AGT CTG >>> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC >>> ACC CTC ATA >>> ENSMUSE00000366347 GCG AGC AAG TGT GAC AGC CAC GGC >>> ACC CAC >>> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC >>> AGC CTG >>> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC >>> ACC CTC ATA >>> ENSE00001279150 GCC AGC AAG TGT GAC AGT CAT GGC >>> ACC CAC >>> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC >>> AGC ATG >>> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC >>> ACC CTC ATA >>> >>> And parsing this fails... 
>>> The next one has gaps and works fine: >>> >>> kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc >>> >>> seed used = 492252697 >>> >>> Before deleting alignment gaps >>> 2 162 >>> >>> ENSMUSE00000460297 AAT ATC GAT ACA TTT TAC AAG GAG >>> GCA GAA >>> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA >>> CCG AAC >>> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA >>> GAT CTC >>> CTT GGT TCA GGA GGT CAG TTC CTG >>> ENSE00000939192 AAT ATT GAC ATA CTT TGC AAT GAA >>> GCA GAA >>> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC >>> CCA ACT >>> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- >>> --- ATT >>> CCT GGT ACA GGA AAC AAG CTT CTG >>> >>> I will send both whole files as an attachment with another mail (I do >>> not know if these are going to pass through). >>> My guess is that the whole _parse_summary method has to be re- >>> worked as >>> there is no tag to look for before the sequences start. Ugly. >>> I am not sure what else could become broken if I try to fix it, so I >>> will leave it to you. >>> Stefan >>> >>> >>>> should be fixed. >>>> >>>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm >>>> revision 1.56 >>>> date: 2007/11/01 14:52:56; author: jason; state: Exp; lines: >>>> +21 -14 >>>> Parsing PAML4 and PAML3.15 should work now. Dealing with variable >>>> order for the sequences and summary results in >>>> the top of the MLC files >>>> >>>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote: >>>> >>>> >>>> >>>>> Jason Stajich wrote: >>>>> >>>>> >>>>>> PAML4 breaks our PAML parser right now because the order of >>>>>> things in >>>>>> the result file has changed. Now sequences precede the >>>>>> information >>>>>> about the version or the program run. This means that $result- >>>>>> >>>>>> >>>>>>> get_seqs() fails because we don't parse the sequences. >>>>>>> >>>>>>> >>>>>> We'll see what we can do, but as usual with supporting 3rd party >>>>>> programs it is brittle when file formats change. 
Th >>>>>> >>>>>> -jason >>>>>> >>>>>> -- >>>>>> Jason Stajich >>>>>> jason at bioperl.org >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>> Jason, >>>>> I saw a commit after this post on codeml, but not on PAML.pm- I >>>>> assume >>>>> this is not fixed, am I correct? >>>>> Thanks! >>>>> Stefan >>>>> >>>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bernd.web at gmail.com Thu Dec 6 09:58:31 2007 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 6 Dec 2007 15:58:31 +0100 Subject: [Bioperl-l] graphics - Panel Message-ID: <716af09c0712060658t5504b377ob2d46adb85754284@mail.gmail.com> Hi, For map, $segstart is available. This holds the leftmost start of the feature (the left end of $ref displayed in the detailed view). However, is it also accessible to track coderefs? I'd like to access it in add_track, like

-bgcolor => sub {
    my $feature = shift;
    my $start   = $feature->segstart;
    # .... do something with the segstart
},

I realize I can add a -tag which holds the leftmost start of my segmented feature and then get it from $feature, but I wonder whether $segstart can also be accessed in the coderef somehow. Does someone know this?
Best regards, Bernd From georose at gmail.com Thu Dec 6 10:28:24 2007 From: georose at gmail.com (geo rose) Date: Thu, 6 Dec 2007 08:28:24 -0700 Subject: [Bioperl-l] getting sequences from external databank Message-ID: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com> Hi Bioperl, In the past, I have been able to retrieve sequences from an external databank, but my scripts are not working anymore. I am afraid that I may have broken my Bioperl installation while updating my Fedora7 machine with yum update. Below is an example of what happens. The script is from http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/node2.html and it works. (I used it on an older machine with Bioperl and MacOS Tiger)
__________________________________________________________________________________
#!/usr/bin/perl -w

use Bio::SeqIO;
use Bio::DB::GenBank;

$genBank = new Bio::DB::GenBank; # This object knows how to talk to GenBank

my $seq = $genBank->get_Seq_by_acc('AF060485'); # get a record by accession


my $seqOut = new Bio::SeqIO(-format => 'genbank');

$seqOut->write_seq($seq);
_________________________________________________________________________________________
This is the error I get
_________________________________________________________________________________________
[home at home Desktop]# perl final-seq-db-test1.pl
Bio::SeqIO: genbank cannot be found
Exception
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Failed to load module Bio::SeqIO::genbank. Weak references are not implemented in the version of perl at /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91.
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
Compilation failed in require at /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425.

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Root::Root::_load_module /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427
STACK: Bio::SeqIO::_load_format_module /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------

For more information about the SeqIO system please see the SeqIO docs.
This includes ways of checking for formats at compile time, not run time

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc AF060485 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------
[home at home Desktop]# Use of uninitialized value in concatenation (.) or string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/Util.pm line 30.

[home at home Desktop]#
________________________________________________________________________________________
Before I mess things up further I thought I'd ask: Can I fix this problem by reinstalling some part of Bioperl or Perl?
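The decisive line in this trace is "Weak references are not implemented in the version of perl". As far as we can tell, Scalar::Util raises that error when weaken() is requested but its compiled (XS) implementation is not available; the failed BEGIN block at Bio/Species.pm line 91 suggests Bio::Species imports weaken() there, which would explain why Bio::SeqIO::genbank cannot load either. The following is a minimal diagnostic sketch, not part of BioPerl (the helper name weak_refs_supported is hypothetical). It checks whether the installed Scalar::Util can import weaken() and actually create a weak reference:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 if this perl's
# Scalar::Util can import weaken() and create a weak reference, 0 otherwise.
sub weak_refs_supported {
    my $ok = eval {
        require Scalar::Util;
        # Same kind of import Bio::Species appears to perform; a Scalar::Util
        # without its compiled (XS) part is expected to die here.
        Scalar::Util->import('weaken');
        my $strong = { name => 'probe' };
        my $weak   = $strong;            # second reference to the same hash
        Scalar::Util::weaken($weak);     # turn the copy into a weak reference
        Scalar::Util::isweak($weak) ? 1 : 0;
    };
    return $ok ? 1 : 0;
}

my $version = eval { require Scalar::Util; Scalar::Util->VERSION } || 'not installed';
print "Scalar::Util version: $version\n";
if (weak_refs_supported()) {
    print "Weak references work; BioPerl modules that use weaken() should load.\n";
}
else {
    print "Weak references are unavailable; consider reinstalling Scalar::Util.\n";
}
```

Run it with the same perl that runs the failing BioPerl script. If it reports that weak references are unavailable, reinstalling Scalar::Util (for example from the CPAN shell) is worth trying before touching BioPerl itself.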
Thanks, George From barry.moore at genetics.utah.edu Thu Dec 6 12:56:50 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu, 6 Dec 2007 10:56:50 -0700 Subject: [Bioperl-l] getting sequences from external databank In-Reply-To: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com> References: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com> Message-ID: George, This is a hideous little bug in Red Hat/Fedora installations of perl. It has happened to me a couple of times on upgrades, but it has always been fixed by starting the CPAN shell with

perl -MCPAN -e shell

and then running, at the cpan> prompt:

force install Scalar::Util

See http://www.perlmonks.org/?node_id=460411 Barry On Dec 6, 2007, at 8:28 AM, geo rose wrote: > Hi Bioperl, > > In the past, I have been able to retrieve sequences from an external > databank, but my scripts are not working anymore. > I am afraid that I may have broken my Bioperl installation while > updating my > Fedora7 machine with yum update. > > Below is an example of what happens. > > The script is from > http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/ > node2.html and > it works. > (I used it on an older machine with Bioperl and MacOS Tiger) > > ______________________________________________________________________ > ____________ > #!/usr/bin/perl -w > > use Bio::SeqIO; > use Bio::DB::GenBank; > > $genBank = new Bio::DB::GenBank; # This object knows how to talk > to GenBank > > my $seq = $genBank->get_Seq_by_acc('AF060485'); # get a record by > accession > > > my $seqOut = new Bio::SeqIO(-format => 'genbank'); > > $seqOut->write_seq($seq); > > > ______________________________________________________________________ > ___________________ > This is the error I get > ______________________________________________________________________ > ___________________ > > [home at home Desktop]# perl final-seq-db-test1.pl > Bio::SeqIO: genbank cannot be found > Exception > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Failed to load module Bio::SeqIO::genbank.
Weak references are > not > implemented in the version of perl at > /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91 > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172. > BEGIN failed--compilation aborted at > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172. > Compilation failed in require at > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425. > > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::Root::Root::_load_module > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427 > STACK: Bio::SeqIO::_load_format_module > /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555 > STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172 > STACK: final-seq-db-test1.pl:8 > ----------------------------------------------------------- > > For more information about the SeqIO system please see the SeqIO docs. > This includes ways of checking for formats at compile time, not run > time > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: acc AF060485 does not exist > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173 > STACK: final-seq-db-test1.pl:8 > ----------------------------------------------------------- > [home at home Desktop]# Use of uninitialized value in concatenation > (.) 
or > string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/ > Util.pm > line 30. > > [home at home Desktop]# > > > ______________________________________________________________________ > __________________ > > > Before I mess things up further I thought I'd ask: > Can I fix this problem by reinstalling some part of Bioperl or Perl? > > Thanks, > > George > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Thu Dec 6 18:58:02 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 7 Dec 2007 10:58:02 +1100 Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST reload In-Reply-To: <47545590.1000703@boekhoff.info> References: <47545590.1000703@boekhoff.info> Message-ID: Sven, > I just started working with Perl and BioPerl. I'm quite impressed what > can be easily done with this module. Today I found that my second CPU > ist not used, but the first one run's at 100%. I tried to include the > "-a"-parameter, but I was not successful: My experience agrees with you, in that "-a" does not seem to work with the pre-compiled BLAST binaries you get from NCBI on a multi-core system. I'm not sure why, as "ldd blastall" shows it links against "/lib64/tls/libpthread.so.0". Any others have any ideas? -- --Torsten Seemann --Victorian Bioinformatics Consortium, Monash University --Tel +61 3 9905 9010 From lzhtom at hotmail.com Thu Dec 6 23:25:42 2007 From: lzhtom at hotmail.com (zhihuali) Date: Fri, 7 Dec 2007 04:25:42 +0000 Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db? 
Message-ID: Hi netters, I've installed BioSQL and bioperl-db, and successfully created and stored a persistent object:

use strict;
use warnings;
use Bio::Seq;
use Bio::DB::BioDB;

my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                -user     => 'annoymous',
                                -dbname   => 'bioseqdb');

my $seqobj = Bio::Seq->new(-accession_number => "test",
                           -id               => "test1",
                           -seq              => "AGCTAGCT",
                           -version          => 1);
my $dbobj = $dbadp->create_persistent($seqobj);
$dbobj->create;
$dbobj->commit;

I know it worked because I found the corresponding rows in the bioseqdb tables. Now I want to retrieve the object back from the database. There isn't much documentation available, and I've tried find_by_unique_key/primary_key, but all attempts failed. Maybe I didn't use them correctly. Could anyone give me an example of how to retrieve the stored Bio::Seq object? Thanks a lot! Zhihua Li From Marc.Logghe at ablynx.com Fri Dec 7 03:33:17 2007 From: Marc.Logghe at ablynx.com (Marc Logghe) Date: Fri, 7 Dec 2007 09:33:17 +0100 Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db? In-Reply-To: Message-ID: <03C512635899144083CADB0EE222018901216FA5@alpaca.lan.ablynx.com> Hi, Hilmar's BOSC presentation is a very good place to start. Have a look at http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf (slide 18, for instance). Regards, Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of zhihuali > Sent: Friday, 7 December 2007 5:26 > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
> > > Hi netters, > > I've installed BioSQL and bioperl-db, and successfully created and stored > a persistent object: > > use strict;use warnings;use Bio::Seq;use Bio::DB::BioDB; > my $dbadp=Bio::DB::BioDB->new(-database=>'biosql', > -user=>'annoymous', -dbname=>'bioseqdb'); > > my $seqobj=Bio::Seq->new(-accession_number=>"test", - > id=>"test1", -seq=>"AGCTAGCT", - > version=>1);my $dbobj=$dbadp->create_persistent($seqobj);$dbobj- > >create;$dbobj->commit; > > It's successful because I found corresponding rows in the bioseqdb tables. > > Now I want to retrieve the object back from the database. There's not much > documents available and I've tried find_by_unique_key/primary_key but all > failed. Maybe I didn't use them correctly. Could anyone give me an example > as how to retrieve the stored Bio::Seq object? > > Thanks a lot! > > Zhihua Li From avilella at gmail.com Fri Dec 7 05:32:43 2007 From: avilella at gmail.com (Albert Vilella) Date: Fri, 7 Dec 2007 10:32:43 +0000 Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm" In-Reply-To: References: Message-ID: <358f4d650712070232s3d9ed27xf1c5f17e2985bd90@mail.gmail.com> Hi Johan, It would be great if you could upload a reproducible example case: http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl Perhaps simply a tar.gz of the directory with the sample files and the script, plus a short explanation of how to run it. If you have any special "env" vars regarding tmp files, could you specify those as well? Thanks, Albert. On Dec 5, 2007 11:35 AM, Johan Nilsson wrote: > > Hello, > > I have a bunch of multiple sequence alignments of protein coding genes, > which I would like to analyse with the SLAC method of the HyPhy package. I > tried using the SLAC.pm module in bioperl-run, but I could not get it to > work properly.
> > Basically, for each MSA file, I create the Bio::Tree::Tree and > Bio::SimpleAlign objects ($tree and $aln, respectively) required as > arguments to SLAC, and call the method with: "($rc,$result) = > $slac->run($aln,$tree)" in a loop procedure in my script. > > When I choose not to save the tmp files (the default option in SLAC.pm), > the program complains that it cannot find the file > "$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA > (which works fine). Apparently, it looks for the wrapper.bf file in the > first tmp dir created, which is deleted in the end of the first SLAC call. > > If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')), > all calls to SLAC give returncode 1, and no error message is received. > However, when I look at the resulting $result hashref, it turns out that > all results are for the FIRST alignment read. I've made sure there is > nothing strange with my loop procedure, and I checked that the tree and > alignment objects look OK for each MSA. Apparently, it does create new > "results.tsv" files in the tmp directory after each run, but it is > identical each time it's created. Also, it only creates ONE tmp directory, > no matter how many times SLAC is executed (I would imagine it was supposed > to save each result in separate tmp dirs?) > > Thus, it seems to me like the errors occur because something goes wrong in > the creation of temporary files. Have I done something wrong here, or have > any other of you experienced the same problem? > > Best regards > /Johan > > > -- > Johan Nilsson, Ph.D. 
> School of Life Sciences > Södertörns University College > S-141 89 Huddinge, Sweden > E-mail: johan.nilsson at sh.se > Phone: +46 8 608 47 05, +46 70 456 10 51 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From J.Hane at murdoch.edu.au Mon Dec 10 02:31:17 2007 From: J.Hane at murdoch.edu.au (James Hane) Date: Mon, 10 Dec 2007 16:31:17 +0900 Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32 In-Reply-To: References: Message-ID: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au> I've been trying to compile some bioperl based scripts for win32 using perl2exe which have worked out really well - except I've noticed I cannot compile Align::IO, Bio::Location::Simple or Bio::Location::Atomic despite requiring perl2exe to include them. Anyone have any suggestions how to get these to compile? From Kevin.M.Brown at asu.edu Mon Dec 10 10:34:35 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 10 Dec 2007 08:34:35 -0700 Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32 In-Reply-To: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au> References: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au> Message-ID: <1A4207F8295607498283FE9E93B775B4041D0B82@EX02.asurite.ad.asu.edu> I use PAR to create exe's for windows users and it works fine with bioperl.
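When a script is packaged into a Windows executable, modules that BioPerl loads at run time by name (the Bio::Root::Root::_load_module frames in the GenBank stack trace above are one example of that mechanism) may not be picked up by a packager that scans the script's use and require statements. As a minimal diagnostic sketch, assuming nothing about perl2exe or PAR internals, the script below tries to load each named module at run time and reports which ones fail; running it as a packaged executable should show what was left out. The helper check_modules is hypothetical, and reading "Align::IO" in the question as Bio::AlignIO is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: try to load each module by name at run time.
# Returns a hash ref mapping module name => undef on success,
# or the (trimmed) error message on failure.
sub check_modules {
    my @modules = @_;
    my %status;
    for my $module (@modules) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::AlignIO -> Bio/AlignIO.pm
        if (eval { require $file; 1 }) {
            $status{$module} = undef;
        }
        else {
            (my $err = $@) =~ s/\s+\z//;
            $status{$module} = $err;
        }
    }
    return \%status;
}

# Default list taken from the question ('Align::IO' assumed to mean Bio::AlignIO);
# other module names can be passed on the command line instead.
my @wanted = @ARGV ? @ARGV : qw(Bio::AlignIO Bio::Location::Simple Bio::Location::Atomic);
my $status = check_modules(@wanted);
for my $module (@wanted) {
    my $err = $status->{$module};
    if (defined $err) { print "MISSING $module: $err\n" }
    else              { print "ok      $module\n" }
}
```

A module reported missing can usually be forced into the package by naming it explicitly, for example with an extra use line in the script or, with PAR, pp's -M option.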
> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of James Hane > Sent: Monday, December 10, 2007 12:31 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32 > > I've been trying to compile some bioperl based scripts for win32 using > perl2exe which have worked out really well - except I've noticed I > cannot compile Align::IO, Bio::Location::Simple or > Bio::Location::Atomic > despite requiring perl2exe to include them. Anyone have any > suggestions > how to get these to compile? > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Kevin.M.Brown at asu.edu Mon Dec 10 13:23:01 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 10 Dec 2007 11:23:01 -0700 Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoidBLAST reload In-Reply-To: References: <47545590.1000703@boekhoff.info> Message-ID: <1A4207F8295607498283FE9E93B775B4041D0CAD@EX02.asurite.ad.asu.edu> I use the -a option with blast all the time and it works, even on multicore systems. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Torsten Seemann > Sent: Thursday, December 06, 2007 4:58 PM > To: Sven Boekhoff > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] [StandAloneBLAST] Use more than one > CPU + avoidBLAST reload > > Sven, > > > I just started working with Perl and BioPerl. I'm quite > impressed what > > can be easily done with this module. Today I found that my > second CPU > > ist not used, but the first one run's at 100%. 
I tried to > include the > "-a"-parameter, but I was not successful: > > My experience agrees with you, in that "-a" does not seem to work with > the pre-compiled BLAST binaries you get from NCBI on a multi-core > system. > > I'm not sure why, as "ldd blastall" shows it links against > "/lib64/tls/libpthread.so.0". > > Any others have any ideas? > > -- > --Torsten Seemann > --Victorian Bioinformatics Consortium, Monash University > --Tel +61 3 9905 9010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From nadav.denekamp at gmail.com Wed Dec 12 08:29:18 2007 From: nadav.denekamp at gmail.com (Nadav Y. Denekamp) Date: Wed, 12 Dec 2007 15:29:18 +0200 Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of idenifiers Message-ID: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2> Hello, I am trying to retrieve a list of sequences from an indexed flat FASTA file. I tried to use the script bp_fetch.pl, but I could only retrieve one sequence for one identifier at a time. I am looking for a way to provide a list of accession numbers to a script and retrieve the sequences. I don't have much experience with Perl, so I apologize if this question is very basic. Thanks - Nadav ------------------------------------------------------------------------------------------------------------ Nadav Y. Denekamp, Ph.D., Israel Oceanographic and Limnological Research, National Institute for Oceanography Tel-Shikmona, Haifa, 31080. Tel: 972-4-8565259 Fax: 972-4-8511911 mobile: 972-50-2167318 Skype: nadavden Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com; Visit the "Sleeping Beauty"
website: http://www.gmm.gu.se/SB From biojoiner at gmail.com Wed Dec 12 08:06:42 2007 From: biojoiner at gmail.com (Feng Cheng) Date: Wed, 12 Dec 2007 21:06:42 +0800 Subject: [Bioperl-l] problem_About_Bioperl_Installation Message-ID: Dear Admin: I have a computer without network access, but I want to have bioperl installed on it. All the installation methods I found need a network connection to CPAN to fetch the required packages. Is there a complete installation package I can use on a network-isolated computer, or some other way to solve the problem? I look forward to your kind answer. Thanks very much! -- ============================================================ Feng Cheng Division of HapMap Project Beijing Institute of Genomics, Chinese Academy of Sciences (CAS) Beijing Airport Industrial Zone B-6, Beijing, 101318, China Tel: +86-10-80481102/1176 E-mail: chengf at genomics.org.cn http://www.big.ac.cn/ ============================================================ From avilella at gmail.com Wed Dec 12 09:50:16 2007 From: avilella at gmail.com (Albert Vilella) Date: Wed, 12 Dec 2007 14:50:16 +0000 Subject: [Bioperl-l] problem_About_Bioperl_Installation In-Reply-To: References: Message-ID: <358f4d650712120650u2ef40089ofe27725ea8497dd7@mail.gmail.com> You can also download the tar.gz packages from the bioperl.org website and copy them to the computer. Then unpack the tar.gz files and update your PERL5LIB environment variable. On Dec 12, 2007 1:06 PM, Feng Cheng wrote: > Dear Admin: > > I have a computer which out of network service, but wanted to have > bioperl installed in it.
> I found the installation method all need net to link CPAN to get the > pakage needed, so is there some complete installation program for me to > install it in a net-isolated computer, or some other method to solve the > problom? > Wait for your kindful answer. > Thanks very much! > > -- > > ============================================================ > Feng Cheng > > Division of HapMap Project > Beijing Institute of Genomics, Chinese Academy of Sciences (CAS) > Beijing Airport Industrial Zone B-6, Beijing, 101318, China > Tel: +86-10-80481102/1176 > E-mail: chengf at genomics.org.cn > http://www.big.ac.cn/ > ============================================================ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Dec 12 10:22:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 12 Dec 2007 09:22:45 -0600 Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of idenifiers In-Reply-To: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2> References: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2> Message-ID: If you use Bio::Index::Fasta (which is what bp_index.pl uses for FASTA files) then you can write up your own script.
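Chris's suggestion leaves the list-of-accessions part to the script writer. A minimal sketch of that piece, assuming a plain text file with one identifier per line (read_id_list is a hypothetical helper, not a BioPerl function); the commented loop shows how its output could be fed to an index built beforehand with bp_index.pl, assuming fetch() returns undef for identifiers that are not in the index:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one identifier per line from an open
# filehandle, skipping blank lines and '#' comments, and return the
# identifiers in file order with duplicates removed.
sub read_id_list {
    my ($fh) = @_;
    my (@ids, %seen);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim whitespace (also strips a stray \r)
        next if $line eq '' || $line =~ /^#/;
        push @ids, $line unless $seen{$line}++;
    }
    return @ids;
}

# Sketch of the intended use with an existing index (requires BioPerl):
#
#   use Bio::Index::Fasta;
#   use Bio::SeqIO;
#   my ($index_file, $id_file) = @ARGV;
#   my $inx = Bio::Index::Fasta->new(-filename => $index_file);
#   my $out = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);
#   open my $fh, '<', $id_file or die "Cannot open $id_file: $!";
#   for my $id (read_id_list($fh)) {
#       my $seq = $inx->fetch($id) or do { warn "No entry for $id\n"; next };
#       $out->write_seq($seq);
#   }
```

Reading the identifiers from a file rather than the command line keeps long accession lists manageable and lets the loop warn about each missing entry instead of stopping at the first one.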
From 'perldoc Bio::Index::Fasta':

    # Once the index is made it can be accessed, either in the
    # same script or a different one
    use strict;
    use Bio::Index::Fasta;
    use Bio::SeqIO;

    my $Index_File_Name = shift;
    my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name);
    my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);

    foreach my $id (@ARGV) {
        my $seq = $inx->fetch($id); # Returns Bio::Seq object
        $out->write_seq($seq);
    }

    # or, alternatively
    my $id;
    my $seq = $inx->get_Seq_by_id($id); # identical to fetch()

.... chris On Dec 12, 2007, at 7:29 AM, Nadav Y. Denekamp wrote: > Hello, > > I am trying to retrieve a list of sequences from an indexed flast > FASTA file. I tried to use the script bp_fetch.pl but I could only > retrieve one sequence for one identifier. I am looking for a way to > provide a list of accession numbers to a script and to retrieve the > sequences. I don't have much experience with perl so I appologize if > this question is very basic > thanks - Nadav > > > ------------------------------------------------------------------------------------------------------------ > Nadav Y. Denekamp, Ph.D., > Israel Oceanographic and Limnological Research, > National Institute for Oceanography > Tel-Shi